openlibhums / pandoc_plugin

Plugin for janeway for automatic galley generation
GNU Affero General Public License v3.0

Newline after <sup> and <em> elements in generated html creates formatting error on display #17

Closed hachacha closed 7 months ago

hachacha commented 4 years ago

Currently running 44d1fdaa75196e4a9472699bae5bf5b933e16c57

This is how the page looks once fully rendered. Note the extra spaces and stray underlines:

image

In the HTML file itself, both <sup> and <em> elements look like this:

image

After removing the newlines from the HTML galley file directly, the page renders properly, with no odd spacing around the <sup> or <em> elements.
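The manual cleanup described above could be sketched roughly as follows. This is only an illustration, assuming the stray line breaks sit directly after an opening inline tag or directly before its closing tag. The helper name `collapse_inline_newlines` and the `INLINE_TAGS` list are hypothetical and are not part of the plugin.

```python
import re

# Hypothetical helper, not part of pandoc_plugin.
# Assumed tag list: only the two elements named in this issue.
INLINE_TAGS = ("sup", "em")


def collapse_inline_newlines(html: str) -> str:
    """Remove line breaks just inside the listed inline elements."""
    names = "|".join(INLINE_TAGS)
    # Drop whitespace that contains a newline right after an opening tag,
    # e.g. "<sup>\n  th" becomes "<sup>th".
    html = re.sub(rf"(<(?:{names})\b[^>]*>)[ \t]*[\r\n]+\s*", r"\1", html)
    # Drop whitespace that contains a newline right before a closing tag,
    # e.g. "th\n</sup>" becomes "th</sup>".
    html = re.sub(rf"\s*[\r\n]+[ \t]*(</(?:{names})>)", r"\1", html)
    return html
```

Text outside these tags is left alone, so paragraph-level line breaks in the galley would be preserved. Whether this matches the real cause of the spacing is not established here.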

I tried to debug this to figure out where it's coming from. I checked the output from pandoc first, since these specific elements are the ones causing trouble, but I don't see any stray newlines in stdout:

From Isidore\xe2\x80\x99s <em>Etymologies</em> (12<sup>th</sup> century), original located at the British Library (shelfmark: Royal 12 F. IV, f.135v).</p>\n<p>The idea of the impossibility of life in the southern hemisphere (virtually lying on the back side of traditional T-O maps)<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a> derived from the theory of climate zones, first described by Macrobius in the 5<sup>th</sup>

I tried various ways of printing the output from Beautiful Soup, but it looks about the same.

So I thought it might come from writing the file, but changing the newline argument and stripping these characters from the string in views.py still does not help:

with open(output_path, mode="w", encoding="utf-8", newline='') as html_file:
        print(html.replace("\r","").replace('\n',''), file=html_file)

Any help or guidance with this would be greatly appreciated.

hachacha commented 3 years ago

Hi, @ajrbyers @mauromsl
I wanted to follow up on this. Has this been an issue for any other journals using this plugin? I have tried the latest version of the plugin and updated pandoc to its latest version as well, but I am still seeing this extra space on <sup> elements when converting to HTML. Thanks.

mauromsl commented 3 years ago

Hi @hachacha, can you share the original .docx with us on Discord so we can take a look? We haven't come across this problem on any journal yet.

hachacha commented 3 years ago

I tried converting the doc on the command line on the same server and the output was fine. I think the issue may be with Beautiful Soup somehow...

pandoc -s ORIGINAL.docx -t html -o foo.html

image

I'll send the file over on Discord, thanks!

mauromsl commented 7 months ago

closed by birkbeckctp/janeway#4044