pjheslin / diogenes

Diogenes: an environment for reading Latin and Greek
https://d.iogen.es/d

Hyphenations lost in conversion to XML #94

Open uti5 opened 1 year ago

uti5 commented 1 year ago

The program for converting the old databases to XML doesn't preserve end-of-line hyphenations.

pjheslin commented 1 year ago

Yes, that is by design. Diogenes goes to a great deal of trouble to remove the hyphenations. Hyphenation is usually considered an extremely undesirable feature: it reflects nothing but the incidental pagination of a text in a given edition, and it interferes with most of the purposes to which you would want to put an XML text. This and the other principles behind the XML exported by Diogenes are explained on the website.

Is there a particular reason you want to preserve the hyphenation?

uti5 commented 1 year ago

I noticed this when trying to typeset an interlinear translation of Aristotle with an even right margin, which really does require the hyphenations to be the way they were in Bekker (and in the Oxford editions, which preserve Bekker's hyphenations). A more common reason to want to preserve them is probably that without them references cease to be accurate: one may wish to refer to a sentence that ends on line 11 but now appears to end on line 10.
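To make the referencing problem concrete, here is a minimal Python sketch. It is not Diogenes' actual Perl code, and it assumes that de-hyphenation pulls the tail of a split word up onto the earlier line; the thread does not say how the exporter rejoins words. The function names `dehyphenate` and `line_of` and the sample text are invented for illustration.

```python
def dehyphenate(lines):
    """Join words split by an end-of-line hyphen.

    Assumption: the tail of the word is moved up to the earlier line.
    """
    out = [line.rstrip() for line in lines]
    for i in range(len(out) - 1):
        if out[i].endswith("-"):
            # Take the first whitespace-delimited chunk of the next line.
            head, _, rest = out[i + 1].lstrip().partition(" ")
            out[i] = out[i][:-1] + head  # drop the hyphen, append the tail
            out[i + 1] = rest
    return out


def line_of(lines, needle):
    """Return the 1-based number of the first line containing needle, or None."""
    for number, line in enumerate(lines, start=1):
        if needle in line:
            return number
    return None


# Invented two-line sample: the first sentence ends with a hyphenated word.
printed = [
    "the first sentence closes with a broken wo-",
    "rd. A second sentence starts on this line.",
]
joined = dehyphenate(printed)

print(line_of(printed, "."))  # line on which the first sentence ends in print
print(line_of(joined, "."))   # line on which it ends after de-hyphenation
```

In the printed layout the first full stop falls on the second line. After the word is rejoined it falls on the first, which is the same kind of off-by-one shift as the line 11 versus line 10 case described above.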

pjheslin commented 1 year ago

Apologies for the slow response. I didn't know that the Oxford texts of Aristotle tend to reproduce the line-breaks of Bekker's text. In most prose texts, the line numbers are not used for referencing. Bekker numbering is different in that its line numbers are routinely used in citations (unlike e.g. Stephanus pages for Plato). From the standpoint of referencing, that makes Aristotle's text more like poetry than prose.

I could just change it so that Aristotle is treated like verse. But I think your use-case is unusual and most users will not want a hyphenated text, so I don't want to change the default treatment of Aristotle. I think I need to find a way to make this user-configurable, probably with a command-line switch that forces texts to be treated as prose or verse. I wonder if there are other classical prose texts that are routinely cited by line number?

Until I get around to doing this, if you want to treat Aristotle like verse and keep the line-breaks, you can force it by adding this line to xml-export.pl, after line 2436:

        'tlg:0086' => 1,
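For readers wondering how a per-author entry like this might combine with the proposed command-line switch, here is a rough Python sketch. It is not how xml-export.pl is structured (that script is Perl, and its internals are not shown in this thread). The table name `KEEP_LINE_BREAKS`, the function `keeps_line_breaks`, and the `--treat-as` flag are all hypothetical; Diogenes does not necessarily offer any such option.

```python
import argparse

# Hypothetical per-author table: True means "keep the printed line breaks".
# The key mirrors the identifier quoted in the workaround above.
KEEP_LINE_BREAKS = {
    "tlg:0086": True,
}


def keeps_line_breaks(text_id, is_verse, force=None):
    """Decide whether a text keeps its line breaks and hyphenation.

    force is None (use the defaults), "verse", or "prose" (user override).
    """
    if force == "verse":
        return True
    if force == "prose":
        return False
    # Default: verse keeps its lines; prose does so only if listed in the table.
    return is_verse or KEEP_LINE_BREAKS.get(text_id, False)


def parse_args(argv):
    """Parse the hypothetical --treat-as switch (not a real Diogenes flag)."""
    parser = argparse.ArgumentParser(description="Illustrative sketch only")
    parser.add_argument("--treat-as", choices=["verse", "prose"], default=None)
    return parser.parse_args(argv)


args = parse_args(["--treat-as", "prose"])
print(keeps_line_breaks("tlg:0086", is_verse=False))                        # table entry applies
print(keeps_line_breaks("tlg:0086", is_verse=False, force=args.treat_as))   # user override wins
```

The design choice in this sketch is that an explicit user override always beats the built-in table, so the default treatment can stay as it is for most users.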