kitodo / kitodo-presentation

Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
https://kitodo.org
GNU General Public License v3.0

Fix slub/dfgviewer#147 in ALTO parser #455

Open sebastian-meyer opened 4 years ago

sebastian-meyer commented 4 years ago

We have fixed https://github.com/slub/dfg-viewer/issues/147 in a rather quick-and-dirty way. A better solution would be to fix the issue directly in the ALTO parser of Kitodo.Presentation.

bertsky commented 3 years ago

Your fix now also makes it possible to render text that contains (HTML-encoded) newlines but no SP elements (or not even multiple distinct TextLine elements). See here for an example. (This ALTO was produced by the page-to-alto converter with --alto-version 2.0 --dummy-textline --dummy-word in effect.)

It would be great if that workaround continued to work in the future, because full texts without true/correct text line and word segmentation are a valid use case.

But it also shows that it is important for readability that at least some newlines get rendered. In my example, the newlines are already included in the string. But Presentation should also insert them between successive TextLine elements.
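
To make the requested behaviour concrete, here is a minimal sketch in Python. It is not the Kitodo.Presentation parser and not the dfg-viewer fix; the function name `alto_to_text` and the helper `local` are hypothetical. It assumes that each SP element stands for one space, that newlines encoded as `&#10;` inside a CONTENT attribute should be kept, and that one newline should be inserted between successive TextLine elements.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return the tag name without its XML namespace (ALTO versions differ)."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(alto_xml):
    """Extract plain text from an ALTO document given as a string.

    Assumptions of this sketch (not taken from Kitodo.Presentation):
    - String/@CONTENT carries the text, SP stands for a single space;
    - newlines already present in CONTENT (e.g. written as &#10;) are kept;
    - successive TextLine elements are separated by one newline.
    """
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local(elem.tag) != "TextLine":
            continue
        parts = []
        for child in elem:
            name = local(child.tag)
            if name == "String":
                parts.append(child.get("CONTENT", ""))
            elif name == "SP":
                parts.append(" ")
        lines.append("".join(parts))
    # Newline between successive TextLines, as suggested above.
    return "\n".join(lines)


sample = (
    '<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#"><Layout><Page>'
    "<PrintSpace><TextBlock>"
    '<TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>'
    '<TextLine><String CONTENT="second&#10;third"/></TextLine>'
    "</TextBlock></PrintSpace></Page></Layout></alto>"
)
print(alto_to_text(sample))
```

The second TextLine in the sample mimics the dummy-textline case: it has a single String whose CONTENT already contains an encoded newline, and that newline survives alongside the one inserted between the two lines.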

bertsky commented 1 year ago

BTW, the ALTO download then removes the HTML-encoded newline characters – too bad!