urieli / jochre

Java Optical CHaracter Recognition
GNU Affero General Public License v3.0

Need option to not break lines #119

Open markhdavid opened 5 months ago

markhdavid commented 5 months ago

Jochre has a feature that effectively "un-word-wraps" text: it removes the line breaks that exist only because of the line width in the source image and keeps the paragraph boundaries. This helps ensure that the resulting text, when word-wrapped by a typical modern word processor, appears properly formatted, with paragraph boundaries where they should be and line breaks determined by the line width of the new document.

While this is an impressive and welcome feature in many cases, it can be undesirable in certain cases:

(1) It makes comparison with the original more difficult. When OCR output is opened in a standard word processor, the original width-based line breaks are gone, so you cannot easily compare lines in the source image with lines in the output by eye.

(2) In poems, each line should normally be preserved.

Can there be an option to preserve all line breaks? This is a feature request.
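
To illustrate the requested behavior, here is a minimal sketch. It is not Jochre's code or API: the class `LineBreakSketch`, the method `render`, and the `preserveLineBreaks` flag are all hypothetical names, and the input is assumed to be already recognized lines grouped into paragraphs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Jochre.
class LineBreakSketch {

    /**
     * Renders paragraphs of recognized lines as plain text.
     *
     * @param paragraphs         each inner list holds the lines of one paragraph,
     *                           in the order they appear in the source image
     * @param preserveLineBreaks if true, keep one output line per source line;
     *                           if false, join the lines of a paragraph with spaces
     */
    static String render(List<List<String>> paragraphs, boolean preserveLineBreaks) {
        String lineSeparator = preserveLineBreaks ? "\n" : " ";
        List<String> rendered = new ArrayList<>();
        for (List<String> lines : paragraphs) {
            rendered.add(String.join(lineSeparator, lines));
        }
        // Paragraphs are separated by a blank line in both modes.
        return String.join("\n\n", rendered);
    }

    public static void main(String[] args) {
        List<List<String>> poem = List.of(
                List.of("Roses are red,", "violets are blue."));
        System.out.println(render(poem, false)); // unwrapped paragraph
        System.out.println(render(poem, true));  // source line breaks kept
    }
}
```

With the flag off, the two lines of the poem come out as one flowing paragraph; with it on, each source line stays on its own line, which covers both the comparison case and the poetry case above.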

markhdavid commented 5 months ago

This may be a duplicate of https://github.com/urieli/jochre/issues/100 - take your pick.