manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

automatically make an enter paragraph #662

Open cutegitcat opened 7 months ago

cutegitcat commented 7 months ago

Hello everybody, I find this software a good tool for converting images of text into plain text. I have a request: if I have, e.g., 100 (short) text image files and convert all of them to text, the results are all put together without a paragraph break between them. It should be technically feasible to insert a paragraph break automatically after the text of each image.

In more detail: converting 100 (short) text images currently produces one continuous block of text with no paragraph breaks. Instead, converting 100 (short) text images should produce text with a paragraph break after each image's text. The output would then be: text, paragraph break, text, paragraph break, text, paragraph break, text...

Thank you very much in advance, if this is feasible. :-)

manisandro commented 5 months ago

Can you elaborate? When recognizing in plain text mode, the text from each recognized image is separated with a line break. If you recognize to hOCR, the output is split into separate pages. Not sure where you get one continuous text block with the text of all recognized images?

cutegitcat commented 5 months ago

Hello manisandro!

Thanks for the message. In my experience, converting a single image file to text (with OCR) works fine: the lines come out in order. The problem only appears when I convert many files. Then a line break is missing between the text of one image file and the text of the next.

The following is a simplified explanation based on an example:

Image-File 1: The melting Arctic is a crime scene.

Image-File 2: J7 is the anonymous perpetrator leaving evidence and clues for me to discover,

Image-File 3: like breadcrumbs leading back to him. James, he had said,

Image-File 4: the day we first met at the research institute,

Image-File 5: "If you are going to make it up here, don’t lock your doors."

Image-File 6: It seemed like a life philosophy, rather than a survival tip.

This was converted without a line break and looks like this:

The melting Arctic is a crime scene. J7 is the anonymous perpetrator leaving evidence and clues for me to discover, like breadcrumbs leading back to him. James, he had said, the day we first met at the research institute, “If you are going to make it up here, don’t lock your doors.” It seemed like a life philosophy, rather than a survival tip.

Instead, it should look like this, with a line break after the text of each image file (this is what I mean by "automatically make an enter paragraph"):

The melting Arctic is a crime scene.

J7 is the anonymous perpetrator leaving evidence and clues for me to discover,

like breadcrumbs leading back to him. James, he had said,

the day we first met at the research institute,

“If you are going to make it up here, don’t lock your doors.”

It seemed like a life philosophy, rather than a survival tip.
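For anyone who needs this before it is settled in the application, the requested output can be sketched as a small post-processing step. The Python below is only an illustration and not gImageReader's code: the function name `join_recognized_texts` and its `separator` parameter are hypothetical, and it assumes you already have the recognized text of each image as a separate string (from whatever OCR step you use).

```python
from typing import Iterable


def join_recognized_texts(texts: Iterable[str], separator: str = "\n\n") -> str:
    """Join per-image OCR results so each image's text becomes its own paragraph.

    Hypothetical helper for illustration; not part of gImageReader or tesseract.
    """
    # Strip surrounding whitespace so trailing newlines from the OCR step
    # do not produce uneven gaps between paragraphs.
    cleaned = [text.strip() for text in texts]
    # Skip images that yielded no text, then join with the separator
    # (a blank line by default, i.e. a paragraph break).
    return separator.join(text for text in cleaned if text)


if __name__ == "__main__":
    # Sample strings taken from the example in this issue.
    recognized = [
        "The melting Arctic is a crime scene.\n",
        "J7 is the anonymous perpetrator leaving evidence and clues for me to discover,\n",
        "like breadcrumbs leading back to him. James, he had said,\n",
    ]
    print(join_recognized_texts(recognized))
```

Passing `separator="\n"` instead would give a single line break between the texts rather than an empty line.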