scambier / obsidian-text-extractor

A (companion) plugin to facilitate the extraction of text from images (OCR) and PDFs.
GNU General Public License v3.0

[Feature request] Fix carriage returns #40

Open CaptainKludge opened 1 year ago

CaptainKludge commented 1 year ago

Tesseract does not output carriage returns. Looking at an output file in a hex editor makes the reason clear: Tesseract detects line breaks perfectly fine, but it inserts only the line feed character (0x0A), not the carriage return + line feed pair (0x0D 0x0A) that a Windows text file expects.
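To check this without a hex editor, you can count the two kinds of line endings in a string. The sketch below is illustrative only. `describeLineEndings` is a hypothetical helper, not part of Tesseract or obsidian-text-extractor, and the sample string stands in for real OCR output.

```typescript
// Hypothetical helper: counts line endings the way you would spot them in a hex editor.
interface LineEndingCounts {
  crlf: number;   // "\r\n" pairs (0x0D 0x0A), the Windows convention
  bareLf: number; // "\n" (0x0A) without a preceding "\r", the Unix convention
}

function describeLineEndings(text: string): LineEndingCounts {
  let crlf = 0;
  let bareLf = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 0x0a) {
      // Look one character back to see whether this LF is part of a CRLF pair.
      if (i > 0 && text.charCodeAt(i - 1) === 0x0d) {
        crlf++;
      } else {
        bareLf++;
      }
    }
  }
  return { crlf, bareLf };
}

// Assumed sample shaped like the output described in this issue (LF only).
const sample = "First line\nSecond line\n";
console.log(describeLineEndings(sample)); // { crlf: 0, bareLf: 2 }
```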

A better behavior would be to find each line feed (0x0A) in the output string and replace it with 0x0D 0x0A. This would definitely increase usability.
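The replacement suggested above can be sketched in a few lines, assuming the OCR result is already available as a JavaScript string before it is written to disk. `toCrlf` is a hypothetical name and does not reflect how the plugin actually handles Tesseract output.

```typescript
// Hypothetical helper: converts LF line endings to CRLF.
function toCrlf(text: string): string {
  // Match every LF, optionally preceded by CR, so an existing CRLF pair
  // stays a single CRLF instead of becoming CR CR LF.
  return text.replace(/\r?\n/g, "\r\n");
}

const ocrOutput = "First line\nSecond line\n";
const windowsText = toCrlf(ocrOutput);
console.log(JSON.stringify(windowsText)); // "First line\r\nSecond line\r\n"
```

Matching `\r?\n` instead of a bare `\n` makes the conversion idempotent. If the output already contains some CRLF pairs, or the function runs twice, it still produces a single CRLF per line.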