jzillmann / pdf-to-markdown

A PDF to Markdown converter
https://pdf2md.morethan.io
MIT License
1.14k stars, 184 forks

Broken paragraphs (enhancement) #58

Open nerun opened 1 year ago

nerun commented 1 year ago

The paragraphs are broken into several short lines. I know this is not really a problem, since Markdown always treats consecutive lines as a single paragraph. But in case you want to add an option that emits each paragraph as one very long line, I've written this shell script. Perhaps you could translate it to JavaScript if it looks interesting. It works well, although it's not perfect.

GitHub Gist: paragrapher

The purpose of this script is to analyze plain text files (with or without the ".txt" extension), looking for broken paragraphs, i.e., paragraphs split across more than one line, and to join each of them into a single very long line.
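
For illustration, here is a rough idea of what a JavaScript version might look like. This is only a minimal sketch and not a translation of the gist: the function name `joinBrokenParagraphs` is hypothetical, and it assumes that paragraphs are separated by one or more blank lines, which may differ from the heuristic the shell script uses.

```javascript
// Hypothetical helper: not part of pdf-to-markdown or the paragrapher gist.
// Assumption: one or more blank lines mark a paragraph boundary.
function joinBrokenParagraphs(text) {
  const paragraphs = [];
  let current = []; // lines collected for the paragraph being built

  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '') {
      // A blank line closes the current paragraph, if there is one.
      if (current.length > 0) {
        paragraphs.push(current.join(' '));
        current = [];
      }
    } else {
      current.push(trimmed);
    }
  }

  // Flush the last paragraph when the text does not end with a blank line.
  if (current.length > 0) {
    paragraphs.push(current.join(' '));
  }

  // Separate the joined paragraphs with a single blank line.
  return paragraphs.join('\n\n');
}

const sample = 'The quick\nbrown fox\n\njumps over\nthe lazy dog\n';
console.log(joinBrokenParagraphs(sample));
```

Note that this sketch joins every non-blank line in a block, so headings, list items, and code blocks that are not set off by blank lines would be merged into the surrounding paragraph. A real implementation would probably need extra checks for those cases.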