pd3f / pd3f-core

📑 Python Package to reconstruct the original continuous text from PDFs with language models
https://pd3f.github.io/pd3f-core/index.html
GNU Affero General Public License v3.0

page separator #42

Open ajtkulov opened 8 months ago

ajtkulov commented 8 months ago

Could you add a page separator symbol?

For instance, the pdftotext tool inserts a form feed character (0x0C, i.e. `12.toChar`) between pages. This is helpful for further analysis, because the output can be split back into pages.
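
To illustrate the kind of downstream analysis I mean, here is a minimal sketch. It assumes the extracted text uses the form feed character as a page delimiter, the way pdftotext output does. `split_pages` is a hypothetical helper of my own and not part of pd3f-core.

```python
from typing import List

FORM_FEED = "\x0c"  # ASCII 12 (0x0C), the form feed character


def split_pages(text: str) -> List[str]:
    """Split extracted text into pages on the form feed character."""
    pages = text.split(FORM_FEED)
    # If the text ends with a form feed, splitting leaves a trailing
    # empty string; drop it so it is not counted as an extra page.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


# Made-up sample text standing in for real extraction output.
sample = "First page text.\x0cSecond page text.\x0c"
for number, page in enumerate(split_pages(sample), start=1):
    print(number, page)
```

With a separator like this in the output, per-page processing (page counts, mapping a passage back to its page) would need no extra metadata.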