marianna13 / doc2dataset

A tool to extract text (and images) from documents (like PDFs)
MIT License
3 stars, 1 fork

Jsonl.gz output #8

Closed by marianna13 8 months ago