marianna13 / doc2dataset

A tool to extract text (and images) from documents (like PDFs)
MIT License

Test for more scales #4

Closed: marianna13 closed this issue 6 months ago

marianna13 commented 8 months ago

Some tests: https://huggingface.co/datasets/marianna13/PDF_extraction_sample