marianna13/doc2dataset
A tool to extract text (and images) from documents such as PDFs
MIT License, 2 stars, 1 fork
Issues
#8 Jsonl.gz output (by marianna13; closed 5 months ago; 0 comments)
#7 XHTML (code refactor) (by marianna13; closed 6 months ago; 0 comments)
#6 Refactor code according to mypy and lint (by marianna13; closed 6 months ago; 1 comment)
#5 Support extraction from PDF URLs (by marianna13; closed 7 months ago; 0 comments)
#4 Test for more scales (by marianna13; closed 6 months ago; 1 comment)
#3 Extract SVG images (by marianna13; opened 8 months ago, still open; 1 comment)
#2 PySpark support (by marianna13; closed 7 months ago; 0 comments)
#1 W&B support (by marianna13; closed 6 months ago; 1 comment)