neuml/paperetl
📄 ⚙️ ETL processes for medical and scientific papers
Apache License 2.0
PDF extraction improvements #1
Closed by davidmezzetti 4 years ago
davidmezzetti commented 4 years ago
Modify the PDF file extraction process as follows:
- Use dateutil to parse and format the published date field
- Extract content from tables
- Build the uid from the title, falling back to the DOI if no title is found
- Tag articles with PDF
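
A rough sketch of how the date, uid, and tag items above might look in Python. This is not paperetl's actual code: the function names, the output date format, the use of SHA-1 for the uid, and the dictionary layout are all assumptions made for illustration. Table extraction is not covered here.

```python
import hashlib

from dateutil import parser


def parse_date(raw, fmt="%Y-%m-%d"):
    """Parse a free-form date string and return it formatted, or None on failure.

    The output format is an assumption; the issue does not specify one.
    """
    if not raw:
        return None
    try:
        return parser.parse(raw).strftime(fmt)
    except (ValueError, OverflowError):
        # dateutil raises ValueError (or its ParserError subclass) on bad input
        return None


def build_uid(title, doi):
    """Derive a uid from the title, falling back to the DOI if the title is missing.

    Hashing with SHA-1 is an illustrative choice, not confirmed by the issue.
    """
    source = title if title else doi
    if not source:
        return None
    return hashlib.sha1(source.encode("utf-8")).hexdigest()


def build_article(title, doi, published):
    """Assemble a hypothetical article record tagged as coming from a PDF."""
    return {
        "uid": build_uid(title, doi),
        "title": title,
        "doi": doi,
        "published": parse_date(published),
        "tags": "PDF",
    }
```

Keying the uid on the title first means two copies of the same paper get the same uid even when one copy lacks a DOI; the DOI is only used when no title could be extracted.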