neuml / paperetl

📄 ⚙️ ETL processes for medical and scientific papers
Apache License 2.0

Filter duplicate ids #14

Closed davidmezzetti closed 4 years ago

davidmezzetti commented 4 years ago

Currently, file processing does not detect duplicate ids within a single run. Add this capability, similar to the duplicate detection already in the CORD-19 process.
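A minimal sketch of what per-run duplicate filtering could look like, tracking the ids already seen in a set. This is only an illustration, not paperetl's actual code: the function name `filter_duplicates` and the dict-shaped articles with an `"id"` key are assumptions made for the example.

```python
def filter_duplicates(articles, key=lambda article: article["id"]):
    """
    Yields each article whose id has not been seen earlier in this run.

    Hypothetical helper: `articles` is any iterable of records and `key`
    extracts the id from a record. Neither reflects paperetl's real API.
    """

    # Ids encountered so far in the current run
    seen = set()

    for article in articles:
        uid = key(article)

        # Skip records whose id was already processed
        if uid in seen:
            continue

        seen.add(uid)
        yield article


# Example input: the second "a1" record is a duplicate and is dropped
articles = [
    {"id": "a1", "title": "First paper"},
    {"id": "b2", "title": "Second paper"},
    {"id": "a1", "title": "First paper (duplicate)"},
]

unique = list(filter_duplicates(articles))
```

The first occurrence of each id is kept and later ones are skipped, so the result does not depend on anything outside the current run.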