neuml / paperetl

📄 ⚙️ ETL processes for medical and scientific papers
Apache License 2.0

Remove legacy merge logic #36

Closed davidmezzetti closed 2 years ago

davidmezzetti commented 2 years ago

#34 removed all study design and attribute detection from paperetl in favor of paperai. paperetl is now significantly faster without spaCy pipelines slowing it down. With that, the legacy merge process, which was designed to work around performance concerns, can be removed. It can be replaced with simple duplicate detection that replaces a record based on entry date.
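For illustration only, here is a rough sketch of what duplicate detection with replacement on entry date could look like. This is not paperetl's actual code: the `articles` table, its columns, and the `connect` and `save_article` helpers are all hypothetical, and it assumes "replace on entry date" means a newer entry date wins, with dates stored as ISO `YYYY-MM-DD` strings.

```python
import sqlite3


def connect():
    """Create an in-memory database with a hypothetical articles table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE articles (id TEXT PRIMARY KEY, title TEXT, entry TEXT)"
    )
    return db


def save_article(db, uid, title, entry):
    """Insert an article; replace an existing row only if entry is newer.

    Returns True if the row was written, False if it was skipped as a
    duplicate with the same or a newer entry date.
    """
    row = db.execute(
        "SELECT entry FROM articles WHERE id = ?", (uid,)
    ).fetchone()

    # ISO-formatted date strings sort chronologically, so a plain string
    # comparison is enough here (assumption: entry is always YYYY-MM-DD).
    if row is not None and row[0] >= entry:
        return False

    db.execute(
        "INSERT OR REPLACE INTO articles (id, title, entry) VALUES (?, ?, ?)",
        (uid, title, entry),
    )
    return True


if __name__ == "__main__":
    db = connect()
    save_article(db, "a1", "First version", "2021-01-01")
    save_article(db, "a1", "Stale copy", "2020-06-01")      # skipped
    save_article(db, "a1", "Updated version", "2021-03-01")  # replaces
    print(db.execute("SELECT * FROM articles").fetchall())
```

The point of the sketch is that a single lookup by id plus a date comparison covers the duplicate case, so no separate merge pass is needed.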