neuml / paperetl

📄 ⚙️ ETL processes for medical and scientific papers
Apache License 2.0

Remove study attribute and design models and all related dependencies #34

Closed. davidmezzetti closed this issue 2 years ago.

davidmezzetti commented 2 years ago

Currently, paperetl has a couple of statistical study design models that detect common study design fields. These models require a large NLP pipeline, backed by spaCy, that runs a series of NLP and grammar steps. While this was a good solution in mid-2020, there are now better ways to do this.

Furthermore, the NLP pipelines are slow and add significant processing overhead. Last but not least, paperetl processes both medical and technical/scientific papers, while these fields are medical-specific. This functionality is more appropriate for the paperai project, and the NLP logic should reside there.