neuml / paperetl

📄 ⚙️ ETL processes for medical and scientific papers
Apache License 2.0

Add component to build entry-dates.csv #18

Closed davidmezzetti closed 3 years ago

davidmezzetti commented 3 years ago

Currently, for the CORD-19 dataset, entry-dates.csv must be downloaded manually using the following instructions:

# Download entry-dates.csv and place in <download path>
# https://www.kaggle.com/davidmezzetti/cord-19-article-entry-dates/output

entry-dates.csv should be buildable outside of Kaggle, to allow automation and Docker builds. The Kaggle entry-dates component should be updated to call this new component.
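As a rough illustration of what a standalone component might look like, here is a minimal sketch. It assumes, without confirmation from this issue, that an article's entry date is the date of the earliest dataset release in which it appears. The function names (`build_entry_dates`, `write_entry_dates`), the `id,date` column layout, and the in-memory release list are all hypothetical and are not paperetl's actual API or file format.

```python
import csv
import io


def build_entry_dates(releases):
    """Map each article id to the earliest release date it appears in.

    releases: iterable of (date, ids) pairs, where date is an ISO
    "YYYY-MM-DD" string and ids is an iterable of article ids.
    This input shape is an assumption made for illustration.
    """
    entries = {}

    # ISO dates sort chronologically as plain strings
    for date, uids in sorted(releases, key=lambda release: release[0]):
        for uid in uids:
            # Keep only the first (earliest) date seen for each id
            entries.setdefault(uid, date)

    return entries


def write_entry_dates(entries, output):
    """Write the id -> date mapping as CSV (hypothetical column names)."""
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(["id", "date"])
    for uid, date in sorted(entries.items()):
        writer.writerow([uid, date])


# Usage with made-up release data, written to an in-memory buffer
releases = [
    ("2020-04-03", ["a", "b", "c"]),
    ("2020-03-27", ["a", "b"]),
]
entries = build_entry_dates(releases)
buffer = io.StringIO()
write_entry_dates(entries, buffer)
```

Because the logic works on plain Python data, it could run in a Docker build or be called from the Kaggle component. How the release listings are actually obtained is left open here.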