Closed DavidRivasPhD closed 4 years ago
Hi David,
Unfortunately, the NLTK models that come preinstalled vary from one installation to another. Thank you for sharing that you needed to install an additional model.
Regarding study design, not every document will have a detected study design; those without one get a value of 0. Likewise, only certain documents will be tagged as COVID-19.
Please take a look at: https://www.kaggle.com/davidmezzetti/cord-19-etl and https://www.kaggle.com/davidmezzetti/cord-19-analysis-with-sentence-embeddings
These notebooks on Kaggle have all the background on how this process works.
Thank you, David. After examining the results more carefully, and unlike the above screenshots,
I found plenty of non-NULL values (screenshots below), so the output is correct and there is no such bug.
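For anyone who wants to run the same check without browsing the database by hand, here is a minimal sketch that counts non-NULL values in one column using Python's standard `sqlite3` module. The helper name `count_non_null` is hypothetical, and the table and column names in the usage note (`articles`, `Design`, `Tags`) are assumptions about the schema, so adjust them to whatever your articles.sqlite actually contains.

```python
import sqlite3


def count_non_null(db_path, table, column):
    """Return (non_null_rows, total_rows) for one column of a SQLite table."""
    # Identifiers cannot be bound as SQL parameters, so only plain names are accepted.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsupported identifier: {name!r}")
    # COUNT(column) skips NULLs, while COUNT(*) counts every row.
    query = f"SELECT COUNT({column}), COUNT(*) FROM {table}"
    connection = sqlite3.connect(db_path)
    try:
        non_null, total = connection.execute(query).fetchone()
    finally:
        connection.close()
    return non_null, total
```

Usage would look like `count_non_null("articles.sqlite", "articles", "Design")`, again assuming those names. A result whose first number is well above zero means the column is populated for at least some documents, even if the first rows shown in a viewer are NULL.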
Yes, I have read your writings on GitHub, Kaggle, and tds, as well as Ankur Mohan’s blogs about your work. Your models are very nice, and I have learned a lot from them.
Hi David, in addition to following your paperetl installation instructions, I had to take these steps to get rid of the following error and warning:
This created the directory ~/nltk_data/tokenizers/punkt and fixed the above error.
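For anyone hitting the same error: that directory is where NLTK's downloader places the punkt tokenizer for the current user, typically after running `nltk.download('punkt')`. That command is an assumption here, since the exact step is not shown in this thread. Below is a minimal sketch, standard library only, that checks whether the directory exists; the helper name `punkt_installed` is hypothetical.

```python
from pathlib import Path


def punkt_installed(nltk_data_dir=None):
    """Return True if <nltk_data_dir>/tokenizers/punkt exists as a directory."""
    # Default to the per-user location mentioned in this thread (~/nltk_data).
    if nltk_data_dir is None:
        base = Path.home() / "nltk_data"
    else:
        base = Path(nltk_data_dir)
    return (base / "tokenizers" / "punkt").is_dir()


if __name__ == "__main__":
    if punkt_installed():
        print("punkt tokenizer found")
    else:
        # Assumed fix (needs the nltk package and network access):
        #   python -c "import nltk; nltk.download('punkt')"
        print("punkt tokenizer not found under ~/nltk_data")
```

Note that NLTK also searches other data directories, so a negative result here only says the tokenizer is not in the per-user location.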
Also, the UserWarning below was eliminated as follows:
Then, unfortunately, after a full run the resulting articles.sqlite database came out with the Study Design, Tags, and Labels fields all NULL (see screenshots below).
Any ideas on how to solve this NULL issue would be appreciated.