neuml / paperetl

📄 ⚙️ ETL processes for medical and scientific papers
Apache License 2.0

What am I missing? KeyError: '47235b96c07e8066195b6521882340408b9bdd34' #4

Closed: SeekPoint closed this issue 4 years ago

SeekPoint commented 4 years ago

    ghSrc/paperetl % python -m paperetl.cord19 2020-08-12
    .........
    /usr/local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator TfidfVectorizer from version 0.23.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
      warnings.warn(
    /usr/local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.23.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
      warnings.warn(
    /usr/local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.23.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
      warnings.warn(
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/usr/local/lib/python3.8/site-packages/paperetl/cord19/main.py", line 11, in <module>
        Execute.run(sys.argv[1],
      File "/usr/local/lib/python3.8/site-packages/paperetl/cord19/execute.py", line 281, in run
        article.metadata = article.metadata + (dates[sha],)
    KeyError: '47235b96c07e8066195b6521882340408b9bdd34'
    ghSrc/paperetl %
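The last frame of the traceback fails on `dates[sha]`, a plain dictionary lookup: if the SHA1 of a paper in the current CORD-19 release has no entry in the loaded dates, Python raises exactly this `KeyError`. The sketch below reproduces that pattern in isolation. It is not paperetl's code; `append_date` and `append_date_checked` are hypothetical names, and the checked variant only illustrates how a guarded lookup could point at the likely cause (a stale entry-dates.csv).

```python
# Hypothetical sketch of the failing pattern; NOT paperetl's actual implementation.
from typing import Dict, Optional, Tuple


def append_date(metadata: Tuple, sha: str, dates: Dict[str, str]) -> Tuple:
    """Mirror the failing line: metadata + (dates[sha],).

    Raises KeyError when sha is absent from dates, as in the traceback.
    """
    return metadata + (dates[sha],)


def append_date_checked(metadata: Tuple, sha: str, dates: Dict[str, str]) -> Tuple:
    """Same operation, but the KeyError message names the likely cause."""
    date: Optional[str] = dates.get(sha)
    if date is None:
        # Still a KeyError, so existing `except KeyError` handlers keep working.
        raise KeyError(
            f"{sha} not found in entry dates; "
            "entry-dates.csv may predate this CORD-19 release"
        )
    return metadata + (date,)


if __name__ == "__main__":
    dates = {"aaaa": "2020-08-12"}
    print(append_date(("title",), "aaaa", dates))
    try:
        append_date_checked(("title",), "47235b96c07e8066195b6521882340408b9bdd34", dates)
    except KeyError as err:
        print("KeyError:", err)
```

The scikit-learn `UserWarning`s printed before the traceback come from models pickled under 0.23.1 being loaded under 0.23.2; they are warnings only and are not what stops the run.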

My directory:

    paperetl/2020-08-12 % ll
    total 14642544
    drwxr-xr-x  14 yuanke  staff         448  8 14 10:06 .
    drwxr-xr-x  13 yuanke  staff         416  8 14 09:39 ..
    drwxr-xr-x   3 yuanke  staff          96  8 14 09:44 __results___files
    -rw-r--r--@  1 yuanke  staff      455816  8  5 15:01 attribute
    -rw-r--r--@  1 yuanke  staff      206732  8  5 15:01 attribute.csv
    -rw-r--r--   1 yuanke  staff       24504  8 13 05:52 changelog
    -rw-r--r--   1 yuanke  staff  1375487377  8 13 05:53 cord_19_embeddings.tar.gz
    -rw-r--r--   1 yuanke  staff  3143476778  8 13 05:23 cord_19_embeddings_2020-08-12.csv
    -rw-r--r--@  1 yuanke  staff     4185255  8  5 15:01 design
    -rw-r--r--@  1 yuanke  staff       61843  8  5 15:01 design.csv
    drwxr-xr-x   4 yuanke  staff         128  8 14 10:04 document_parses
    -rw-r--r--   1 yuanke  staff  2638941522  8 13 05:53 document_parses.tar.gz
    -rw-r--r--@  1 yuanke  staff    15487674  8 14 01:44 entry-dates.csv
    -rw-r--r--   1 yuanke  staff   297398784  8 13 05:53 metadata.csv
    paperetl/2020-08-12 %

davidmezzetti commented 4 years ago

Can you download https://www.kaggle.com/davidmezzetti/cord-19-article-entry-dates/output?select=entry-dates.csv again?

I had to refresh that file to have data through 2020-08-12.
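After re-downloading entry-dates.csv, one way to check that it covers the release is to compare the SHA1s in metadata.csv against those in the dates file. This is a minimal sketch under stated assumptions, not part of paperetl: it assumes both files have a `sha` column (the `dates[sha]` lookup in the traceback suggests the dates are keyed by SHA1, but the exact column name in entry-dates.csv is a guess), and that a metadata.csv cell may hold several SHA1s separated by `;`. The script name `check_dates.py` is hypothetical.

```python
# Sketch: list metadata.csv shas that have no row in entry-dates.csv.
# ASSUMPTIONS (not confirmed by paperetl docs): both files have a "sha" column,
# and a metadata.csv cell may contain several SHA1s separated by ";".
import csv
import sys
from typing import Set


def read_shas(path: str, column: str = "sha") -> Set[str]:
    """Collect every non-empty SHA1 found in `column`, splitting cells on ';'."""
    shas: Set[str] = set()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            for sha in (row.get(column) or "").split(";"):
                sha = sha.strip()
                if sha:
                    shas.add(sha)
    return shas


def missing_dates(metadata_path: str, dates_path: str) -> Set[str]:
    """Return shas present in metadata.csv but absent from the entry-dates file."""
    return read_shas(metadata_path) - read_shas(dates_path)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_dates.py 2020-08-12/metadata.csv 2020-08-12/entry-dates.csv
    missing = missing_dates(sys.argv[1], sys.argv[2])
    print(f"{len(missing)} sha(s) without an entry date")
    for sha in sorted(missing)[:10]:
        print(sha)
```

An empty result suggests every paper with a SHA1 can be matched to an entry date; a non-empty one suggests the dates file predates the release, as appears to have been the case here.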