Closed pbenner closed 1 year ago
My bad for not handling pickle files separately in load_train_test(). Maybe we should rename the function to load(), since it's clearly morphing into more than just a training and test set loader. @pbenner Curious to hear your opinion!
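For illustration, here is a minimal sketch of what a general load() that treats pickle files separately could look like. It is not the actual matbench_discovery implementation: the READERS table, the helper names, and the suffix-based dispatch are all assumptions made for this example, and it uses only the standard library.

```python
from __future__ import annotations

import gzip
import json
import pickle
from pathlib import Path
from typing import Any, Callable


def _read_pickle(path: Path) -> Any:
    """Pickle files are binary, so they get their own reader."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def _read_json(path: Path) -> Any:
    """Read plain or gzip-compressed JSON as text."""
    opener = gzip.open if path.name.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical suffix-to-reader table (not taken from the real package).
READERS: dict[str, Callable[[Path], Any]] = {
    ".pkl": _read_pickle,
    ".json": _read_json,
    ".json.gz": _read_json,
}


def load(path: str | Path) -> Any:
    """Pick a reader from the file name suffix and return the parsed data."""
    path = Path(path)
    for suffix, reader in READERS.items():
        if path.name.endswith(suffix):
            return reader(path)
    raise ValueError(f"No reader registered for {path.name!r}")
```

Keeping the readers in a table means a new file type only needs a new entry, which fits a function that is growing beyond train and test sets.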
Yes, sounds good! Also, fetch_process_wbm_dataset.py could be fully integrated and called when first running load().
This is the error I get using the new branch:
Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 322, in <module>
    assert sum(no_id_mask := df_summary.index.isna()) == 6, f"{sum(no_id_mask)=}"
AssertionError: sum(no_id_mask)=0
Are you using pandas v1.x.x? I just changed the code from v1 to v2 compatibility. I'll add a lower-bound pin on pandas in pyproject.toml to avoid this in the future.
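As a complement to the pin in pyproject.toml, a script could also fail early with a clear message when an old pandas is installed, rather than tripping an unrelated assertion later. The sketch below is only an illustration: the function names are hypothetical, and the minimum major version of 2 is an assumption based on the v1 to v2 change mentioned above.

```python
from __future__ import annotations

import re


def major_version(version: str) -> int:
    """Extract the leading major number from a version string like '1.5.3'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"Cannot parse version {version!r}")
    return int(match.group(1))


def check_min_major(package: str, installed: str, min_major: int) -> None:
    """Raise a descriptive error if the installed major version is too old."""
    if major_version(installed) < min_major:
        raise RuntimeError(
            f"{package} {installed} is installed, but major version "
            f">= {min_major} is required"
        )


# Possible usage at the top of a script (needs pandas to be installed):
#   from importlib.metadata import version
#   check_min_major("pandas", version("pandas"), 2)
```

With pandas 1.5 installed, this would stop with a RuntimeError naming the version instead of an AssertionError about no_id_mask.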
Indeed, I had pandas 1.5; I'm now checking with pandas 2.0. Meanwhile, I think 2023-02-07-mp-elemental-reference-entries.json.gz was modified:
python data/wbm/fetch_process_wbm_dataset.py
Loading 'wbm_summary' from cached file at '/home/pbenner/.cache/matbench-discovery/1.0.0/wbm/2022-10-19-wbm-summary.csv'
Warning: '/home/pbenner/.cache/matbench-discovery/1.0.0/mp/2023-02-07-mp-elemental-reference-entries.json.gz' associated with key='mp_elemental_ref_entries' does not exist. Would you like to download it now using matbench_discovery.data.load_train_test('mp_elemental_ref_entries'). This will cache the file for future use. [y/n] y
Downloading 'mp_elemental_ref_entries' from https://figshare.com/ndownloader/files/40344445
variable dump:
file='mp/2023-02-07-mp-elemental-reference-entries.json.gz',
url='https://figshare.com/ndownloader/files/40344445',
reader=<function read_json at 0x7f9898a875b0>,
kwargs={'compression': 'gzip'}
Traceback (most recent call last):
File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 24, in
Yeah, I was in the process of updating the Figshare files but then got carried away. That error will be fixed before I merge #26.
fetch_process_wbm_dataset.py now hangs here:
However, manual download seems to work.
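If the hang is in the network request itself (an assumption; the thread does not show where the script stalls), passing a timeout makes a stalled connection raise an error instead of blocking forever. The sketch below is not the project's download code: download_to_cache and its parameters are hypothetical names, and it uses only urllib from the standard library.

```python
from __future__ import annotations

import shutil
import urllib.request
from pathlib import Path


def download_to_cache(url: str, dest: str | Path, timeout: float = 30.0) -> Path:
    """Download url to dest unless dest already exists, then return dest.

    The timeout applies to each blocking socket operation (connecting and
    each read), not to the total download time.
    """
    dest = Path(dest)
    if dest.is_file():
        return dest  # already cached, skip the network entirely

    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        # Rename only after a complete download, so an interrupted run
        # never leaves a truncated file in the cache.
        tmp.replace(dest)
    finally:
        tmp.unlink(missing_ok=True)  # clean up the partial file on failure
    return dest
```

Writing to a temporary .part file first also means a manually downloaded file can be dropped into the cache path and will be picked up on the next run without any network access.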