janosh / matbench-discovery

An evaluation framework for machine learning models simulating high-throughput materials discovery.
https://matbench-discovery.materialsproject.org
MIT License

Fetching data fails #12

Status: Closed (closed by pbenner 1 year ago)

pbenner commented 1 year ago

Fetching the data currently fails:

Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 184, in <module>
    urllib.request.urlretrieve(f"{mat_cloud_url}&{filename=}", file_path)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: NOT FOUND

https://github.com/janosh/matbench-discovery/blob/69dc75624e2c00e47f9b60f964c18d103adc1f2c/data/wbm/fetch_process_wbm_dataset.py#L184

janosh commented 1 year ago

Ah, just some extra quotes: in an f-string, {filename=} expands to filename='<value>' (the repr of the value, quotes included), whereas the URL needs filename=<value> without quotes. Easy fix.
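
The quoting behaviour can be reproduced in isolation. The following is a minimal sketch, not the project's actual code: `base_url` and the value of `filename` are hypothetical placeholders rather than the real Materials Cloud URL or file names used by the script.

```python
# Hypothetical placeholders, chosen only for illustration.
base_url = "https://example.org/download?record_id=0"
filename = "example-file.txt"

# The "=" specifier (Python 3.8+) prints the expression text followed by
# repr() of the value, so a string value ends up wrapped in quotes.
broken_url = f"{base_url}&{filename=}"

# Writing the key explicitly interpolates the plain string value.
fixed_url = f"{base_url}&filename={filename}"

# Alternatively, an explicit !s conversion keeps the "=" shorthand but
# uses str() instead of repr().
also_fixed_url = f"{base_url}&{filename=!s}"

print(broken_url)
print(fixed_url)
print(also_fixed_url)
```

The first URL carries the literal quote characters into the query string, which is consistent with the server answering 404 for a file name it does not recognize.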

janosh commented 1 year ago

@pbenner Could you do a source install from main and let me know if it's working now? If so, I'll cut a new PyPI release.

pbenner commented 1 year ago

This particular problem seems to be solved. However, there are further issues:

n_too_stable = 502
n_too_unstable = 22
Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 473, in <module>
    save_fig(fig, f"{img_path}.svelte")
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/pymatviz/utils.py", line 308, in save_fig
    fig.write_html(path, **defaults)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/plotly/basedatatypes.py", line 3708, in write_html
    return pio.write_html(self, *args, **kwargs)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/plotly/io/_html.py", line 536, in write_html
    path.write_text(html_str)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/pathlib.py", line 1154, in write_text
    with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: '/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/site/src/figs/hist-wbm-e-form-per-atom.svelte'
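
The traceback above ends in `pathlib`'s `write_text`, which does not create missing parent directories. One possible way to guard against this is sketched below, assuming the target directory is simply absent; the helper name `write_text_creating_parents` and the example path are hypothetical and not part of pymatviz or matbench-discovery.

```python
import tempfile
from pathlib import Path


def write_text_creating_parents(path, text):
    """Write text to path, creating any missing parent directories first."""
    path = Path(path)
    # parents=True creates intermediate directories; exist_ok=True makes
    # the call a no-op when the directory is already there.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


# Demonstration in a throwaway directory (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "site" / "src" / "figs" / "example.svelte"
    write_text_creating_parents(target, "<div>placeholder</div>")
    created = target.is_file()

print(created)
```

Note that this only avoids the crash. In the traceback the path resolves inside `site-packages`, so whether figures should be written there at all is a separate question.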

After manually creating the [...]/src/figs directory, the next issue is the following:

Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 538, in <module>
    with gzip.open(DATA_FILES.mp_patched_phase_diagram, "rb") as zip_file:
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 58, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/data/mp/2023-02-07-ppd-mp.pkl.gz'

The [...]/data/mp directory exists, but the pkl file is missing.
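
A missing data file like this surfaces as a bare `FileNotFoundError` from deep inside `gzip`. As a rough sketch of how the failure could be made more explicit, the helper below checks for the file before opening it. It assumes the file is a gzipped pickle (suggested by the `.pkl.gz` suffix); `load_gzipped_pickle` and the demo file names are hypothetical and not taken from the repository.

```python
import gzip
import pickle
import tempfile
from pathlib import Path


def load_gzipped_pickle(path):
    """Load a gzipped pickle, failing early with a descriptive message."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Required data file not found: {path}. "
            "It may need to be downloaded or generated first."
        )
    # Only unpickle files from a trusted source.
    with gzip.open(path, "rb") as zip_file:
        return pickle.load(zip_file)


# Demonstration with a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data" / "mp"
    data_dir.mkdir(parents=True)

    present = data_dir / "present.pkl.gz"
    with gzip.open(present, "wb") as zip_file:
        pickle.dump({"answer": 42}, zip_file)
    loaded = load_gzipped_pickle(present)

    try:
        load_gzipped_pickle(data_dir / "missing.pkl.gz")
        missing_message = None
    except FileNotFoundError as exc:
        missing_message = str(exc)

print(loaded)
print(missing_message)
```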

janosh commented 1 year ago

Thanks. Let's track that in a new issue. I created #14 from your comment. I need to take a closer look at how best to manage all the data files the MBD analysis relies on in pip installations.