openforcefield / openff-evaluator

A physical property evaluation toolkit from the Open Forcefield Consortium.
https://docs.openforcefield.org/projects/evaluator
MIT License
55 stars 18 forks

ThermoML interface likely broken #396

Open mattwthompson opened 3 years ago

mattwthompson commented 3 years ago

Just to note this is masking a larger issue whereby the way ThermoML serves files has now changed and probably needs some deeper inspection / fixes.

Originally posted by @SimonBoothroyd in https://github.com/openforcefield/openff-evaluator/issues/394#issuecomment-959815971

jchodera commented 3 years ago

Oh noooo.

We've had this happen before, and I thought we had impressed upon NIST TRC the importance of engaging users through a gradual community process about major changes.

@mrshirts Can you put us in touch to sort out what can be done here?

mrshirts commented 3 years ago

I'd be happy to - @mattwthompson or @SimonBoothroyd could you write up a sentence or two with the exact details for me to send to them so we can figure this out? In the meantime, I think we're mostly using a local copy, correct?

mattwthompson commented 3 years ago

Sorry, I only have a surface-level knowledge here. Simon (or John, or somebody else who has used it before) would be better suited to provide direction.

ocmadin commented 3 years ago

@mrshirts The issue seems to be that the entry point where we access/download the ThermoML tarballs has been removed and replaced. It used to be an individual .tgz for each of the journals, for example https://trc.nist.gov/ThermoML/JCED.tgz. Now there is a single tarball at a different URL (https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz, landing page: https://data.nist.gov/od/id/mds2-2422). I'm not sure if any of the data has changed (my assumption would be no), but looking at the landing page it looks like they added .json files, so it's possible there were other changes.

Here's an example traceback of how it's failing:

Traceback (most recent call last):
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/curate_boron_phosphorus_silicon_data.py", line 614, in <module>
    main()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/curate_boron_phosphorus_silicon_data.py", line 609, in main
    initial_data = prepare_initial_data()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/curate_boron_phosphorus_silicon_data.py", line 57, in prepare_initial_data
    initial_data = CurationWorkflow.apply(
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/workflow.py", line 112, in apply
    data_frame = component_class.apply(
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/components.py", line 90, in apply
    modified_data_frame = cls._apply(data_frame, schema, n_processes)
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 124, in _apply
    cls._download_data(schema)
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 71, in _download_data
    request.raise_for_status()
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://trc.nist.gov/ThermoML/JCED.tgz

So essentially I think we'd need to change the place where we're getting the tarballs from, but it will also probably break some data collation tools that expect a series of tarballs rather than just one.
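
For reference, a minimal sketch of reading a single archive instead of one tarball per journal. It assumes the new tarball simply contains ThermoML .xml files alongside the added .json files; the internal layout has not been checked here, and `iter_thermoml_xml` is a hypothetical helper, not part of evaluator.

```python
import io
import tarfile
from typing import BinaryIO, Iterator, Tuple


def iter_thermoml_xml(fileobj: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, raw bytes) for each .xml file in a gzipped tarball.

    Hypothetical helper: the assumption that entries of interest end in
    ".xml" is ours and is not confirmed against the NIST archive.
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            # Skip directories and non-XML entries such as .json files.
            if not member.isfile() or not member.name.endswith(".xml"):
                continue
            extracted = archive.extractfile(member)
            if extracted is not None:
                yield member.name, extracted.read()
```

Any collation code that currently loops over per-journal tarballs would instead loop over the members yielded here and group them however it needs to.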

ocmadin commented 3 years ago

@mattwthompson Let me know how I can help out with fixing this (I am probably the main user of this tool currently)

mattwthompson commented 2 years ago

Is this still broken? I forget if this has been resolved on another platform

ocmadin commented 2 years ago

https://github.com/openforcefield/openff-evaluator/pull/402

Looks like it has been resolved.

ocmadin commented 2 years ago

Unfortunately this is broken again, this time on NIST's end. It looks like there's an issue with their tarball. I get this message trying to download with evaluator:

Traceback (most recent call last):
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/vapor_pressure_search.py", line 93, in <module>
    main()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/vapor_pressure_search.py", line 88, in main
    initial_data = prepare_initial_data()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/vapor_pressure_search.py", line 57, in prepare_initial_data
    initial_data = CurationWorkflow.apply(
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/workflow.py", line 112, in apply
    data_frame = component_class.apply(
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/components.py", line 90, in apply
    modified_data_frame = cls._apply(data_frame, schema, n_processes)
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 113, in _apply
    cls._download_data(schema)
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 60, in _download_data
    request.raise_for_status()
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error:  for url: https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz
{
  "requestURL" : "/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz",
  "method" : "GET",
  "status" : 500,
  "message" : "Unexpected Server Error"
}

Process finished with exit code 1

And if I try to download manually through their download manager I get the same thing:

Information about requested bundle/package is given below.

 Following files are not included in the bundle because of errors: 

 https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz?requestId=5c75c307-4328-46dd-baf2-068675b89c47 There is an Error accessing this file, Server returned status with response code  500 and message:There is an error accessing this file/URL from server.

@mrshirts can you contact someone at NIST to figure out why this is happening?

mrshirts commented 2 years ago

So, this link seems to work for me now using a manual download - can you check if that works for you, and if it might be transient?

https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz

ocmadin commented 2 years ago

I'm still unable to download manually, on either RHEL or Ubuntu.

mrshirts commented 2 years ago

Are other NIST downloads down, or just this one?

mrshirts commented 2 years ago

It was working manually for me for a couple min, but now is not.

ocmadin commented 2 years ago

I tried to download something else from the NIST website and it also failed. Maybe their servers are just struggling today?

mrshirts commented 2 years ago

Yeah, sounds like an overall NIST problem.

ocmadin commented 2 years ago

It would be good to have a "load from local tarball" option in openff.evaluator.datasets.curation.components.thermoml.ImportThermoMLData in case this happens again in the future.
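
A rough sketch of what such an option might do, written as a standalone function rather than against the real component. `resolve_tarball`, `cache_dir` and `local_tarball` are hypothetical names, not existing evaluator API; the download URL is the one reported earlier in this thread.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path
from typing import Optional

THERMOML_URL = "https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz"


def resolve_tarball(cache_dir: Path, local_tarball: Optional[Path] = None) -> Path:
    """Return a path to the ThermoML archive, preferring a user-supplied copy.

    Hypothetical sketch: the parameter names are assumptions, not evaluator API.
    """
    if local_tarball is not None:
        # A missing file raises FileNotFoundError here; a file that exists
        # but is not a tar archive is rejected explicitly.
        if not tarfile.is_tarfile(local_tarball):
            raise ValueError(f"{local_tarball} is not a readable tar archive")
        return local_tarball

    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / "ThermoML.tgz"
    if not cached.exists():
        # Download to a temporary name first so that a failed transfer
        # does not leave a truncated archive in the cache.
        partial = cached.with_suffix(".part")
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(THERMOML_URL, timeout=60) as response:
            with open(partial, "wb") as file:
                shutil.copyfileobj(response, file)
        partial.replace(cached)
    return cached
```

With something like this, a NIST outage would only affect users who have neither a local copy nor a cached download.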

mrshirts commented 2 years ago

It would be good to have a "load from local tarball" option

Good idea, file an issue?

mrshirts commented 2 years ago

Email from Damien Riccardi at NIST:

"I added a few links to the web app and data.nist.gov page this morning, and, before reaching out here, I reviewed your issue linked below. It appeared as though Thermoml issues on the Open FF end were resolved until the data.nist.gov download link to the .tgz file broke (as of yesterday). An email has been sent to admins of data.nist.gov and I hope it will be fixed soon. In clicking through the related openff thermoml issues I noticed the annoyance with historical movement in the data resource. The https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz file should now (technical difficulties with data.nist.gov servers aside) never change or be deleted."

Also, ThermoML had a software note in JCC: https://onlinelibrary.wiley.com/share/author/WKPMRWMYRCFW79RXEQPW?target=10.1002/jcc.26842

mattwthompson commented 2 years ago

@ocmadin posted this in Slack; I don't have time to look at it now but this might be a path forward:

https://onlinelibrary.wiley.com/doi/epdf/10.1002/jcc.26842 [...] TL;DR, don't think we need to change anything, but they are now offering a RESTful API to access the data which may be useful in the future.

GregorySchwing commented 11 months ago

Working URL: https://nist-oar-cache.s3.amazonaws.com/prd/gen0/mds2-2422/ThermoML.v2020-09-30.tgz
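
If it helps, here is a small sketch of trying the two URLs mentioned in this thread in order, so a failure on one falls through to the other. `fetch_first_available` and `fetch` are hypothetical names, and nothing here confirms that the S3 URL is a stable or official mirror; that is an assumption.

```python
import urllib.request
from typing import Callable, Sequence

# Both URLs are taken from this thread; their long-term stability is assumed.
CANDIDATE_URLS = (
    "https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz",
    "https://nist-oar-cache.s3.amazonaws.com/prd/gen0/mds2-2422/ThermoML.v2020-09-30.tgz",
)


def fetch(url: str) -> bytes:
    """Read the whole response into memory (acceptable for a sketch only)."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def fetch_first_available(
    urls: Sequence[str] = CANDIDATE_URLS,
    fetcher: Callable[[str], bytes] = fetch,
) -> bytes:
    """Return the payload from the first URL that responds without error."""
    errors = []
    for url in urls:
        try:
            return fetcher(url)
        except OSError as error:
            # URLError, HTTPError and socket timeouts all derive from OSError.
            errors.append(f"{url}: {error}")
    raise RuntimeError("No ThermoML source responded:\n" + "\n".join(errors))
```

Passing the fetcher in as an argument keeps the fallback logic testable without touching the network.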