Open mattwthompson opened 3 years ago
Oh noooo.
We've had this happen before, and I thought we had impressed upon NIST TRC the importance of engaging users through a gradual community process about major changes.
@mrshirts Can you put us in touch to sort out what can be done here?
I'd be happy to - @mattwthompson or @SimonBoothroyd could you write up a sentence or two with the exact details for me to send to them so we can figure this out? In the meantime, I think we're mostly using a local copy, correct?
Sorry, I only have a surface-level knowledge here. Simon (or John, or somebody else who has used it before) would be better suited to provide direction.
@mrshirts The issue seems to be that the entry point where we access/download the ThermoML tarballs has been removed and replaced. It used to be an individual .tgz for each of the journals (for example, https://trc.nist.gov/ThermoML/JCED.tgz). Now there is a single tarball at a different URL (https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz, landing page: https://data.nist.gov/od/id/mds2-2422). I'm not sure if any of the data has changed (my assumption would be no), but looking at the landing page it looks like they added .json files, so it's possible there were other changes.
Here's an example traceback of how it's failing:
Traceback (most recent call last):
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/curate_boron_phosphorus_silicon_data.py", line 614, in <module>
    main()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/curate_boron_phosphorus_silicon_data.py", line 609, in main
    initial_data = prepare_initial_data()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/curate_boron_phosphorus_silicon_data.py", line 57, in prepare_initial_data
    initial_data = CurationWorkflow.apply(
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/workflow.py", line 112, in apply
    data_frame = component_class.apply(
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/components.py", line 90, in apply
    modified_data_frame = cls._apply(data_frame, schema, n_processes)
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 124, in _apply
    cls._download_data(schema)
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 71, in _download_data
    request.raise_for_status()
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://trc.nist.gov/ThermoML/JCED.tgz
So essentially I think we'd need to change the place where we're getting the tarballs from, but it will also probably break some data collation tools that expect a series of tarballs rather than just one.
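One way to soften the "one tarball instead of many" change is to download the single archive once and let every per-journal step reuse the local copy. The following is only a rough sketch of that idea, not what evaluator does: `fetch_archive`, the `thermoml_cache` directory, and the `opener` parameter are hypothetical names, and it uses the standard library rather than `requests` so it runs on its own.

```python
import shutil
import urllib.request
from pathlib import Path

# URL quoted earlier in this thread; the helper below is a hypothetical sketch.
THERMOML_ARCHIVE_URL = "https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz"


def fetch_archive(url=THERMOML_ARCHIVE_URL, cache_dir="thermoml_cache",
                  opener=urllib.request.urlopen):
    """Download the archive once and reuse the local copy afterwards."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]

    # A previous successful download is reused without touching the network.
    if target.exists():
        return target

    # Write to a temporary name first so an interrupted download is never
    # mistaken for a complete archive. urlopen raises HTTPError on 404/500.
    partial = target.with_name(target.name + ".part")
    with opener(url) as response, open(partial, "wb") as handle:
        shutil.copyfileobj(response, handle)
    partial.replace(target)
    return target
```

With something like this, code that used to ask for `JCED.tgz`, then the next journal, and so on would call `fetch_archive()` each time and only the first call would hit data.nist.gov.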
@mattwthompson Let me know how I can help out with fixing this (I am probably the main user of this tool currently)
Is this still broken? I forget if this has been resolved on another platform
https://github.com/openforcefield/openff-evaluator/pull/402
Looks like it has been resolved.
Unfortunately this is broken again, this time on NIST's end. It looks like there's an issue with their tarball. I get this message trying to download with evaluator:
Traceback (most recent call last):
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/vapor_pressure_search.py", line 93, in <module>
    main()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/vapor_pressure_search.py", line 88, in main
    initial_data = prepare_initial_data()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/vapor_pressure_search.py", line 57, in prepare_initial_data
    initial_data = CurationWorkflow.apply(
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/workflow.py", line 112, in apply
    data_frame = component_class.apply(
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/components.py", line 90, in apply
    modified_data_frame = cls._apply(data_frame, schema, n_processes)
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 113, in _apply
    cls._download_data(schema)
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 60, in _download_data
    request.raise_for_status()
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: for url: https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz
{
  "requestURL" : "/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz",
  "method" : "GET",
  "status" : 500,
  "message" : "Unexpected Server Error"
}
Process finished with exit code 1
And if I try to download manually through their download manager I get the same thing:
Information about requested bundle/package is given below.
Following files are not included in the bundle because of errors:
https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz?requestId=5c75c307-4328-46dd-baf2-068675b89c47 There is an Error accessing this file, Server returned status with response code 500 and message:There is an error accessing this file/URL from server.
@mrshirts can you contact someone at NIST to figure out why this is happening?
So, this link seems to work for me now using a manual download - can you check if that works for you, and if it might be transient?
https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz
I'm still unable to download manually, on either RHEL or Ubuntu.
Are other NIST downloads down, or just this one?
It was working manually for me for a couple min, but now is not.
I tried to download something else from the NIST website and it also failed. Maybe their servers are just struggling today?
Yeah, sounds like an overall NIST problem.
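If the 500s really are transient, a retry with backoff around the download might be enough to ride them out. This is a minimal sketch under that assumption, not evaluator's current behaviour: `fetch_with_retry` and its parameters are hypothetical, and it uses `urllib` from the standard library instead of `requests`. It retries only 5xx responses, since a 404 like the one from the old per-journal URL will not fix itself.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=4, base_delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body, retrying only on 5xx server errors."""
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            # Client errors (4xx) and the final failed attempt are re-raised.
            if error.code < 500 or attempt == attempts - 1:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

Connection-level failures (`urllib.error.URLError`) are deliberately not retried here; whether they should be is a judgment call.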
It would be good to have a "load from local tarball" option in evaluator.datasets.curation.thermoML.ImportThermoMLData in case this happens again in the future.
It would be good to have a "load from local tarball" option
Good idea, file an issue?
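For whoever files that issue, here is a rough sketch of what the "load from local tarball" step could look like. It is not part of evaluator: `extract_xml_from_tarball` is a hypothetical helper, and it assumes the ThermoML entries in the archive are `.xml` files (the internal layout of the NIST tarball is not described in this thread, so nested directories are simply preserved).

```python
import tarfile
from pathlib import Path


def extract_xml_from_tarball(tarball_path, output_dir):
    """Extract every regular .xml member of a local tarball and return the paths."""
    output_dir = Path(output_dir).resolve()
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted = []

    # "r:*" lets tarfile detect the compression (.tgz, .tar.bz2, plain .tar).
    with tarfile.open(tarball_path, "r:*") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(".xml")):
                continue
            target = (output_dir / member.name).resolve()
            # Skip members that would land outside output_dir (path traversal).
            if output_dir not in target.parents:
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with archive.extractfile(member) as source, open(target, "wb") as handle:
                handle.write(source.read())
            extracted.append(target)

    return sorted(extracted)
```

A schema option pointing at a local path could then call this instead of the download step, so curation keeps working while the NIST servers are down.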
Email from Damien Riccardi at NIST:
"I added a few links to the web app and data.nist.gov page this morning, and, before reaching out here, I reviewed your issue linked below. It appeared as though Thermoml issues on the Open FF end were resolved until the data.nist.gov download link to the .tgz file broke (as of yesterday). An email has been sent to admins of data.nist.gov and I hope it will be fixed soon. In clicking through the related openff thermoml issues I noticed the annoyance with historical movement in the data resource. The https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz file should now (technical difficulties with data.nist.gov servers aside) never change or be deleted."
Also, ThermoML had a software note in JCC: https://onlinelibrary.wiley.com/share/author/WKPMRWMYRCFW79RXEQPW?target=10.1002/jcc.26842
@ocmadin posted this in Slack; I don't have time to look at it now but this might be a path forward:
https://onlinelibrary.wiley.com/doi/epdf/10.1002/jcc.26842 [...] TL;DR, don't think we need to change anything, but they are now offering a RESTful API to access the data which may be useful in the future.
Just to note, this is masking a larger issue: the way ThermoML serves files has now changed and probably needs some deeper inspection / fixes.
Originally posted by @SimonBoothroyd in https://github.com/openforcefield/openff-evaluator/issues/394#issuecomment-959815971