Add docs on compute.json

openforcefield / qca-dataset-submission

Data generation and submission scripts for the QCArchive ecosystem.

Add docs on compute.json #387

Closed lilyminium closed 1 month ago

lilyminium commented 1 month ago

There's very little existing documentation on compute.json and how to use it (a search of the README and issues didn't turn up anything). As a result, it's not clear how to use it or what advantages it brings over making a new dataset.

Pros I can think of:

- Makes clear which datasets are linked
- Is probably faster than re-generating/re-downloading a dataset?
- Avoids any issues with re-generating conformers

This is an issue to collect any notes we find while running through adding compute specs. Please add any comments or notes to self on things you run into!

lilyminium commented 1 month ago

I think the programmatic process would look something like pulling/reconstituting the original dataset, removing any existing specs, adding a new one, and saving it to JSON.
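
Those steps could be sketched roughly as follows. This is only an illustration on a plain dictionary using the standard library, not the actual openff-qcsubmit API: the field names (`dataset_name`, `qc_specifications`), the spec contents, and the `swap_specs` helper are all assumptions made up for the example.

```python
import copy
import json
import os
import tempfile


def swap_specs(dataset: dict, new_spec: dict) -> dict:
    """Return a copy of `dataset` whose only spec is `new_spec` (hypothetical helper)."""
    updated = copy.deepcopy(dataset)
    # Drop every existing spec and keep just the new one, keyed by its name.
    updated["qc_specifications"] = {new_spec["spec_name"]: new_spec}
    return updated


# Stand-in for a dataset that was pulled/reconstituted (assumed layout).
original = {
    "dataset_name": "Example Dataset v1.0",
    "qc_specifications": {
        "default": {"spec_name": "default", "method": "b3lyp-d3bj",
                    "basis": "dzvp", "program": "psi4"},
    },
}
# Illustrative new spec; the values are placeholders.
new_spec = {"spec_name": "example-spec", "method": "hf",
            "basis": "sto-3g", "program": "psi4"}

compute = swap_specs(original, new_spec)

# Save the result as JSON (a temp directory is used here to keep the sketch tidy).
out_path = os.path.join(tempfile.gettempdir(), "compute.json")
with open(out_path, "w") as f:
    json.dump(compute, f, indent=2)
```

The dataset name is carried over unchanged, which is what links the new spec to the existing dataset in this sketch.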

amcisaac commented 1 month ago

My understanding is that the compute.json shouldn't have any molecule info in it, so I sped up the process by not re-downloading the dataset and instead re-creating it with the same name. I don't know if that's correct, but it seemed to give the same result as other examples of compute.json.

lilyminium commented 1 month ago

One thing we just ran into -- the default compute tag might be set to "openff", so that should be manually modified in the submission, or the Error Cycling workflow should be manually run, if you want a different tag.
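
A quick way to override the tag before submitting might look like the sketch below. It assumes the tag is stored under a top-level `compute_tag` key in the JSON; that key, the `with_compute_tag` helper, and the tag value are hypothetical and should be checked against a real compute.json.

```python
import json


def with_compute_tag(payload: dict, tag: str) -> dict:
    """Return a shallow copy of `payload` with its compute tag replaced (hypothetical helper)."""
    updated = dict(payload)
    updated["compute_tag"] = tag
    return updated


# Stand-in for the contents of a compute.json (assumed layout).
payload = json.loads('{"dataset_name": "Example Dataset v1.0", "compute_tag": "openff"}')

retagged = with_compute_tag(payload, "my-custom-tag")
print(retagged["compute_tag"])
```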

lilyminium commented 1 month ago

Fixed by #389