ml-evs closed this issue 1 year ago.
Hi @ml-evs, thanks for creating this issue. We have discussed internally but haven't reached conclusions yet.
The most important metadata we need to add, I think, is our database version. Currently, our OPTIMADE endpoint is not giving information from our latest database version, which can cause a lot of confusion. In future, they should stay in sync, but I think this is still critical to communicate.
In terms of other properties, it's difficult to know where to start and stop. Hull values are probably among the most useful, but there are many others as well. I think we were hedging, waiting for official prefixes rather than going to a lot of effort to add these keys only to have to change them later.
Fair enough! Unfortunately the "official" prefixes will only really happen if there is demand/support from the databases that would implement them (maybe at the next workshop).
I think a bespoke `_mp_hull_distance` would still be super useful, provided it is not too much effort on top of your existing implementation (it should only require config changes once the data is in the corresponding document itself).
Yes, definitely, and I agree it's not a big ask :) I think the hull value is a sensible addition. We can probably add that regardless, but I would also like to establish a longer-term strategy rather than adding things on an ad-hoc basis.
Perhaps we can establish some conventions and pitch them at the next workshop.
Awesome! Happy to help out with the use of the optimade package (just ping me on here). One ergonomics thing I am adding to the next python-tools release is the ability to add provider fields to the info endpoint using just the config file (https://github.com/Materials-Consortia/optimade-python-tools/pull/1096), which might be useful here.
Hi @mkhorton, just thought I'd ping this again as the workshop has now flown by!
We've made quite a few changes to optimade-python-tools that should make the custom field process much easier (see this bit of the docs: adding a custom field is now config-only, provided it already exists in your underlying database), and also some fixes for issues that affect the current MP-OPTIMADE implementation (ones caused by me, not you!). For example, we finally have a workaround that lets us use the latest (slightly scuffed) version of FastAPI, so depending on `optimade` won't be holding you back.
Whilst there wasn't really any discussion of a hull distance field at the workshop (unfortunately very few of the databases that would use it were able to fully attend this year), it still seems like an obvious and very useful addition to me (it is currently stopping us from using MP-OPTIMADE in a data-driven project for live/in situ auto-XRD). We did also merge the fabled property definitions PR, so the mechanism is now in place for providers to come together and align property definitions without needing consensus from the whole consortium.
Thanks for the update @ml-evs! Tagging @munrojm and @tschaume here to make sure they see it.
> wasn't really any discussion of a hull distance field at the workshop [...] it is currently stopping us from using MP-OPTIMADE in a data-driven project for live/in situ auto-XRD
This is interesting to hear. Is there a standard field name people are coalescing on yet? Do you have an example energy above hull property definition we could adopt via `optimade-python-tools`?
Sorry for missing this at the time @mkhorton! I think the only examples we have so far are OQMD, which uses `_oqmd_stability`, and odbx (i.e., me), which uses `_odbx_thermodynamics->hull_distance`, to provide the hull distance in eV/atom (and similarly `_oqmd_delta_e` and `_odbx_thermodynamics->formation_energy` for formation energy per atom).
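To make the relationship between those two quantities concrete: the hull distance is a structure's formation energy per atom minus the energy of the lower convex hull at the same composition. The sketch below is a toy, binary-only illustration (an A-B system with both elements fixed at 0 eV/atom) using Andrew's monotone chain for the hull; it is not how MP, OQMD or odbx compute their values, and a real workflow would use a general multi-component phase-diagram tool such as pymatgen's `PhaseDiagram`.

```python
from typing import List, Tuple

# (fraction of element B, formation energy in eV/atom); elements sit at 0 eV/atom
Point = Tuple[float, float]


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of (x, energy) points via Andrew's monotone chain."""
    hull: List[Point] = []
    for p in sorted(set(points)):
        # Drop the last vertex while it is not strictly below the chord to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_energy(hull: List[Point], x: float) -> float:
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        # Skip zero-width (vertical) segments to avoid dividing by zero.
        if x1 <= x <= x2 and x2 > x1:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError(f"composition {x} is outside the hull")


def energy_above_hull(entries: List[Point], x: float, e_form: float) -> float:
    """Distance (eV/atom) of a phase with formation energy e_form above the hull."""
    hull = lower_hull(entries + [(0.0, 0.0), (1.0, 0.0)])
    return e_form - hull_energy(hull, x)
```

For example, with AB at -1.0 eV/atom and A3B at -0.3 eV/atom, the hull at x = 0.25 lies at -0.5 eV/atom, so A3B sits 0.2 eV/atom above it while AB is on the hull (distance 0).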
In terms of `optimade-python-tools`, you can just add the following to your o-p-t config:
```json
"provider_fields": {
  "structures": [
    {
      "name": "energy_above_hull",
      "description": "The distance this structure lies above the convex hull of relevant phase diagram spanned by its constituent elements, with a given set of computational parameters.",
      "unit": "eV/atom",
      "type": "float"
    }
  ]
}
```
This will add this field metadata to the `/info/structures` endpoint of your API as `_mp_energy_above_hull`. If you are serving your OPTIMADE API off the same MongoDB as your main API, and you can access the underlying value with a flat (i.e., no `$lookup` or other aggregations) query on the document, then you can just add the alias:
```json
"aliases": {
  "structures": {
    "energy_above_hull": "<database_field_name_for_hull_distance>"
  }
}
```
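As a rough mental model of what the prefix plus alias config achieves, here is a hypothetical sketch (not the optimade-python-tools implementation) that maps the public name `_mp_energy_above_hull` back to a backend field and renames the keys of a flat MongoDB filter. The prefix `mp` comes from the thread; the database field name `e_above_hull` is an assumption standing in for `<database_field_name_for_hull_distance>`.

```python
from typing import Any, Dict

# Hypothetical config values mirroring the JSON snippets above.
PROVIDER_PREFIX = "mp"
PROVIDER_FIELDS = ["energy_above_hull"]
ALIASES = {"energy_above_hull": "e_above_hull"}  # hypothetical database field name


def backend_field(optimade_field: str) -> str:
    """Map an OPTIMADE field name to the field queried in MongoDB."""
    prefix = f"_{PROVIDER_PREFIX}_"
    if optimade_field.startswith(prefix):
        name = optimade_field[len(prefix):]
        if name in PROVIDER_FIELDS:
            # Strip the provider prefix, then apply the alias if one is configured.
            return ALIASES.get(name, name)
    # Standard or unknown fields pass through unchanged.
    return optimade_field


def translate_filter(mongo_filter: Dict[str, Any]) -> Dict[str, Any]:
    """Rename the top-level keys of a flat (no $lookup) MongoDB filter."""
    return {backend_field(key): value for key, value in mongo_filter.items()}
```

This only handles flat, top-level keys, which is exactly the "no `$lookup` or other aggregations" case described above.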
If the database access is more complicated than that, then you'd either have to write a custom `MongoTransformer` to do the appropriate plumbing, or inline the hull distance values in whatever collection you run the OPTIMADE API from, but based on the `SummaryDoc` stuff in the new MP API I would have thought this should be straightforward enough. If not, we can discuss adding support for aggregations based on the alias config in o-p-t (e.g., provide a field name and a collection name in which to do a `$lookup`).
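For the non-flat case, the kind of aggregation such a feature would need to generate might look like the sketch below: a `$lookup` join from the structures collection into a separate collection holding hull data, followed by a filter on the joined value. The collection name `thermo`, the join key `material_id` and the field `energy_above_hull` are all hypothetical placeholders, not MP's actual schema.

```python
from typing import Any, Dict, List


def hull_lookup_pipeline(
    threshold: float,
    thermo_collection: str = "thermo",     # hypothetical collection name
    join_key: str = "material_id",          # hypothetical shared key
    hull_field: str = "energy_above_hull",  # hypothetical field in that collection
) -> List[Dict[str, Any]]:
    """Build a MongoDB aggregation that joins hull data in and filters on it."""
    return [
        {
            "$lookup": {
                "from": thermo_collection,
                "localField": join_key,
                "foreignField": join_key,
                "as": "thermo",
            }
        },
        # $lookup yields an array; unwind it so each structure carries one thermo doc.
        {"$unwind": "$thermo"},
        {"$match": {f"thermo.{hull_field}": {"$lte": threshold}}},
    ]
```

With pymongo, the resulting list would be passed to `collection.aggregate(...)`; inlining the hull values into the OPTIMADE collection avoids this join entirely.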
In terms of other providers agreeing on a definition and field name, I guess that providers have to be seen as actually wanting to serve this data through OPTIMADE first, and I really hope they will, as many applications hinge on it!
@mkhorton I'd be happy to take this on. No time pressure of course if there are higher priorities atm. But if @munrojm or @tschaume can give me some guidance on what would need to be done, I think it'd be great to expose MP stability data through OPTIMADE.
@janosh I agree it would be a nice addition. In terms of optimade, however, we should first bump optimade to the latest version (currently still running 0.13.3) which requires we work on adding optimade structures to our build stages first.
@janosh I would be happy to take as much time as needed to give guidance if you are willing to take this on. @tschaume is right, though. It needs me to first get the optimade structures incorporated into our automated builds. I will move it up on my list and try and get it done in the next week. Shouldn't be too heavy of a lift.
@tschaume @munrojm Thanks for the fast replies! Sounds great. Happy to take care of the merge conflicts on https://github.com/materialsproject/devops/pull/594 (if someone gives me push access).
Great to see progress on this and more than happy to help out from our side, let me know if you run into any issues!
Hi MP devs, would there be any interest in exposing (presumably PBE) formation energies and hull distances via the MP OPTIMADE API? This would involve adding custom `_mp_hull_distance` (or whatever) fields and listing them in the config of your OPTIMADE server (which I guess is external to this repo). OQMD currently provide this data with the `_oqmd_stability` field, which is very useful when using their OPTIMADE API as part of an experimental workflow, e.g., automated XRD refinement. Eventually, it would be great to get the big DFT databases to agree on a standard prefix for this kind of data so that cross-database queries for proposed stable materials can be performed.

Cheers!
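Until a shared field exists, a cross-database query for near-stable materials has to be assembled per provider, each with its own hull field. A minimal sketch, assuming placeholder base URLs on the reserved `.example` domain (substitute each provider's real OPTIMADE base URL) and treating `_mp_hull_distance` as the proposed, not yet served, field name:

```python
from typing import Dict
from urllib.parse import urlencode

# Placeholder base URLs mapped to each provider's (assumed) hull-distance field.
HULL_FIELDS: Dict[str, str] = {
    "https://oqmd.example/optimade/v1": "_oqmd_stability",
    "https://mp.example/optimade/v1": "_mp_hull_distance",  # proposed, not yet served
}


def stable_structure_queries(max_hull_distance: float) -> Dict[str, str]:
    """Build one /structures query URL per provider, filtering on its hull field."""
    urls = {}
    for base_url, field in HULL_FIELDS.items():
        # urlencode percent-encodes the OPTIMADE filter, e.g. "<=" becomes "%3C%3D".
        query = urlencode({"filter": f"{field}<={max_hull_distance}"})
        urls[base_url] = f"{base_url}/structures?{query}"
    return urls
```

An agreed-upon field name would collapse `HULL_FIELDS` to a single filter string shared by every provider.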