materialsproject / emmet

Be a master builder of databases of material properties. Avoid the Kragle.
https://materialsproject.github.io/emmet/

Exposing formation energies and stability data through the OPTIMADE API #375

Closed ml-evs closed 1 year ago

ml-evs commented 2 years ago

Hi MP devs, would there be any interest in exposing (presumably PBE) formation energies and hull distances via the MP OPTIMADE API? This would involve adding custom _mp_hull_distance (or whatever) fields and listing them in the config of your OPTIMADE server (which I guess is external to this repo).

OQMD currently provide this data with the _oqmd_stability field, which is very useful when using their OPTIMADE API as part of an experimental workflow, e.g., automated XRD refinement. Eventually, it would be great to get the big DFT databases to agree on a standard prefix for this kind of data so that cross-database queries for proposed stable materials can be performed.
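To make the cross-database use case concrete, here is a minimal sketch of a query URL that filters on hull distance. It assumes a hypothetical `_mp_energy_above_hull` field (MP does not serve one yet) and uses a placeholder base URL; `filter` and `response_fields` are the standard OPTIMADE query parameters on `/v1/structures`.

```python
from urllib.parse import urlencode


def structures_query_url(base_url: str, filter_expr: str, response_fields: list[str]) -> str:
    """Build an OPTIMADE /v1/structures query URL from a filter string.

    `filter` and `response_fields` are standard OPTIMADE query parameters;
    the provider-specific field used below is hypothetical.
    """
    params = {"filter": filter_expr, "response_fields": ",".join(response_fields)}
    return f"{base_url.rstrip('/')}/v1/structures?{urlencode(params)}"


url = structures_query_url(
    "https://optimade.example.org",  # placeholder, not a real provider URL
    'elements HAS ALL "Li","O" AND _mp_energy_above_hull < 0.05',  # hypothetical MP field
    ["chemical_formula_reduced", "_mp_energy_above_hull"],
)
print(url)
```

The same filter could be sent to any provider once they agree on a field name, which is the point of a shared prefix.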

Cheers!

mkhorton commented 2 years ago

Hi @ml-evs, thanks for creating this issue. We have discussed internally but haven't reached conclusions yet.

The most important metadata we need to add, I think, is our database version. Currently, our OPTIMADE endpoint is not giving information from our latest database version, which can cause a lot of confusion. In future, they should stay in sync, but I think this is still critical to communicate.

In terms of other properties, it's difficult to know where to start/stop. Hull values are probably among the most useful, but there are many others as well. I think we were hedging and waiting for official prefixes before going to a lot of effort to add these keys and then have to change them later.

ml-evs commented 2 years ago

Fair enough! Unfortunately the "official" prefixes will only really happen if there is demand/support from the databases that would implement them (maybe at the next workshop).

I think a bespoke _mp_hull_distance would still be super useful, provided it is not too much effort on top of your existing implementation (it should only require config changes once the data is in the corresponding document itself).

mkhorton commented 2 years ago

Yes, definitely, and I agree it's not a big ask :) I think the hull value is a sensible addition. We can probably add that regardless, but I would just like to establish a longer-term strategy too rather than adding things on an ad-hoc basis.

Perhaps we can establish some conventions and pitch them at the next workshop.

ml-evs commented 2 years ago

Awesome! Happy to help out with the use of the optimade package (just ping me on here). One ergonomics thing I am adding to the next python-tools release is the ability to add provider fields to the info endpoint using just the config file (https://github.com/Materials-Consortia/optimade-python-tools/pull/1096), which might be useful here.

ml-evs commented 2 years ago

Hi @mkhorton, just thought I'd ping this again as the workshop has now flown by!

We've made quite a few changes to optimade-python-tools that should make the custom field process much easier (see this bit of the docs: adding a custom field is now config-only, provided it already exists in your underlying database), and also some fixes for issues that affect the current MP-OPTIMADE implementation (one of which was caused by me, not you!). For example, we finally have a workaround that lets us use the latest (slightly scuffed) version of FastAPI, so depending on optimade won't be holding you back.

Whilst there wasn't really any discussion of a hull distance field at the workshop (unfortunately very few of the databases that would use it were able to fully attend this year), it still seems like an obvious and very useful addition to me (it is currently stopping us from using MP-OPTIMADE in a data-driven project for live/in situ auto-XRD). We did also merge the fabled property definitions PR so the mechanism is now in-place for providers to come together and align property definitions without needing consensus from the whole consortium.

mkhorton commented 2 years ago

Thanks for the update @ml-evs! Tagging @munrojm and @tschaume here to make sure they see it.

"wasn't really any discussion of a hull distance field at the workshop [...] it is currently stopping us from using MP-OPTIMADE in a data-driven project for live/in situ auto-XRD"

This is interesting to hear. Is there a standard field name people are coalescing on yet? Do you have an example energy above hull property definition we could adopt via optimade-python-tools?

ml-evs commented 1 year ago

Sorry for missing this at the time @mkhorton! I think the only examples we have so far are OQMD, which uses _oqmd_stability, and odbx (i.e., me), which uses _odbx_thermodynamics->hull_distance to provide the hull distance in eV/atom (and similarly _oqmd_delta_e and _odbx_thermodynamics->formation_energy for formation energy per atom).
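Until a shared name exists, a client has to know where each provider keeps the value. A minimal sketch of that normalization, using the OQMD and odbx field names above; the `->` notation is treated as a nested key, the `mp` entry is hypothetical, and the helper and sample attribute dicts are made up for illustration.

```python
from typing import Any, Optional

# Where each provider exposes hull distance (eV/atom) in an entry's attributes.
# Nested paths such as _odbx_thermodynamics->hull_distance become key tuples.
HULL_DISTANCE_PATHS: dict[str, tuple[str, ...]] = {
    "oqmd": ("_oqmd_stability",),
    "odbx": ("_odbx_thermodynamics", "hull_distance"),
    "mp": ("_mp_energy_above_hull",),  # hypothetical: not served by MP at time of writing
}


def hull_distance(provider: str, attributes: dict[str, Any]) -> Optional[float]:
    """Return the hull distance from an OPTIMADE entry's attributes, or None if absent."""
    value: Any = attributes
    for key in HULL_DISTANCE_PATHS[provider]:
        # Stop early if an intermediate level is missing or not a mapping
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return None if value is None else float(value)


print(hull_distance("odbx", {"_odbx_thermodynamics": {"hull_distance": 0.012}}))
print(hull_distance("oqmd", {"_oqmd_stability": 0.0}))
print(hull_distance("mp", {}))
```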

In terms of optimade-python-tools, you can just add the following to your o-p-t config:

```json
"provider_fields": {
  "structures": [
    {
      "name": "energy_above_hull",
      "description": "The distance this structure lies above the convex hull of the relevant phase diagram spanned by its constituent elements, with a given set of computational parameters.",
      "unit": "eV/atom",
      "type": "float"
    }
  ]
}
```

This will add this field metadata to the /info/structures endpoint of your API as _mp_energy_above_hull. If you are serving your OPTIMADE API off the same MongoDB as your main API, and you can access the underlying value with a flat (i.e., no $lookup or other aggregations) query on the document, then you can just add the alias:

```json
"aliases": {
  "structures": {
    "energy_above_hull": "<database_field_name_for_hull_distance>"
  }
}
```
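A toy model of what the flat alias buys you: the prefixed OPTIMADE field in an incoming MongoDB filter is renamed to the database field, and nothing else changes. This is not how o-p-t implements it internally; `db_hull_field` is a hypothetical stand-in for the real database field name, and the prefix `mp` is assumed.

```python
from typing import Any

# Hypothetical config mirroring the provider_fields and aliases fragments above.
PREFIX = "mp"
PROVIDER_FIELDS = ["energy_above_hull"]
ALIASES = {"energy_above_hull": "db_hull_field"}  # "db_hull_field" is a placeholder


def to_db_field(optimade_field: str) -> str:
    """Map an OPTIMADE field name to the flat database field it reads from."""
    prefix = f"_{PREFIX}_"
    stripped = optimade_field[len(prefix):]
    # Only declared provider fields lose their prefix before the alias lookup
    name = stripped if optimade_field.startswith(prefix) and stripped in PROVIDER_FIELDS else optimade_field
    return ALIASES.get(name, name)


def translate_filter(mongo_filter: Any) -> Any:
    """Recursively rename field keys in a MongoDB filter, leaving $-operators alone."""
    if isinstance(mongo_filter, dict):
        return {
            (k if k.startswith("$") else to_db_field(k)): translate_filter(v)
            for k, v in mongo_filter.items()
        }
    if isinstance(mongo_filter, list):
        return [translate_filter(v) for v in mongo_filter]
    return mongo_filter


query = {"$and": [{"_mp_energy_above_hull": {"$lt": 0.05}}, {"nelements": {"$eq": 3}}]}
print(translate_filter(query))
# {'$and': [{'db_hull_field': {'$lt': 0.05}}, {'nelements': {'$eq': 3}}]}
```

Because this is a pure key rename, it only works when the value sits directly on the queried document.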

If the database access is more complicated than that, then you'd either have to write a custom MongoTransformer to do the appropriate plumbing, or inline the hull distance values in whatever collection you run the OPTIMADE API from, but based on the SummaryDoc stuff in the new MP API I would have thought this should be straightforward enough. If not, we can discuss adding support for aggregations based on the alias config in o-p-t (e.g., provide a field name and a collection name in which to do a $lookup).
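For the non-flat case, a sketch of the `$lookup` route in plain MongoDB aggregation syntax, assuming the hull value lives in a separate summary collection keyed by a shared ID. The collection and field names (`summary`, `material_id`, `energy_above_hull`) are assumptions for illustration, and o-p-t does not currently build pipelines like this from its alias config.

```python
from typing import Any


def hull_lookup_pipeline(
    summary_collection: str, id_field: str, hull_field: str, match: dict[str, Any]
) -> list[dict[str, Any]]:
    """Aggregation pipeline joining a hull value from another collection.

    All collection and field names passed in are hypothetical placeholders.
    """
    return [
        # Join summary documents that share the same ID
        {"$lookup": {
            "from": summary_collection,
            "localField": id_field,
            "foreignField": id_field,
            "as": "_summary",
        }},
        # $lookup yields an array; flatten it, keeping structures with no match
        {"$unwind": {"path": "$_summary", "preserveNullAndEmptyArrays": True}},
        # Copy the hull value onto the structure under the OPTIMADE-facing name
        {"$addFields": {"_mp_energy_above_hull": f"$_summary.{hull_field}"}},
        {"$project": {"_summary": 0}},
        # Filter only after the join, so the hull value is queryable
        {"$match": match},
    ]


pipeline = hull_lookup_pipeline(
    "summary", "material_id", "energy_above_hull",  # assumed names
    {"_mp_energy_above_hull": {"$lt": 0.05}},
)
print(len(pipeline))
```

With pymongo, the resulting list would be passed to `Collection.aggregate()` on the structures collection. Inlining the values avoids paying for this join on every request.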

In terms of other providers agreeing on a definition and field name, I guess that providers have to be seen as actually wanting to serve this data through OPTIMADE first, and I really hope they will, as many applications hinge on it!

janosh commented 1 year ago

@mkhorton I'd be happy to take this on. No time pressure of course if there are higher priorities atm. But if @munrojm or @tschaume can give me some guidance on what would need to be done, I think it'd be great to expose MP stability data through OPTIMADE.

tschaume commented 1 year ago

@janosh I agree it would be a nice addition. In terms of optimade, however, we should first bump optimade to the latest version (we are currently still running 0.13.3), which in turn requires adding optimade structures to our build stages.

munrojm commented 1 year ago

@janosh I would be happy to take as much time as needed to give guidance if you are willing to take this on. @tschaume is right, though. It needs me to first get the optimade structures incorporated into our automated builds. I will move it up on my list and try and get it done in the next week. Shouldn't be too heavy of a lift.

janosh commented 1 year ago

@tschaume @munrojm Thanks for the fast replies! Sounds great. Happy to take care of the merge conflicts on https://github.com/materialsproject/devops/pull/594 (if someone gives me push access).

ml-evs commented 1 year ago

Great to see progress on this and more than happy to help out from our side, let me know if you run into any issues!