nomad-coe / nomad-simulations

A NOMAD plugin containing base sections for simulations.
https://nomad-coe.github.io/nomad-simulations/
Apache License 2.0

Symmetry class improvement #103

Open aalbino2 opened 1 month ago

aalbino2 commented 1 month ago

Hi, I was looking at the Symmetry class and I believe it would also be very useful in synthesis experiments.

While checking the code with @JosePizarro3, we saw some AFLOW database info that is currently not fetched through an API call but hardcoded in nomad.

How about implementing a way of fetching all the info directly from their database? Something similar to what is implemented for a PureSubstance querying the PubChem database.

I could look into that if you think it might be interesting, @JFRudzinski @ndaelman-hu @Bernadette-Mohr

aalbino2 commented 1 month ago

@hampusnasstrom FYI

JosePizarro3 commented 1 month ago

There is a paper: https://www.sciencedirect.com/science/article/pii/S0927025614003322?via%3Dihub#s0050

But the question is whether we need any API requests at all if someone (maybe @lauri-codes) has already done this and hard-coded it in nomad.atomutils. API calls might clutter the normalization and extraction of the Symmetry info.

aalbino2 commented 1 month ago

Okay, I'm open to evaluating the best option.

ndaelman-hu commented 1 month ago

@aalbino2 Thank you for bringing this up. My main question is what data you would want to retrieve?

Typically, we use API calls during the normalization to obtain identifiers that are interoperable with other systems. As @JosePizarro3 pointed out, API calls are a frequent bottleneck and point of failure, but this application is still manageable.

How about implementing a way of fetching all the info directly from their database [AFLOW]? Something similar to what is implemented for a PureSubstance querying the PubChem database.

This is a larger data migration project that should be overseen by Markus. Such migrations have happened in the past to grow the NOMAD database. In that sense, they remain useful, and I have actually requested such a migration for R2SCAN data several times now.

However, these migrations also have a political dimension. The original host databases also apply for funding and rightfully don't want to divert traffic away from their site. I'd therefore say that it's best to coordinate these endeavors with Pepe.

Finally, there is already a consortium tackling a common API language in which NOMAD and AFLOW both participate, namely OPTIMADE. From an end user's perspective, the only added values of a data migration to NOMAD are:

aalbino2 commented 1 month ago

What I had in mind was something similar to what happens in PubChemPureSubstance, i.e., the user fills in one quantity within an instance of that class and, on save, the API query fills in the other quantities.

In this case, the (experimental) user would log their own Sample and indicate the space group. On save, all the quantities in the Symmetry class would be filled in automatically with info such as the Bravais lattice, or whatever else I'm not aware of right now.

I also see the Symmetry class as an important data-structure connection between A and C, and, apart from the API query, I would like to start using it in the future.
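A minimal sketch of the on-save filling described in this comment, assuming the user provides the space group number and a standard-setting Hermann-Mauguin symbol. `SymmetrySketch`, its field names and the helper functions are hypothetical stand-ins, not the actual nomad-simulations `Symmetry` section or its normalizer. The point it illustrates is that crystal system and Bravais lattice follow from fixed crystallographic rules (number ranges and the centering letter), so these two quantities need no external query.

```python
from dataclasses import dataclass
from typing import Optional

# Upper bound of each crystal system's space group number range (1-230).
_CRYSTAL_SYSTEMS = [
    (2, "triclinic"),
    (15, "monoclinic"),
    (74, "orthorhombic"),
    (142, "tetragonal"),
    (167, "trigonal"),
    (194, "hexagonal"),
    (230, "cubic"),
]

# Lattice-family letter used in Pearson-style Bravais lattice symbols.
_FAMILY_LETTER = {
    "triclinic": "a",
    "monoclinic": "m",
    "orthorhombic": "o",
    "tetragonal": "t",
    "trigonal": "h",
    "hexagonal": "h",
    "cubic": "c",
}


def crystal_system_from_number(number: int) -> str:
    """Map a space group number to its crystal system."""
    if not 1 <= number <= 230:
        raise ValueError(f"space group number out of range: {number}")
    for upper, system in _CRYSTAL_SYSTEMS:
        if number <= upper:
            return system
    raise AssertionError("unreachable")


def bravais_lattice_from(number: int, symbol: str) -> str:
    """Combine the lattice family with the centering letter of the symbol.

    Assumes a standard-setting Hermann-Mauguin symbol; A/B/C face centering
    is written as "C", as in Pearson symbols.
    """
    stripped = symbol.strip()
    if not stripped:
        raise ValueError("empty space group symbol")
    family = _FAMILY_LETTER[crystal_system_from_number(number)]
    centering = stripped[0].upper()
    if centering in ("A", "B"):
        centering = "C"
    if centering not in ("P", "C", "I", "F", "R"):
        raise ValueError(f"unexpected centering in symbol: {symbol!r}")
    return family + centering


@dataclass
class SymmetrySketch:
    """Hypothetical stand-in for a Symmetry section; field names are illustrative."""

    space_group_number: Optional[int] = None
    space_group_symbol: Optional[str] = None
    crystal_system: Optional[str] = None
    bravais_lattice: Optional[str] = None

    def fill_missing(self) -> None:
        """Fill derived quantities the user left empty (mimics an on-save hook)."""
        if self.space_group_number is None:
            return
        if self.crystal_system is None:
            self.crystal_system = crystal_system_from_number(self.space_group_number)
        if self.bravais_lattice is None and self.space_group_symbol:
            self.bravais_lattice = bravais_lattice_from(
                self.space_group_number, self.space_group_symbol
            )


sample = SymmetrySketch(space_group_number=225, space_group_symbol="Fm-3m")
sample.fill_missing()
print(sample.crystal_system, sample.bravais_lattice)  # cubic cF
```

Only empty fields are filled, so anything the user entered by hand is left untouched; quantities that cannot be derived from rules alone (e.g. prototype information) would still need a lookup table.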

JosePizarro3 commented 1 month ago

In this case, the (experimental) user would log their own Sample and indicate the space group. On save, all the quantities in the Symmetry class would be filled in automatically with info such as the Bravais lattice, or whatever else I'm not aware of right now.

To start, I suggest going with a function that uses the aflow_prototype dictionary in nomad and, given some quantity (e.g., space_group_number), extracts the others. This should also cover cases where someone fills in space_group_symbol or any other related quantity.

Then we wait for the decision on the API. The method I am describing does not depend on how the information is read.
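The approach in the two paragraphs above can be sketched as a resolver that accepts whichever quantity the user filled in and returns the rest. This assumes a small in-memory table standing in for the aflow_prototype dictionary in nomad.atomutils, whose actual structure is not reproduced here; `SPACE_GROUP_TABLE` and `resolve_space_group` are hypothetical names. Because the table is passed in as an argument, the resolver does not care whether the data come from an offline file, spglib or an API.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical excerpt: space group number -> short Hermann-Mauguin symbol.
# A real implementation would load all 230 entries from an offline source.
SPACE_GROUP_TABLE: dict[int, str] = {
    1: "P1",
    2: "P-1",
    14: "P2_1/c",
    62: "Pnma",
    166: "R-3m",
    194: "P6_3/mmc",
    221: "Pm-3m",
    225: "Fm-3m",
    227: "Fd-3m",
    229: "Im-3m",
}


def resolve_space_group(
    lookup: Mapping[int, str],
    number: Optional[int] = None,
    symbol: Optional[str] = None,
) -> Tuple[int, str]:
    """Return (number, symbol) given either one; check consistency if both are set."""
    if number is None and symbol is None:
        raise ValueError("provide a space group number or symbol")
    if number is None:
        # Invert the table on demand; symbols are unique in this excerpt.
        matches = [n for n, s in lookup.items() if s == symbol]
        if not matches:
            raise KeyError(f"unknown space group symbol: {symbol!r}")
        number = matches[0]
    if number not in lookup:
        raise KeyError(f"space group {number} not in lookup table")
    expected = lookup[number]
    if symbol is not None and symbol != expected:
        raise ValueError(
            f"inconsistent input: {number} is {expected!r}, not {symbol!r}"
        )
    return number, expected


print(resolve_space_group(SPACE_GROUP_TABLE, number=194))     # (194, 'P6_3/mmc')
print(resolve_space_group(SPACE_GROUP_TABLE, symbol="Fd-3m"))  # (227, 'Fd-3m')
```

The sketch compares symbols as exact strings; real input would need normalization first (e.g. "P 21/c" vs "P2_1/c"), which is left out here.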

lauri-codes commented 1 month ago

This should not be done through an API. The information is so stable and small that it should be available offline. E.g. spglib contains a database of this information, but if it is too limited we should look at other alternatives or build our own.

lauri-codes commented 1 month ago

We also have an offline version of the AFLOW prototype library, but this is actually very different from the space group number, as it also contains the occupation of Wyckoff positions and is associated with a very specific example structure.
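To make that distinction concrete: an AFLOW-style prototype label encodes stoichiometry, Pearson symbol, space group number and the Wyckoff letters occupied by each species. A label therefore determines the space group, but one space group number maps to many prototypes. The sketch below parses labels of the form `<stoichiometry>_<Pearson symbol>_<space group number>_<Wyckoff letters per species>`, assuming the simple case without extra suffixes. `PrototypeLabel` and `parse_aflow_label` are hypothetical helpers, not the offline library mentioned above, and the three labels (fcc Cu, rock salt, diamond) are only illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PrototypeLabel:
    stoichiometry: str        # one capital letter per species, e.g. "AB"
    pearson_symbol: str       # lattice type + atoms per cell, e.g. "cF8"
    space_group_number: int   # 1-230
    wyckoff: Tuple[str, ...]  # occupied Wyckoff letters, one entry per species


def parse_aflow_label(label: str) -> PrototypeLabel:
    """Split a simple AFLOW-style prototype label into its components."""
    parts = label.split("_")
    if len(parts) < 4:
        raise ValueError(f"too few fields in label: {label!r}")
    stoichiometry, pearson, number, *wyckoff = parts
    n_species = sum(ch.isupper() for ch in stoichiometry)
    if len(wyckoff) != n_species:
        raise ValueError(f"expected {n_species} Wyckoff fields in {label!r}")
    return PrototypeLabel(stoichiometry, pearson, int(number), tuple(wyckoff))


# Illustrative labels: fcc Cu, rock salt (NaCl) and diamond.
LABELS = ["A_cF4_225_a", "AB_cF8_225_a_b", "A_cF8_227_a"]

# Many prototypes share one space group, so the mapping only goes one way.
by_space_group: Dict[int, List[str]] = {}
for lbl in LABELS:
    proto = parse_aflow_label(lbl)
    by_space_group.setdefault(proto.space_group_number, []).append(lbl)

print(by_space_group)
# {225: ['A_cF4_225_a', 'AB_cF8_225_a_b'], 227: ['A_cF8_227_a']}
```

Filling space-group-derived quantities from a prototype is straightforward, whereas going from a space group number to a prototype would require the Wyckoff occupations and an example structure as extra input.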

JosePizarro3 commented 1 month ago

Ok, thanks @lauri-codes @aalbino2 !

I was checking a couple of other things regarding magnetic group symmetries over the weekend, so I see we have a couple of tasks to improve Symmetry. I think I can take over, and I will try to make it as general as possible so that we can use it generically for data from any area.

For now, I will implement a method to resolve Symmetry quantities based on a given input and using spglib / aflow_prototypes from nomad.atomutils.

Later, for magnetic symmetries, I want to get some feedback from other people as well. I will split this into a separate issue.

In any case, both implementations will take a while.

aalbino2 commented 1 month ago

Okay, sounds good @JosePizarro3. Just ping me when you need feedback; I can test it and check that everything I need is included.