Open jkshenton opened 8 months ago
Hi @jkshenton, thanks a lot for getting in touch. We will be happy to implement this.
Just to speed up development, do you have any example input/output files at hand that we can look at directly? I tried looking but couldn't find any. Also, the more complete the folders and files you can share with us, the better, as we can then prepare the parsing and cross-reference it with ASE :-)
Thanks for the super fast reply and for taking up this implementation!
CCP-NC has a repository of many thousands of such .magres files: https://www.ccpnc.ac.uk/database/
I grabbed a few random examples from there and put them in the attached tarball. The `ethanol.magres` file is a particularly comprehensive example.
Let me know if you would like any more information to make the implementation easier.
Perfect, thanks a lot for the share.
This is very interesting; we were not aware of such an initiative. I will take a look at the details of the project and possibly come back to you with some questions, if that's fine. We can further discuss whether you want to share the data in the database in NOMAD, and how we can help each other with computational or experimental data.
We have recently been thinking about ways to make a version 2 of our NMR database more FAIR, including some integration/sharing with databases such as the NOMAD one, so we would be very happy to discuss this!
Hi @jkshenton ,
I am coming back to this issue to let you know we are starting to work on the magnetic properties support in NOMAD (you can check a recent issue opened in #174).
Before starting work on the magres parser, I think it would be a good idea to meet on Zoom for, say, 30 minutes to 1 hour, so that we can understand your goals, how to merge the #174 idea with yours, and how NOMAD can help. Furthermore, I would like to discuss the workflows typically used in magres calculations, how to integrate them, and how the data in the CCP-NC database is structured (and how it compares with NOMAD).
What do you think? We can also talk by email (jose.pizarro@physik.hu-berlin.de) and organize the meeting privately. Whatever feels more comfortable for you.
Happy to meet and discuss our goals - I've just sent an email to arrange that.
A bit more context here to help with a discussion:
- `magres` files are a structured text file format that contains primarily a) a crystal structure and b) NMR-related results.
- The NMR-related results can include site-based (e.g. magnetic shielding and electric field gradient) tensors, pair-wise (e.g. J-couplings) tensors or global (e.g. magnetic susceptibility) tensors. Each quantity is reported along with its units.
- The file can also have a `[calculation]` block that contains some metadata about the DFT parameters (e.g. XC functional) used.
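To make the block structure described above concrete, here is a minimal sketch of splitting a magres-style file into its blocks and reading one site tensor. It is not the NOMAD or ASE parser. The sample text is a hypothetical, heavily trimmed file, and the record layout (`units <quantity> <unit>`, `ms <species> <index>` followed by nine row-major components) is an assumption made for illustration; the helper names `split_blocks` and `site_tensors` are invented here.

```python
import re

# Hypothetical, heavily trimmed sample; real files carry many more records.
SAMPLE = """#$magres-abinitio-v1.0
[calculation]
calc_code CASTEP
calc_xcfunctional PBE
[/calculation]
[atoms]
units lattice Angstrom
lattice 10.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 10.0
units atom Angstrom
atom H H 1 0.0 0.0 0.0
[/atoms]
[magres]
units ms ppm
ms H 1 30.0 0.0 0.0 0.0 28.0 0.0 0.0 0.0 26.0
[/magres]
"""

# A block opens with [name] and closes with [/name].
BLOCK_RE = re.compile(r"\[(?P<name>\w+)\](?P<body>.*?)\[/(?P=name)\]", re.DOTALL)


def split_blocks(text):
    """Map each block name to its list of whitespace-split records."""
    blocks = {}
    for match in BLOCK_RE.finditer(text):
        body = match.group("body")
        blocks[match.group("name")] = [
            line.split() for line in body.splitlines() if line.strip()
        ]
    return blocks


def site_tensors(records, key):
    """Collect 3x3 site tensors for `key`, indexed by (species, index).

    Assumes the nine components are listed row by row.
    """
    tensors = {}
    for rec in records:
        if rec[0] == key:
            species, index = rec[1], int(rec[2])
            values = [float(v) for v in rec[3:12]]
            tensors[(species, index)] = [values[0:3], values[3:6], values[6:9]]
    return tensors


blocks = split_blocks(SAMPLE)
# Units are declared per quantity inside the block that reports it.
units = {rec[1]: rec[2] for rec in blocks["magres"] if rec[0] == "units"}
ms = site_tensors(blocks["magres"], "ms")
# Isotropic shielding is one third of the trace of the tensor.
iso = sum(ms[("H", 1)][i][i] for i in range(3)) / 3.0
```

Keeping the declared units next to the parsed numbers, as in the `units` dictionary, is what allows a later conversion step to be explicit rather than implied.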
Although a magres file in isolation is very useful for post-processing and sharing the results of first-principles solid-state NMR calculations, we would ideally like to provide more context in our (/your) database in the future. The workflow would typically be something like:
In terms of our goals: we're currently in the planning stage of a major re-development of our database stack and we're looking at different options to improve the value the database provides to the solid-state NMR community. This includes better search/filtering functionality, better metadata capture (including workflow context) and some data visualisation options through integration of some of our other python and javascript tools.
Very good. I think the workflow can be covered with the current NOMAD infrastructure, albeit some details we can discuss over Zoom (like which files should be included in the upload for these).
FAIRmat can help with the database goals you describe. I am speaking internally with some engineers to see whether they can join the discussion. But for a first meeting, we can definitely sit down and see what the best options are; @ladinesa, are you available to join the discussion? If so, I will send you the emails for the Zoom.
Thanks for including me in the discussion. Is the date already set? I will be on holiday next week, so it would be great if we could schedule it this week.
Brief summary of our meeting:
1) … (`system`, `method`) and outputs (`calculation`),
2) link between magres and QuantumESPRESSO/CASTEP if the files are present in the datapoint,
3) add searchability (which methodological strings or numerical quantities can be defined?), defining `properties.magnetic` in NOMAD,
4) add an app menu in the Explore tab in NOMAD for NMR data. What about experimentalists? Can we convince some groups to use NOMAD?
5) add visualization, probably based on MagresView,
6) update the CCP-NC page with these changes (see previous point).
I think these bullet points summarize the meeting. Feel free to add or ask anything.
Thanks for sharing your summary! I think you captured the essential bits.
For the magresview visualiser, I would rather link to our custom "2.0" version which essentially completely replaces the previous JMOL-based version.
For the workflow / link between different DFT output files, I've attached a tarball with a very basic two-step procedure that might be typical of the sort of ssNMR calculations with CASTEP that one might upload to NOMAD: 1. a geometry optimisation (seedname `ethanol_geom`) followed by 2. an NMR calculation (seedname `ethanol_nmr`). The latter produces a .magres file.
castep_workflow_nmr.tar.gz
Thanks a lot, this is indeed what is needed to fully develop the parser. If you have more examples, do not hesitate to share them with us; the more, the better, as this will help us prepare better for other options.
Now, @jkshenton @ladinesa, I was wondering about the workplan: I think we (either Alvin or myself) can develop the initial version of the parser. Then, in the long term and if you are convinced about using NOMAD, it would be better if you (or Sathya) took over maintaining the parser. I was very recently discussing this with other devs, and in the longer term you could even think about using the developed parser as an I/O wrapper for your applications (without needing scripts all over the place).
Let me know what you think. If agreed, I suggest you star this repository, and I will keep you informed of important changes that affect you.
P.S.: should we also tag Sathya's Github profile?
Your proposed workplan sounds good to me - thanks!
As we mentioned before, the broader context would be that we would like to be able to easily (=via dashboard/API) access NMR data from any of the DFT codes that compute it. These include (non-exhaustive list):
Parsing magres files is a very useful first step towards this, since they have been adopted by two of the major DFT NMR codes (CASTEP and QE), and the specification for the file format introduces the rationale behind the structure of key bits of NMR data. There's also an accompanying JSON schema, in case that is helpful.
In terms of using the nomad parser as an I/O wrapper - I am all for re-using code and well-built libraries, though I would note the ongoing development of a standalone CASTEP parsing library to play such a role: https://github.com/oerc0122/castep_outputs The idea behind that one is that it will be eventually integrated with the CASTEP code test suite/CI workflow and thereby (hopefully) maintained by the CASTEP developers.
Good idea to tag @Sathya-S3
Very good. I will work on the schema and parser in mid-December. Sorry, I am going on holiday for two weeks.
This is very interesting. We definitely have to join efforts here, as I don't see the point of maintaining several parsers for the same code and doubling the work. We will pay attention to when this is integrated in CASTEP, but in the meantime, @ladinesa, do you mind checking the repo and seeing how it compares with our current CASTEP parser?
I will create an interface to it in #184.
Hi @ladinesa, I'm in the process of preparing a technical stack review document for the CCP-NC main working group. The goal is to present the different development options for the CCP-NC database website. It'd be valuable to know your thoughts as well, on the below section from @JosePizarro3's meeting notes, when time permits. Thank you very much.
- The main goal is to improve the website for CCP-NC NMR database: FAIR-compliant metadata, improved searchability (system information and NMR properties searches), and including visualizations.
- There are several options, but without wanting to re-invent the wheel, we talked about using NOMAD as a platform for the FAIR metadata and searchability, and use the CCP-NC website for front-end. @ladinesa how do you envision this point? How should the CCP-NC web be used and be compatible with the central NOMAD?
For reference @jkshenton
I refer to the approach we took with the other databases supported in NOMAD, e.g. Materials Project, AFLOW and OQMD. We would host your data in NOMAD and develop an app for a customised search of NMR data in central NOMAD. Regarding the CCP-NC website, you would start with a NOMAD Oasis deployment, where you can further customise the schema, visualisation, etc. This will also enable syncing of data with central NOMAD. Depending on the long-term goals of the project, you can then migrate to an independent infrastructure, similar to the databases I have mentioned, providing only a link to the corresponding entry in NOMAD.
Hi @jkshenton @Sathya-S3
Just wanted to say that I am almost finished with the initial version of the parser for magres. I just have a couple of minor doubts:
- `efg` and `isc` can be partitioned into different contributions with an extra tag that will appear as `efg_{tag}` and `isc_{tag}`. I wanted to know whether there are other options than "local" and "nonlocal" for the potential, and "fc", "orbital_p", "orbital_d" and "spin" for the spin couplings. I guess not, but I want to make 100% sure.
- … `xcfunctional` into magres? I have some potential settings for the XC functional in CASTEP and QE, but just want to know whether we are on the same page here. I can share these in detail.
Thanks!
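As an illustration of the `efg_{tag}` / `isc_{tag}` naming raised in the first question, here is a minimal sketch of splitting such a record key into the base quantity and its contribution tag. The tag values are the ones listed in the question; whether that list is complete is exactly what is being asked, so the sketch keeps unknown tags and only flags them. The helper name `split_contribution` and the `KNOWN_TAGS` table are hypothetical and not part of any parser.

```python
# Tag values taken from the question above; the list may be incomplete,
# so unrecognised tags are reported rather than rejected.
KNOWN_TAGS = {
    "efg": {"local", "nonlocal"},
    "isc": {"fc", "orbital_p", "orbital_d", "spin"},
}


def split_contribution(key):
    """Split a record key into (quantity, tag, recognised).

    'efg'          -> ('efg', None, True)       total quantity, no tag
    'isc_fc'       -> ('isc', 'fc', True)       known contribution
    'isc_other'    -> ('isc', 'other', False)   tag not in the table
    'ms'           -> ('ms', None, False)       not a partitioned quantity
    """
    for quantity, tags in KNOWN_TAGS.items():
        if key == quantity:
            return quantity, None, True
        prefix = quantity + "_"
        if key.startswith(prefix):
            tag = key[len(prefix):]
            return quantity, tag, tag in tags
    return key, None, False
```

Stripping only the known quantity prefix, rather than splitting on the first underscore, keeps tags such as `orbital_p` intact.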
Hi! Exciting - thanks for working on it!
… `calc_xcfunctional` tag (we just store the lines as strings), but as I understand the CASTEP source, the first 'word' in the full `xc_definition` is what gets printed in the magres file. For QE, up until very recently, there was no XC functional information in their magres files. However, newer ones will have this information, following this commit. So their `calc_xcfunctional` will be the result of their `get_dft_short()` routine. Hope that at least partially answers your questions (?).
Great, thanks a lot. Let's then put the focus first on CASTEP, test it, and then extend the support to QE if you like it.
… `.castep` output and then do some mapping. So, for CASTEP, does magres read the input `.param` file and print the first word to the `[calculation]` block? Thanks once more!
Hi @jkshenton @Sathya-S3 ,
I have finished preparing a magres parser. It parses the quantities in your file format, and I managed to connect it with the CASTEP I/O files if these are present in the upload.
I think it makes sense for you to check, with some examples, whether the parser works as you think it should. Then we can set up another meeting to tackle more seriously how to integrate this parsing into your database. From my side, I think the best option would be for your CCP-NC database to be the front-end for whatever NMR data is stored in NOMAD, but I would be happy to hear your thoughts.
@jryates Further to my email earlier today, I'm tagging you in this magres parser development thread to help move the conversation forward.
best wishes, Sathya.
Addressing a few comments further up the thread:
Thank you for your work on including the magres parser in NOMAD. The magres parser works seamlessly. Parsing was quick: it took only a couple of seconds for each upload. We tested the magres parser with a few sample magres uploads (test uploads, not published): one special inorganic material, 'wadsleyite', and a well-known inorganic material, 'coesite' (where we tested two variations of symmetry information in the magres file).
Comments and questions from testing
Many thanks in advance.
EDIT: Attaching the magres files we used for the test, for your reference. magres_parser_check.zip
@jryates @Sathya-S3
Thank you very much for testing the changes and giving feedback. Also, sorry for the long reply; I would like to comment on three main things which directly affect you.
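NOMAD will become more modular, so that people can develop independent packages (or plugins) and use them in their own installations or in the central one after approval. This means that:
- We are in the process of refactoring the current NOMAD data schema (the sections and quantities you checked on the DATA menu). This is mainly because it became quite cumbersome to modify and maintain certain steps. You can find the new schema being developed in its own GitHub repo, but please bear in mind that this is in a pre-alpha stage. We estimate that by April/May there should be an initial version of the new data schema.
- Related to this, parsers are soon going to be moved to their own independent repos. I will let you know once we move the `magres` parser to its own repo. I think this will be better overall when checking on the latest changes, and maybe you will find it interesting to develop as well.
About the symmetry, NOMAD uses a package called MatID to classify and extract symmetry information. Am I understanding correctly that the symmetry was extracted properly by NOMAD, or was it not, because these pieces of information were missing?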
3.1. I noticed that magnetic shielding values from magres files are scaled by 10^-6 because the unit is ppm, which is fine. For efg, the magres file values were scaled by ~0.01028 to convert from atomic units to V/Å^2 - is it the standard unit for representing similar parameters in the 'electronic' category in NOMAD? May I know, if we try to export the values back to a magres file, will these values be converted back to the same units represented in magres files (I don't think I can test this without publishing the dataset)?
So the units in NOMAD are defined based on the S.I., and we handle Quantities following pint. Units can then be changed by multiplying with `ureg.<desired_unit>`.
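To illustrate the round trip asked about in 3.1, here is a minimal sketch of converting file units to SI-based values and back. It deliberately uses plain Python rather than pint and is not NOMAD's code. The physical constants are CODATA 2018 values typed in by hand, which is an assumption of this sketch, and the function names are hypothetical.

```python
# CODATA 2018 values, entered by hand for this sketch; a units library
# would normally supply them.
HARTREE_J = 4.3597447222071e-18   # Hartree energy in joules
E_CHARGE_C = 1.602176634e-19      # elementary charge in coulombs
BOHR_M = 5.29177210903e-11        # Bohr radius in metres

# One atomic unit of electric field gradient expressed in V/m^2.
EFG_AU_IN_SI = HARTREE_J / (E_CHARGE_C * BOHR_M ** 2)

# Parts per million as a dimensionless factor.
PPM = 1.0e-6


def ms_to_si(value_ppm):
    """Magnetic shielding: ppm (file) -> dimensionless (stored)."""
    return value_ppm * PPM


def ms_from_si(value):
    """Magnetic shielding: dimensionless (stored) -> ppm (file)."""
    return value / PPM


def efg_to_si(value_au):
    """Electric field gradient: atomic units (file) -> V/m^2 (stored)."""
    return value_au * EFG_AU_IN_SI


def efg_from_si(value_si):
    """Electric field gradient: V/m^2 (stored) -> atomic units (file)."""
    return value_si / EFG_AU_IN_SI
```

Because each conversion is a single multiplicative factor, exporting back to the file units recovers the original numbers up to floating-point rounding, provided the same factor is used in both directions.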
You can test your uploads using NORTH in NOMAD. This allows you to launch a Jupyter notebook directly in a folder where you can find your uploaded data. Maybe @ladinesa can tell you the exact details on importing the `MagresParser` and using the `parse()` function in there.
3.2. The tensor representation in NOMAD of the ms and efg parameters is transposed. Could this be changed or is there a legacy reason why it is presented that way?
You are totally right, thanks for spotting this. It is clearly a mistake on my side; I will fix it ASAP.
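For context on the transposition reported in 3.2, here is a minimal sketch of how nine flat components become a 3x3 tensor, and how reading them in the other order yields the transpose. The row-major ordering is an assumption of this sketch, and the function names are hypothetical.

```python
def tensor_from_row_major(values):
    """Build a 3x3 tensor from nine numbers, assuming row-major order."""
    if len(values) != 9:
        raise ValueError("expected nine tensor components")
    return [list(values[3 * i:3 * i + 3]) for i in range(3)]


def transpose(tensor):
    """Return the transpose of a 3x3 tensor given as nested lists."""
    return [list(column) for column in zip(*tensor)]


def trace(tensor):
    """Sum of the diagonal components."""
    return tensor[0][0] + tensor[1][1] + tensor[2][2]


flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Under the row-major assumption, sigma[0][1] is the xy component.
sigma = tensor_from_row_major(flat)

# Reading the same nine numbers column by column gives the transpose,
# so the xy and yx components swap places.
sigma_t = transpose(sigma)
```

The trace, and therefore the isotropic value, is unchanged by a transpose, so a mix-up of this kind only shows up in the off-diagonal components of tensors that are not symmetric.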
3.3. For the [nx3x3] tensors for ms and efg, the data labels in the 'value' section of DATA go from 0 to n-1. It is not easy to identify which atoms they correspond to without having the magres file open by the side. Is it possible to display the actual atomic labels as 'H1', 'O5' to indicate the first H atom, the fifth O atom, etc., for example?
Very good point. However, as we work with `pint.Quantity`, it is not possible to define strings and floats at the same level. There are, though, a couple of alternatives we can explore:
1) … `magnetic_shielding.atom_labels` (and the same for all the other NMR quantities), which contains the atoms to which the first index refers. I can also improve the description.
2) … `magnetic_shielding` and the others become a list of 3x3 tensors. For each element, we have `magnetic_shielding[i].value`, `magnetic_shielding[i].isotropic_value`, `magnetic_shielding[i].atom_label` (note the singular in "label"). This is more long term, in the sense that we can even improve on the `atom_label` with the new data schema I mentioned above.
You can let me know what you think. A screenshot or demo of option 2 might be better to fully get the idea.
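The two alternatives can be sketched with plain Python dataclasses. This is only an illustration of the data layout and not NOMAD schema code: the class names `MagneticShieldingArray` and `MagneticShieldingEntry` and the helper `to_entries` are hypothetical, while the attribute names (`value`, `isotropic_value`, `atom_labels`, `atom_label`) follow the ones mentioned in the comment.

```python
from dataclasses import dataclass
from typing import List

Tensor = List[List[float]]


def isotropic(tensor: Tensor) -> float:
    """One third of the trace of a 3x3 tensor."""
    return (tensor[0][0] + tensor[1][1] + tensor[2][2]) / 3.0


@dataclass
class MagneticShieldingArray:
    """Option 1: one [n x 3 x 3] array plus a parallel list of labels."""
    value: List[Tensor]
    atom_labels: List[str]


@dataclass
class MagneticShieldingEntry:
    """Option 2: one record per atom, each carrying its own label."""
    value: Tensor
    atom_label: str

    @property
    def isotropic_value(self) -> float:
        return isotropic(self.value)


def to_entries(block: MagneticShieldingArray) -> List[MagneticShieldingEntry]:
    """Convert the option 1 layout into the option 2 layout."""
    if len(block.value) != len(block.atom_labels):
        raise ValueError("one label is required per tensor")
    return [
        MagneticShieldingEntry(value=tensor, atom_label=label)
        for tensor, label in zip(block.value, block.atom_labels)
    ]


# Hypothetical two-atom example using labels in the 'H1', 'O5' style.
block = MagneticShieldingArray(
    value=[
        [[30.0, 0.0, 0.0], [0.0, 28.0, 0.0], [0.0, 0.0, 26.0]],
        [[60.0, 1.0, 0.0], [1.0, 50.0, 0.0], [0.0, 0.0, 40.0]],
    ],
    atom_labels=["H1", "O5"],
)
magnetic_shielding = to_entries(block)
```

With the second layout, `magnetic_shielding[1].atom_label` and `magnetic_shielding[1].isotropic_value` travel together, so the label no longer has to be looked up by position in a separate list.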
We should maybe meet and talk about solutions that work for both databases. I have some feedback from other NOMAD devs, and I think we can discuss some very nice options.
Let me know if you want to meet, and when.
Hi @JosePizarro3, thank you for the detailed responses to our questions and additional new information on NOMAD platform's development direction.
- We are on the process of refactoring the current NOMAD data schema (the sections and quantities you checked on the DATA menu). This is mainly because it became quite cumbersome to modify and maintain certain steps. You can find the new schema being developed in its own Github repo, but please, bear in mind that this is in a pre-alpha stage. We calculate that by April/May, there should be an initial version of the new data schema.
We'll keep watching the link for updates.
- Related with this, parsers are going to be soon moved to its own independent repos. I will let you know once we move the magres parser to its own repo. I think this will be better overall when checking on the latest changes, and maybe you find it interesting to develop as well π
Yes, definitely. We look forward to directly being involved in further parser development.
Am I understanding correctly that the symmetry was extracted properly by NOMAD, or due to missing these pieces of information, was it not?
Yes, it was extracted correctly, even when we deliberately entered incomplete symmetry information in the magres header. I think @jryates' and my comment really was that the magres file's symmetry information is ignored by NOMAD. Down the line, it might be desirable to use magres symmetry information to calculate symmetry as an extra validation check?
You can test your uploads using NORTH in NOMAD. This allows you to launch a Jupyter notebook directly in a folder where you can find your uploaded data. Maybe @ladinesa can tell you the exact details on importing the MagresParser and use the parse() function in there.
I'm keen to test this further and will set some time aside for this. I'll wait first to see if @ladinesa has more information to add as you suggest.
Your ideas for the atom labels, both short and long term, sound good. Please let me know if I can be of help either with the development or by providing periodic feedback during development.
I have a positive response from the CCP-NC working group to proceed with talks with NOMAD about our partnership. I'll prepare a list of technically focussed questions surrounding CCP-NC data in a NOMAD supported database. My colleagues from Physical Sciences Data Infrastructure (PSDI) also have data-centric and logistical questions of their own to add. We'll aim to get these questions to you within a week's time.
I'm aiming to arrange a sit-down between your team and us (me + PSDI) in the first instance, some time next week. If your team members are on holiday next week (around Easter time), we can aim to block a time slot for the week after. We can deal with the meeting specifics through email.
Ok, that sounds good. But I need to understand a bit better how to do this validation, and how the symmetry operations compare with MatID. Give me a few days, and perhaps I will write you an email with more specific questions.
And perfect about the positive response 🥳 I am happy we can further collaborate and improve both NOMAD and the CCP-NC. I will let some colleagues know once you send me the questions and the invitation for the Zoom; it might require some other expertise that @ladinesa and I do not have.
Just a short follow-up:
`.magres` files are output by both Quantum Espresso's GIPAW and CASTEP when NMR calculations are run. It would be great to have NOMAD be able to parse them.
A parser already exists in ASE, for reference: https://gitlab.com/ase/ase/-/blob/master/ase/io/magres.py
The specification of the file format exists here: https://www.ccpnc.ac.uk/docs/magres/magres-format.pdf