xraypy / xraylarch

Larch: Applications and Python Library for Data Analysis of X-ray Absorption Spectroscopy (XAS, XANES, XAFS, EXAFS), X-ray Fluorescence (XRF) Spectroscopy and Imaging, and more.
https://xraypy.github.io/xraylarch
Other
127 stars 62 forks source link

Struct2xas: FDMNES/FEFF input files creation using Pymatgen and template files #458

Closed maurov closed 9 months ago

maurov commented 11 months ago

@newville @Ameyanagi @mretegan

With this request I propose to add the work done by @beatrizfoschi during her master project for generating FDMNES/FEFF input files using Pymatgen and templates. It is safe to merge, as we have made only additions, not modified the existing code.

For the FEFF part there is some overlap with the existing structure2feff, but we use FEFF not for the path expansion but to generate the xmu.dat.

The examples how this works are provided as a Jupyter notebook.

newville commented 11 months ago

@maurov @beatrizfoschi Awesome - thanks very much. I'm inclined to merge this. I'll give others a chance to comment on this, maybe planning to merge by the end of the week unless I hear objections.

Ameyanagi commented 11 months ago

@maurov @beatrizfoschi Cool job! I think it is an excellent feature to be added.

maurov commented 11 months ago

@newville @Ameyanagi thanks for your nice feedback!

I have converted this request as draft because @beatrizfoschi and @Certaingemstone have found some urgent bugs that need to be fixed before releasing. I think this could happen by end of August and have a new Larch release then.

newville commented 11 months ago

@maurov I think it is OK to consider every new feature as "draft" and somewhere between "alpha" and "beta" status.

If you think is ready for others to start using, I think that is OK to merge.

maurov commented 11 months ago

@newville I have realized that we need to change the way we provide atomic positions to FDMNES in the case of a simple CIF file. In fact, we are relying on the SpaceGroupAnalyzer from pymatgen (this is dictated by the fact that we want to build supercells and play with the atomic positions) even when we could simply take the atomic information from the CIF file and directly write them to the FDMNES input file.

This will stay on hold for a while, as I will be on vacation soon.

newville commented 11 months ago

@maurov for FDMNES input, it might be worth copying / refactoring cif2feff, which is generating a cluster of atomic x,y,z coordinates from the pymatgen structure (and includes randomized support for partial occupancy). It might be that this cluster is exactly what FDMNES wants too.

newville commented 10 months ago

The MP Legacy API does not work for me, even when I have a valid MP_API_KEY set....

Is it OK to move to the newer REST API?

newville commented 10 months ago

... but also, I really like the idea of being able to search the MP database -- perhaps we could add that to the XAS_Viewer GUI too (and hardwire in an API key).

newville commented 10 months ago

@maurov @Ameyanagi @mretegan Sorry for so many comments. I think this is really good, and we should definitely include this. For me, it raises several questions (below) those these do not need to be solved before merging this PR.

If you think this is ready for "initial release", I think we should merge this soon-ish (this week) and push out a new release. I think we can definitely say that we're adding support for generating input for FDMNES and this is a first release of that, ready for beta-testing, and maybe others will have good suggestions too.

Here are some thoughts:

  1. there is quite a bit of overlap here for "convert pymatgen Structure to (feff/fdmnes input file)" with code in cif2feff.py, and amcsd.py. Some of this code could/should probably be merged and cleaned up. And maybe we can then push some of that to pymatgen (which has Structure-to-FeffInp too, but I found it inadequate).
  2. I think it would be better to save the fdmnes / feff inputs to a 'fdmnes' or 'feff' folder under larch.site_config.user_larchdir (typically $HOME/.larch) than to put them into a tempfile location. That will allow the files to be browsed and used later.
  3. should we include binaries for fdmnes with Larch, and make something like "run_fdmnes()"? I would be +1 on this. I think Yves would not object.
  4. for downloading Structure from MaterialsProject, we should probably use the latest API.
  5. I would be OK with adding both "Generate FDMNES input' and "Search MaterialsProject" for CIFs from the wxlib CIF Browser used in XASViewer, if you thought that would be helpful. I would contact the MaterialProject folks about whether we can use one API key or expect every user to have their own.

I am not sure whether it is practical to try to address any of these before the next release. Any thoughts? Thanks!

Ameyanagi commented 10 months ago

I think it is good for the initial release.

When working on merging cif2feff.py, structure2feff.py, and struct2xas.py, it would be nice if we could leverage pymatgen ecosystem for reading structures. (e.a. Structure.from_str, Molecule.from_str) This will take care of proper file format handling, and we could make it as general as possible. However, this might take some time, and I don't think we can make it before the next release.

@newville I agree with 1-5 of your suggestions.

maurov commented 9 months ago

@newville

Thank you very much indeed for all your comments. I have been busy on the beamline lately and I did not have the time to work on this yet. I will reply to your more technical comments later. I mostly agree on them, but I prefer proceeding step by step.

For the next release (0.9.72) I also propose to merge this pull request, but I need to fix a bug first. In fact, we currently use Pymatgen SpaceGroupAnalyzer to get back to a symmetric cell for inputting the atomic coordinations in FDMNES (this approach is justified because we want to generate supercells too). The bug consists in the fact that, for some CIF files, we choose different equivalent positions, not strictly those written in the CIF file. In this case, FDMNES does not like it and fails with an error. The fix is to use the _get_atom_sites directly from the CIF file to cross check that we use exactly the atomic positions present in the CIF files when an unit cell with space group symmetries is used. OK, maybe all this is not very clear, but the message is: as soon as this is fixed (hope by the end of the coming week) I will remove the draft status and let you merge. Is that fine with you?

Once we have 0.9.72 released, we can work adding this new feature (generating FDMNES files) to the GUI, that is merging the redundant code going from a structure file (CIF, XYZ, whatever) to FEFF/FDMNES. Distributing the FDMNES executables in Larch is fine with me in principle, but I think it is better to let the user choose where the FDMNES executables are. This would go for 0.9.73 (or higher).

Only once we have this done, we could decide if it is worth pushing our approach to Pymatgen. Personally, I am reluctant in doing so, because then we will loose the versatility of changing the code rapidly, as Pymatgen is a bigger project. Having a separate code from Pymatgen was even the choice of Lightshow, which is closer to Pymatgen than us. I propose discussing this in one of the coming developers meetings.

newville commented 9 months ago

@maurov I am OK with waiting a bit, but I would rather get something out soon rather than wait for it to be perfect. ;)
Perhaps we can aim for Friday (Sept 29) but allow that to slip into the following week.

For coordinating Structural Data -> Input Files, yes, we should discuss, and maybe coordinate with or use/propose changes to the Lightshow efforts.

And we should start having developer Zoom calls!

maurov commented 9 months ago

This all looks good to me. I vote to merge in. Except: the Jupyter example fails for me in setting up and running the Feff calculation. It looks to me like mat_obj.outdir gets overwritten from FDMNES to FEFF calc. I might suggest adding mat_obj.fdmnes_dir = mat_obj.outdir to cell 8 and mat_obj.feff_dir = mat_obj.outdir to cell 10 and then using those to reference those results. Like, the "run feff" cell should use

feff_inp = f"{mat_obj.feff_dir}/feff.inp"

otherwise, feff fails to run for me.

This is fixed in https://github.com/xraypy/xraylarch/pull/458/commits/eea0dfb1349ab8c0beab285a349b184048ee29b4. I was using mat_obj.parent_path instead of mat_obj.outdir. In fact, this last one differentiate feff from fdmnes paths.

  1. I think it would be better to save the fdmnes / feff inputs to a 'fdmnes' or 'feff' folder under larch.site_config.user_larchdir (typically $HOME/.larch) than to put them into a tempfile location. That will allow the files to be browsed and used later.

Yes, if this is a good option and can be customized later on (for example within the GUI). For the sake of the Jupyter notebook examples, I preferred using temporary paths. Furthermore, if one wants to run a full batch of simulations in the order of thousands or more, it is preferable to use a temporary space on a cluster rather than the user home directory.

maurov commented 9 months ago

@newville I do not have the time now implementing the changes I wanted to. I propose merging like this and release as beta feature.

newville commented 9 months ago

@maurov OK, merging this.... Great!!

newville commented 9 months ago

@newville @Ameyanagi just to let you know, after merging, I made a couple of changes to this:

a) I moved all of the use of tempfile to build folders for feff/fdmnes files to point to feff and fdmnes folder under site_config.user_larchdir (typically $HOME/.larch). That way the files generated will be more easily found after the fact, and the OS is unlikely to automatically delete them.

b) I also changed to the new mp_api. This is mostly transferrable, but there is not a "CIF" property returned. So, I left save_cif_from_mp as is (using older mp_api), and added a new function save_mp_structure which saves a JSON-dump of an MP Structrure. These will be saved to the mp_structures folder (in site_config.user_larchdir). I then also added the ability to read these back in, renaming the structure-reading method to read_structure().

aside / data-formatting: It is sort of interesting that the JSON dump of the MP Structure is basically a complete replacement for CIF. That is, we read CIF to generate the MP Structure. But that Structure is sort of what we really want. And those are readily dumped to / loaded from JSON.

newville commented 9 months ago

@maurov @Ameyanagi ... but also, with those changes to both the code and the Jupyter notebook, the notebook runs without any errors for me. ;)