OpenMX parser: parse more MD details

ondracka commented 10 months ago

Just some needed fields so that the new GUI features like the temperature evolution graph are populated correctly.

That said I was not sure how to correctly populate the thermostat details. I just parse the type, but right now it seems it assumes the thermostat is always keeping constant temperature over the whole run, which is not the case at least in the testcases I have. What I have access to from the OpenMX output is either the info about how the target temperature should be at specific timestep (and its linearly interpolated in between), MD trajectory file also actually contains the target interpolated temperature at every timestep. But as I said I was not able to figure out how to put it into the current metainfo.

JosePizarro3 commented 10 months ago

Hi @ondracka,

I think it is a good idea to tag @JFRudzinski on this, as he is responsible on extending the MD metainfo and he will happy to help you 🙂

JFRudzinski commented 10 months ago

Hi @ondracka

Sorry for the delayed response.

I am not sure if I understand exactly what you are looking for. It sounds like you want to store the actual system temperatures during a simulation? This is done via archive->run->calculation->temperature, e.g., See HERE

We also have some basic support for storing the parameters for a molecular dynamics simulation. This goes under workflow2->method. Here is an example:

{
    "molecular_dynamics": {
        "thermodynamic_ensemble": "NPT",
        "integrator_type": "langevin_leap_frog",
        "integration_timestep": {"value": 2e-15, "unit": "ps"},
        "n_steps": 20000000,
        "coordinate_save_frequency": 10000,
        "velocity_save_frequency": null,
        "force_save_frequency": null,
        "thermodynamics_save_frequency": null,
        "thermostat_parameters": {
            "thermostat_type": "langevin_leap_frog",
            "reference_temperature": {"value": 300.0, "unit": "kelvin"},
            "coupling_constant": {"value": 1.0, "unit": "ps"}},
        "barostat_parameters": {
            "barostat_type": "berendsen",
            "coupling_type": "isotropic",
            "reference_pressure": {"value": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], "unit": "bar"},
            "coupling_constant": {"value": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]},
            "compressibility": {"value": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
            }
    }
}

I am happy to help further. Just let me know what exactly your calculation/simulation is or what quantities you want to store. It may be the case that we need to extend the metainfo, which is a good thing since we want to start branching this out to cover more scenarios.

We could also Zoom next week sometime if you want for an easier discussion, just let me know here or on the FAIRmat slack if you are there or at joseph.rudzinski@physik.hu-berlin.de.

ondracka commented 10 months ago

Thanks for the response, the actual system temperature is already parsed fine, I was looking into the thermostat metainfo, like if I have a NVT run with non-constant temperature, for example something like this is possible with OpenMX: https://www.openmx-square.org/openmx_man3.9/node60.html (but similar temperature profiles can be simulated also with LAMMPS etc.). But from your example it looks like the thermostat metainfo is assuming the temperature is always constant for the whole run (that was also my initial impression).

Other than that I believe everything is clear so the only help I need here is a review and eventual pull ;-)

ondracka commented 10 months ago

@ondracka Overall it looks good to me.

As you will see in my comments, I am addressing your issue with the thermostat_type.

I also went ahead and added repeats to the thermostat and barostat sections, along with quantities frame_start and frame_end. This way you can add something like a simulated annealing protocol, where you change the target temperature throughout the simulation. See also here: https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/merge_requests/1466

Can you let me know if this would suit your needs? Once you confirm, I will merge and then you will need to simply make thermostat_parameters a list. I would suggest then using:
            workflow = MolecularDynamics(
                method=MolecularDynamicsMethod(
                    thermostat_parameters=[ThermostatParameters()],
                    barostat_parameters=[BarostatParameters()]
                ), results=MolecularDynamicsResults()
            )
to define the workflow section instead of using m_create()

Yeah, I think this should alow to store any crazy temperature profiles whatsoever. However I'm not sure I follow regarding not using the m_create(). For example if I have a simulated heating from 300K to 400K with a constant heating rate over a 100 timesteps (so 1K/timestep), it means I'll have thermostat_parameters section repeat 100 times (each one with the actual interpolated target temperature for the specific MD step and where frame_start and frame_end will be the same single frame where this specific target temperature is relevant.) But I still have only one workflow, so I still need to use m_create() for the thermostat_parameters sections don't I?

ondracka commented 10 months ago

Thanks for the comments, I'll send an updated PR with the comments resolved when https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/merge_requests/1466 is merged

JFRudzinski commented 10 months ago

@ondracka Overall it looks good to me. As you will see in my comments, I am addressing your issue with the thermostat_type. I also went ahead and added repeats to the thermostat and barostat sections, along with quantities frame_start and frame_end. This way you can add something like a simulated annealing protocol, where you change the target temperature throughout the simulation. See also here: https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/merge_requests/1466 Can you let me know if this would suit your needs? Once you confirm, I will merge and then you will need to simply make thermostat_parameters a list. I would suggest then using:
            workflow = MolecularDynamics(
                method=MolecularDynamicsMethod(
                    thermostat_parameters=[ThermostatParameters()],
                    barostat_parameters=[BarostatParameters()]
                ), results=MolecularDynamicsResults()
            )
to define the workflow section instead of using m_create()
Yeah, I think this should alow to store any crazy temperature profiles whatsoever. However I'm not sure I follow regarding not using the m_create(). For example if I have a simulated heating from 300K to 400K with a constant heating rate over a 100 timesteps (so 1K/timestep), it means I'll have thermostat_parameters section repeat 100 times (each one with the actual interpolated target temperature for the specific MD step and where frame_start and frame_end will be the same single frame where this specific target temperature is relevant.) But I still have only one workflow, so I still need to use m_create() for the thermostat_parameters sections don't I?

In that case you could do something like:

            n_targets = 100
            thermostat_section = [Thermostat_Parameters() for _ in range(n_targets)]
            workflow = MolecularDynamics(
                method=MolecularDynamicsMethod(
                    thermostat_parameters=thermostat_section,
                    barostat_parameters=[BarostatParameters()]
                ), results=MolecularDynamicsResults()
            )

            # populate
            for i_thermo in range(n_targets):
                workflow.method.thermostat_parameters[i_thermo].reference_temperature = xxxx

However, now that you gave this example, I think I still want to improve the metainfo. For some reason I wasn't really thinking about a continuous protocol. I am thinking that we should also give the option to simply give the starting/stopping frames and then also provide a type of interpolation (e.g., linear), and the interpolation rate. Then you could store everything in a single section if you want (we will still allow multiple sections for more complicated protocols). What do you think? Are there other quantities that I should define?

ondracka commented 10 months ago

Most codes I know can do only a linear heating/cooling between starting and ending temperature (this is also what LAMMPS has https://docs.lammps.org/fix_nh.html or equivalently starting and ending pressure for the barostat or IIRC also VASP has similar syntax with TEBEG and TEEND).

AFAIK Qunatum ESPRESSO sets starting temperature and instead of the final temperature it sets how often and how is the target temperature changed (either by adding some constant value which would be equivalent to the linear interpolation, except one would have to calculate the final temperature by hand based on the number of steps and the increment however it also allows to rescale/multiply the current target temperature by fixed amount which would be equivalent to exponential heating cooling profile https://www.quantum-espresso.org/Doc/INPUT_PW.html#delta_t and https://www.quantum-espresso.org/Doc/INPUT_PW.html#nraise)

So I think your suggestion would work for me (and is much more breve than the initial suggestion) however what you proposed initially is probably most generic.

JFRudzinski commented 10 months ago

Most codes I know can do only a linear heating/cooling between starting and ending temperature (this is also what LAMMPS has https://docs.lammps.org/fix_nh.html or equivalently starting and ending pressure for the barostat or IIRC also VASP has similar syntax with TEBEG and TEEND).

AFAIK Qunatum ESPRESSO sets starting temperature and instead of the final temperature it sets how often and how is the target temperature changed (either by adding some constant value which would be equivalent to the linear interpolation, except one would have to calculate the final temperature by hand based on the number of steps and the increment however it also allows to rescale/multiply the current target temperature by fixed amount which would be equivalent to exponential heating cooling profile https://www.quantum-espresso.org/Doc/INPUT_PW.html#delta_t and https://www.quantum-espresso.org/Doc/INPUT_PW.html#nraise)

So I think your suggestion would work for me (and is much more breve than the initial suggestion) however what you proposed initially is probably most generic.

ok, great. Please check what I added here: https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/blob/1684-new-thermostat-metadata-needed-for-openmx-parsing-of-md-data/nomad/datamodel/metainfo/simulation/workflow.py

If you have any suggested adjustments to naming or descriptions, please let me know.

Also, I forgot to mention before, you should add tests for all your changes, I did not see any during the review. Then, you can also add tests for this new metainfo, which is needed.

ondracka commented 10 months ago

It looks fine (that said my opinion has no weight whatsoever), however right now one would need to do some recalculations in the parsers to fill this (from starting and ending temperature to the annealing_delta in OpenMX and LAMMPS, etc..). Aren't the parsers supposed to just get what is in the output files and the rest should be done in normalizing?

So it might be cleaner to add both reference_temperature_start (this would be also the reference temperature for the constant profile) and reference_temperature_end. Annealing_type I would rename to temperature_profile and I would add also "constant" option for the simple case. Also frame_end and frame_start should be valid also for the constant temperature (in lot of cases users do thermalizing at constant temperature first and than some heating/cooling). Also if the annealing_type is renamed than the others could also be renamed to something like temperature_update_frequency, temperature_update_delta/increment, temperature_update_scaling_factor, etc...

So this would allow for every parser to just parse its native inputs, like

constant T only
Tstart frame_start Tend frame_end combination
Tstart + temperature_update_increment + temperature_update_frequency, etc...

And than normalizer would later recalculate it to fill the other fields like reference_temperature_end or the increment/factor (and potentially to calculate specific target temperature/pressure at specific timesteps, so it could be potentially plotted together with the actual values in the "Thermodynamic properties" box on the entry page similarly to the OpenMX manual plots I referenced earlier).

JFRudzinski commented 10 months ago

And than normalizer would later recalculate it to fill the other fields like reference_temperature_end or the increment/factor (and potentially to calculate specific target temperature/pressure at specific timesteps, so it could be potentially plotted together with the actual values in the "Thermodynamic properties" box on the entry page similarly to the OpenMX manual plots I refere

Great, thank you for the suggestions! I like your naming, I wasn't happy with annealing but wasn't sure what to call it.

The only part I disagree with is the last part about the Thermodynamic properties. What we are talking about are parameters that are part of the MD workflow. The temperature quantity in Thermodynamic properties is the actual simulation temperature (i.e., calculated via the velocities), not the target temperature. That being said, in the future we could have plots or widgets for plotting quantities for common workflows, and in this case we could have something like what you are talking about.

Also, indeed as you said, we need to do a lot of reprocessing and, in the case of Lammps, a lot of improvements in converting the native quantities to the normalized metainfo. We are working on this little by little depending on demand essentially.

Anyway, this is great, I will make these changes and let you know when everything is merged.

JFRudzinski commented 9 months ago

@ondracka Sorry for the delay! It turned out we had to fix a couple things for the new metainfo to work properly. I just merged the NOMAD branch mentioned above into develop. Let me know if you need anything else.

ondracka commented 9 months ago

@ondracka Sorry for the delay! It turned out we had to fix a couple things for the new metainfo to work properly. I just merged the NOMAD branch mentioned above into develop. Let me know if you need anything else.

Thanks, I'll try to rebase tomorrow.

ondracka commented 9 months ago

This is now rebased on top of the thermostat metainfo changes.

ondracka commented 9 months ago

Hm, I see some test failures, it looks like the some other parsers (at least FHIaims) need to be update to the thermostat_parameters metainfo changes (the OpenMX tests all pass properly locally).

JFRudzinski commented 9 months ago

Hm, I see some test failures, it looks like the some other parsers (at least FHIaims) need to be update to the thermostat_parameters metainfo changes (the OpenMX tests all pass properly locally).

Ah yes, this should be very easy, we just need to change the thermostat section to a list. I opened an issue here: https://github.com/nomad-coe/electronic-parsers/issues/167, However, I am helping to run a workshop this week and may not have time to fix it. I would say you can move forward with this PR though, and then I will fix these tests when I return.

ondracka commented 9 months ago

@ladinesa @JFRudzinski I need someone to push the merge button here, I believe all the issues were resolved, but I don't have the rights myself.

JFRudzinski commented 9 months ago

@ondracka sorry about the delay! I hope everything is working for you now.

ondracka commented 9 months ago

@ondracka sorry about the delay! I hope everything is working for you now.

Np, all is fine, thanks for the review.

nomad-coe / electronic-parsers

OpenMX parser: parse more MD details #158