Closed: lukhman9020 closed this issue 2 months ago
Dear lukhman9020, I am not completely sure what you are referring to with "triplet and singlet crossing points" - do you simply mean the point in time when the singlet and triplet populations become equal? Normally, "triplet and singlet crossing points" are geometries where the energies of a singlet and a triplet state coincide.
Regarding your discrepancy, it seems that with the default thresholds in diagnostics.py, several trajectories were filtered out due to various possible reasons. Best practice in SHARC is to manually inspect the trajectories that diagnostics.py flags. Only then can you understand what causes the trajectories to be rejected. This knowledge allows you to judge whether the trajectories should be included in the analysis nonetheless, whether they should stay excluded, or whether the entire project should be repeated with improved settings.
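For a first look at a flagged trajectory, a small Python sketch like the one below can help once data_extractor.x has produced its text output. Note that the file path and column indices here are only assumptions for illustration; check the files that were actually extracted for your trajectory.

import numpy as np
import matplotlib.pyplot as plt

# File name and column indices are assumed for illustration -- verify them
# against the output that data_extractor.x wrote for your trajectory.
data = np.loadtxt("TRAJ_00001/output_data/energy.out", comments="#")

time = data[:, 0]   # assumed: time in fs
etot = data[:, 3]   # assumed: total energy column; check the file header

plt.plot(time, etot)
plt.xlabel("time (fs)")
plt.ylabel("Etot (a.u.)")
plt.show()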
What you write about the "time of singlet-triplet crossing" seems entirely reasonable to me. Excluding part of the trajectory swarm can easily change your results, especially if you only have a small number of trajectories and/or if you are excluding a large fraction of the swarm. Long time constants are especially prone to large errors and require many trajectories and long simulations to get accurate values.
Best, Sebastian
Dear Sir,
I apologize for the mistake in my previous message. I meant the point at which the triplet and singlet populations become equal.
Let me explain my problem once more: when I diagnose each trajectory with the current settings values, only 83 out of 163 trajectories are used for the analysis, and I get 30000 fs as the time when the triplet and singlet populations become equal. When I change the current settings values as below, I get 22000 fs as the time when the triplet and singlet populations become equal.
From your previous message, I understand that I will get better results when more trajectories are included in the analysis. Is this correct? But sir, I don't understand how to set the etot_window, etot_step, epot_step, ekin_step, and hop_energy values to get a better trajectory analysis. In the above case, I followed the tutorial values, and I don't know if they are suitable in my case. Please assist me.
Best regards, LUKHMANUL HAKEEM K.
Dear lukhman9020, thank you for your clarification.
Indeed, results get statistically more meaningful ("better") if more trajectories are included, but this is obviously only true if the trajectories do not suffer from any problems. Therefore, the goal should be to identify all valid trajectories and include them in the analysis. However, what is valid and what is invalid depends on the chemical problem you try to understand, the desired accuracy, as well as your simulation parameters. Therefore, there is no automatic way to decide on the best parameters for diagnostics.py.
The typical practice that we employ is to run diagnostics with default settings, and then manually inspect some or all of the excluded trajectories by means of data_extractor.x, make_gnuscript.py, etc. In this way, we try to understand the cause for the exclusion of the trajectories. Don't forget, diagnostics.py does not know your chemical problem and cannot think for you. It can only show you potential problems with trajectories, but it should not be the final arbiter of which trajectories to exclude. You should check the trajectories and then decide which ones are ok and which are problematic. Then you can manually delete or create DONT_ANALYZE files inside the trajectory folders to manually select which trajectories to include in the analyses.
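Just as an illustration (this is not a SHARC script), a minimal Python sketch for marking a hand-picked set of trajectories could look like the following; the folder names in excluded are placeholders for the trajectories you decided to reject:

import os

excluded = ["TRAJ_00007", "TRAJ_00042"]   # placeholders: trajectories you decided to reject

for traj in sorted(os.listdir(".")):
    if not traj.startswith("TRAJ_"):
        continue
    marker = os.path.join(traj, "DONT_ANALYZE")
    if traj in excluded:
        open(marker, "a").close()        # create an empty DONT_ANALYZE marker
    elif os.path.isfile(marker):
        os.remove(marker)                # keep this trajectory in the analysis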
I also strongly recommend that in your eventual publication you transparently tell the reader about this step. You should write something like "We have created X initial conditions, the excitation process selected Y of them (Y1 in S1, Y2 in S2, ...). We have run a subset of Z of these initial conditions (Z1 in S1, Z2 in S2, ...). Out of the Z trajectories, N were excluded from analysis because of REASON, REASON, REASON. The excluded trajectories are separately discussed below." (one example: https://dx.doi.org/10.1021/acs.jpca.3c02899).
Best, Sebastian
Dear Sir, Thank you for your detailed guidance. I’ll ensure to manually inspect and appropriately report the trajectories in my analysis.
Best regards, LUKHMANUL HAKEEM K.
Dear Sir, Sorry, could you please explain once more how these values should be chosen? The tutorial mentions a value of 3 eV for etot_window, etot_step, epot_step, ekin_step, and hop_energy.
Best regards, LUKHMANUL HAKEEM K.
The value of 3 eV was simply chosen to "turn off all diagnostics" and to accept all trajectories irrespective of whether they have problems or not. This was simply to show you that you can get different ensemble results depending on whether you exclude trajectories or not. Ideally, if you exclude trajectories in a random fashion, then the ensemble results should not change qualitatively, but if you exclude trajectories in some systematic fashion (e.g., excluding crashed trajectories, where the crash is related to visiting a certain part of phase space), then you might change the results.
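As a toy illustration of this point (synthetic numbers, not SHARC data), one can compare the time constant fitted from a full set of per-trajectory decay times with the values obtained after random and after systematic exclusion:

import numpy as np

rng = np.random.default_rng(1)
tau_true = 25000.0                                  # made-up "true" time constant in fs
hop_times = rng.exponential(tau_true, size=160)     # synthetic per-trajectory decay times

tau_all = hop_times.mean()                                          # all trajectories
tau_random = rng.choice(hop_times, size=80, replace=False).mean()   # random half excluded
tau_biased = np.sort(hop_times)[:80].mean()                         # slowest half excluded

print(f"all trajectories  : tau ~ {tau_all:.0f} fs")
print(f"random half kept  : tau ~ {tau_random:.0f} fs (similar)")
print(f"fastest half kept : tau ~ {tau_biased:.0f} fs (systematically too short)")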
Best, Sebastian
Dear Sir,
Initially, I did not encounter any crashed trajectories, which I believe is because the LVC model was parametrized with the single-reference TD-DFT method. However, upon diagnosing the trajectories with the current settings values (etot_window: 0.2, etot_step: 0.1, epot_step: 0.7, ekin_step: 0.7, hop_energy: 1.0),
I received messages indicating large steps in Epot and large fluctuations in Etot for some trajectories. This resulted in 84 trajectories for a 100,000 fs analysis.
When I adjusted the diagnosis settings to the following values (etot_window: 3, etot_step: 3, epot_step: 3, ekin_step: 3, hop_energy: 3), I did not receive any messages about large steps or fluctuations, resulting in 161 trajectories for the same analysis duration.
My question is: which set of values should I follow to obtain correct population results? Should I manually inspect all trajectories to determine appropriate values for the diagnosis settings, or is it acceptable to use the uniform value of 3 eV for etot_window, etot_step, epot_step, ekin_step, and hop_energy? Please assist me.
Best regards, LUKHMANUL HAKEEM K.
We typically start out with standard settings. If there are many warnings (most often Etot or Ejump), then we inspect several trajectories and determine the source of the warnings. However, for LVC trajectories, the PESs will always be smooth (unlike with ab initio methods, e.g., CASSCF). So if you get warnings about Epot or Ekin violations, they might just be because you have strong oscillations and the energy changes too much within a time step. Usually, such cases are not a reason for concern.
Once you have investigated the reason for the warnings and made sure that they are not problematic, you can safely increase all thresholds to a large value (e.g., 3 eV) to accept all trajectories.
Best, Sebastian
Dear Sir,
Ok, Thank you for your explanation.
Best regards, LUKHMANUL HAKEEM K.
Dear Sir, I hope this email finds you well. I am writing to seek clarification regarding the population plot generated after successfully running the SHARC trajectories. Specifically, I have been using the populations.py script to collect all trajectory data and generate the population plot. However, I am a bit confused about which option I should select to obtain the most accurate and representative plot.
Currently, I have been consistently selecting option 22. Is this the correct choice? My goal is to generate a plot that shows the singlet and triplet populations (S1 and T) in a manner similar to the population plot obtained from femtosecond Transient Absorption Spectroscopy (TAS) experiments.
The ideal procedure for comparing to TAS would be to simulate the TAS, as you have been asking about in another issue. If that is not possible, I agree that diabatic populations are the best way to go. However, note that it is not always possible to compute accurate diabatic populations. If you use LVC, then diabatization is well-defined. But with ab initio methods, I would not trust the diabatic populations. In this case, you can have a look at options 5 or 6, which can produce roughly approximate diabatic populations, given certain criteria.
Best, Sebastian
Dear Sir, I have obtained singlet and triplet population data from femtosecond Transient Absorption Spectroscopy (TAS) experiments. My aim is to generate a population plot using the SHARC-LVC model that approximates the population plot obtained from TAS, rather than simulating TAS directly. To achieve this, I have been using option 22 when running the populations.py script to generate the population plot. However, I am unsure if this is the correct analysis mode for my purpose.
Please assist me.
Best regards, LUKHMANUL HAKEEM K.
Option 22 would be the most appropriate option, within the mentioned restrictions.
Note that you have to provide the overlap between t=0 and a reference geometry. This works like this:
1) In setup_init.py, activate the "reference overlaps" option.
2) Run all ICOND folders (ICOND_00000 must be finished before all others are started)
3) In each TRAJ folder, make a symlink: ln -s /path/to/initconds/ICOND_XXXXX Reference, where XXXXX is the same number as in TRAJ_XXXXX
4) Run data_extractor.x. If done correctly, there will be NO warning about "missing reference overlaps".
5) Use option 22 in populations.py
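If there are many trajectories, a small helper along these lines can create all the links in one go (a sketch only; the initconds path is the same placeholder as in the command above):

import os

INITCONDS = "/path/to/initconds"   # placeholder, as in the ln -s command above

for traj in sorted(os.listdir(".")):
    if not traj.startswith("TRAJ_"):
        continue
    number = traj.split("_")[1]                        # XXXXX from TRAJ_XXXXX
    target = os.path.join(INITCONDS, "ICOND_" + number)
    link = os.path.join(traj, "Reference")
    if not os.path.exists(link):
        os.symlink(target, link)                       # TRAJ_XXXXX/Reference -> .../ICOND_XXXXX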
Dear Sir, Thank you for your previous reply. I appreciate your guidance, but I am still a bit confused about Step 3 regarding the symlink creation. Specifically, I am not sure where exactly I should place the symlink command: ln -s /path/to/initconds/ICOND_XXXXX Reference. Should this command be included in the run.sh file of each TRAJ folder? Also, should this symlink be created before running the trajectories, or can it be done afterward?
In addition, I have another question concerning the figures in your paper titled "Highly Efficient Surface Hopping Dynamics Using a Linear Vibronic Coupling Model." In Figure 4, you refer to the "Time evolution of the diabatic singlet and triplet state populations," whereas in Figure 5, you mention the "Time evolution of the adiabatic singlet state populations." Could you please explain why these two are represented differently in terms of diabatic and adiabatic populations?
Furthermore, I am still somewhat confused about the concepts of diabatic and adiabatic excited-state populations, as well as the spin-orbit couplings (SOC) of diabatic and adiabatic states. Although I have read some papers on the topic, I have not yet gained a clear understanding. Could you kindly share any papers or books that explain these concepts more clearly? It would be incredibly helpful for my research.
Best regards, LUKHMANUL HAKEEM K.
The link can be placed after the trajectory has been run. The link should look like this:
ls -l TRAJ_00123/
...
Reference -> /path/to/initconds/ICOND_00123/
This is because you run data_extractor.x inside TRAJ_XXXXX. If data_extractor.x sees a file called Reference/QM.out, then it will be loaded with the reference overlaps, as is needed to get correct diabatic populations.
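To quickly verify that every trajectory folder has a usable reference before running data_extractor.x, a check like this can help (again only a sketch):

import os

for traj in sorted(os.listdir(".")):
    if not traj.startswith("TRAJ_"):
        continue
    qmout = os.path.join(traj, "Reference", "QM.out")
    if not os.path.isfile(qmout):
        # a missing or broken Reference link would later trigger the
        # "missing reference overlaps" warning for this trajectory
        print("No Reference/QM.out in", traj)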
In the 2019 paper you mentioned, Figure 4 used option 20 or 21, and Figure 5 used option 3 or 9 (I would need to check). The reason is simply that for SO2, there is a large body of literature referring to the diabatic states, and these states are also clearly defined by their symmetry. The states are also clearly separated from all higher-energy states, so that diabatization is feasible, even in MRCI. In contrast, 2AP and Ade are bigger molecules, where there are no such well-defined states (like in the majority of cases), so we use adiabatic states.
Best, Sebastian
Dear Sir, Thank you for your explanation.
Best regards, LUKHMANUL HAKEEM K.
Dear Sir, I have completed running trajectories in the SHARC-LVC interface and then diagnosed each trajectory using the $SHARC/diagnostics.py script. The problem I am encountering is that I get different triplet and singlet crossing points in the population graph.
Specifically, I obtained the triplet and singlet crossing point at around 30000 fs when I followed the diagnostic settings as per the current settings values.
However, when I changed the current settings values, I got the singlet and triplet crossing point at around 22000 fs.
Could you please assist me in understanding the cause of this discrepancy?
Looking forward to your guidance.
Best regards, LUKHMANUL HAKEEM K.