How to choose the chemical potential range for the MC simulations

darjaved commented 7 months ago

I have my casm learn input file with some compositions. How do I choose the chemical potential range to include all the compositions? Is it necessary that it include all my compositions exactly? Why does my results.json file contain compositions like:

"<atom_frac(Sn)>" : [ 0.254463252315, 0.333333333333, 0.333333333333, 0.333333333333, 0.488210556082, 0.520053810807, 0.666663652585, 0.666664382310, 0.666664358008, 0.666665905214, 0.666666666667 ]

Some compositions are very close to each other. Shouldn't the output contain only the compositions in my casm learn input file?

Also, in my casm learn input some configurations have zero weight (by my choice). Should I remove them completely? Will they be part of the MC simulations?

{
  "comment" : "Built from example",
  "debug" : false,
  "ensemble" : "grand_canonical",
  "method" : "metropolis",
  "model" : { "formation_energy" : "formation_energy" },
  "supercell" : [ [24, 0, 0], [0, 24, 0], [0, 0, 24] ],
  "data" : {
    "sample_by" : "pass",
    "sample_period" : 1,
    "min_pass" : 100,
    "max_pass" : 100,
    "confidence" : 0.95,
    "measurements" : [
      { "quantity" : "formation_energy", "precision" : 1e-3 },
      { "quantity" : "potential_energy", "precision" : 1e-3 },
      { "quantity" : "clex_hull_dist(casm_learn_input,comp)", "precision" : 1e-3 },
      { "quantity" : "atom_frac" },
      { "quantity" : "site_frac" },
      { "quantity" : "comp", "precision" : 1e-3 },
      { "quantity" : "comp_n" }
    ],
    "storage" : {
      "write_observations" : false,
      "write_trajectory" : false,
      "output_format" : ["csv", "json"]
    }
  },
  "driver" : {
    "dependent_runs": false,
    "mode" : "incremental",
    "motif" : { "configname" : "auto" },
    "initial_conditions" : {
      "param_chem_pot" : { "a" : -0.6, "b" : 0 },
      "temperature" : 5,
      "tolerance" : 0.001
    },
    "final_conditions" : {
      "param_chem_pot" : { "a" : 0.7, "b" : 0 },
      "temperature" : 5,
      "tolerance" : 0.001
    },
    "incremental_conditions" : {
      "param_chem_pot" : { "a" : 0.1, "b" : 0 },
      "temperature" : 0,
      "tolerance" : 0.001
    }
  }
}

darjaved commented 7 months ago

Also, during the weight optimisation for obtaining the best fit with LASSO, what can be the maximum value of the weight? Are there any guidelines?

darjaved commented 7 months ago

{
"<atom_frac(Na)>" : [ 1.000000000000, 0.750016693376, 0.744715418544, 0.666666666667, 0.666666666667, 0.666666666667, 0.508919461196, 0.477095170455, 0.333335617690, 0.333334094786, 0.333340115017, 0.333334094786, 0.333332571881, 0.000000000000, 0.000000000000 ],
"<atom_frac(Sn)>" : [ 0.000000000000, 0.249983306624, 0.255284581456, 0.333333333333, 0.333333333333, 0.333333333333, 0.491080538804, 0.522904829545, 0.666664382310, 0.666665905214, 0.666659884983, 0.666665905214, 0.666667428119, 1.000000000000, 1.000000000000 ],
"<clex_hull_dist(casm_learn_input,comp)>" : [ 0.000000000000, -0.178039938030, -0.121295321820, 0.000000000000, 0.000000000000, 0.000000000000, -0.042914768225, -0.038028003429, -0.065069651294, -0.065070827050, -0.065066179139, -0.065069138361, -0.065069469773, 0.000000000000, 0.000000000000 ],
"<comp(a)>" : [ 0.000000000000, 0.249983306624, 0.255284581456, 0.333333333333, 0.333333333333, 0.333333333333, 0.491080538804, 0.522904829545, 0.666664382310, 0.666665905214, 0.666659884983, 0.666665905214, 0.666667428119, 1.000000000000, 1.000000000000 ],
"<comp_n(Na)>" : [ 1.000000000000, 0.750016693376, 0.744715418543, 0.666666666667, 0.666666666667, 0.666666666667, 0.508919461196, 0.477095170455, 0.333335617690, 0.333334094786, 0.333340115017, 0.333334094786, 0.333332571881, 0.000000000000, 0.000000000000 ],
"<comp_n(Sn)>" : [ 0.000000000000, 0.249983306624, 0.255284581456, 0.333333333333, 0.333333333333, 0.333333333333, 0.491080538804, 0.522904829545, 0.666664382310, 0.666665905214, 0.666659884983, 0.666665905214, 0.666667428119, 1.000000000000, 1.000000000000 ],
"<formation_energy>" : [ -0.001507260663, -0.330447594354, -0.276425692132, -0.195193703029, -0.195193703029, -0.195193703029, -0.262256773203, -0.261613218586, -0.252203646786, -0.252204017631, -0.252202551636, -0.252202328942, -0.252201855442, -0.000409821000, -0.000409821000 ],
"<potential_energy>" : [ -0.001507260663, -0.180457610380, -0.148783401404, -0.061860369696, -0.095193703029, -0.128527036363, -0.213148719323, -0.261613218586, -0.318870085017, -0.385537198673, -0.452200517131, -0.518868691028, -0.585535569501, -0.600409821000, -0.700409821000 ],
"<site_frac(Na)>" : [ 1.000000000000, 0.750016693376, 0.744715418543, 0.666666666667, 0.666666666667, 0.666666666667, 0.508919461196, 0.477095170455, 0.333335617690, 0.333334094786, 0.333340115017, 0.333334094786, 0.333332571881, 0.000000000000, 0.000000000000 ],

Can you please check these and tell me if there is any problem?

darjaved commented 7 months ago

"is_converged" : [ true, true, true, true, true, true, true, true, true, true, true, true, true ],
"is_equilibrated" : [ true, true, true, true, true, true, true, true, true, true, true, true, true ],
"param_chem_pot(a)" : [ -0.600000000000, -0.500000000000, -0.400000000000, -0.300000000000, -0.200000000000, -0.100000000000, -0.000000000000, 0.100000000000, 0.200000000000, 0.300000000000, 0.400000000000, 0.500000000000, 0.600000000000 ],
"prec(<atom_frac(Na)>)" : [ 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000158634892, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000 ],
"prec(<atom_frac(Sn)>)" : [ 0.000000000000, 0.000057317051, 0.000000000000, 0.000000000000, 0.000000000000, 0.000158634892, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000 ],
"prec(<clex_hull_dist(casm_learn_input,comp)>)" : [ 0.000072094396, 0.000052275365, 0.000000000000, 0.000000000000, 0.000000000000, 0.000047261045, 0.000025094315, 0.000016087867, 0.000000000000, 0.000000000000, 0.000000000000, 0.000016433246, 0.000000000000 ],
"prec(<comp(a)>)" : [ 0.000000000000, 0.000057317051, 0.000000000000, 0.000000000000, 0.000000000000, 0.000158634892, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000 ],
"prec(<comp_n(Na)>)" : [ 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000158634892, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000 ],
"prec(<comp_n(Sn)>)" : [ 0.000000000000, 0.000057317051, 0.000000000000, 0.000000000000, 0.000000000000, 0.000158634892, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000 ],
"prec(<formation_energy>)" : [ 0.000000000000, 0.000082999424, 0.000000000000, 0.000000000000, 0.000000000000, 0.000069281800, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000 ],
"prec(<potential_energy>)" : [ 0.000076637017, 0.000052670934, 0.000000000000, 0.000000000000, 0.000000000000, 0.000052108938, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000 ],
"prec(<site_frac(Na)>)" : [ 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000158634892, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000, 0.000000000000 ],

Can you please explain what these tags actually mean, for example prec(<clex_hull_dist(casm_learn_input,comp)>)?

darjaved commented 7 months ago

SCEL8_8_1_1_0_2_7/24 0.5 -0.463914 -0.392215 True 40.0 71.69897
SCEL8_4_2_1_1_0_1/32 0.5 -0.464785 -0.424896 True 24.0 39.88902
SCEL8_8_1_1_0_3_6/12 0.5 -0.467476 -0.407082 True 40.0 60.39397
SCEL8_2_4_1_1_0_0/36 0.5 -0.468075 -0.430692 True 20.0 37.38247
SCEL8_8_1_1_0_5_1/54 0.5 -0.469709 -0.437471 True 22.0 32.23754
SCEL8_8_1_1_0_5_1/52 0.5 -0.476909 -0.436568 True 50.0 40.34080
SCEL8_4_2_1_1_0_1/36 0.5 -0.485797 -0.428869 True 100.0 56.92849
min E_DFT -0.48579745 at SCEL8_4_2_1_1_0_1/36 weight 100.0
min E_CX -0.44363867 at SCEL8_4_2_1_1_0_1/28 weight 40.0
max E_DFT -0.3061359 at SCEL8_2_2_2_0_0_0/35 weight 20.0
max E_CX -0.34110944 at SCEL8_2_2_2_0_1_1/12 weight 24.0

Even after using high weights I am not getting the correct ground state. Is there any solution?

darjaved commented 6 months ago

@xivh Could you please guide me here?

xivh commented 6 months ago

How to choose the chemical potential range to include all the compositions? is it necessary it should include all my compositions exactly?

Are you asking about the fitting or the Monte Carlo?

Also, in my casm learn input some configurations have zero weight (i choose), should i remove them completely? will they a part of the MC simulations?

You can ignore configurations in the fitting, but then you should make sure that they are not predicted as ground states. You can't exclude configurations from the Monte Carlo simulation, but they will be sampled infrequently or never if they are high in energy.

Also, during the weight optimisation for obtaining the best fit with LASSO, what can be the maximum value of the weight? any guidelines?

I usually do cross-validation (CV) over something like 1e-5 to 1e-1. What is more important is that the ECI look good (not overfitting). If the $\lambda$ you get after CV is the same as the max weight that you tried, then you should increase the range.
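The range check described above can be sketched in a few lines of Python. This is only an illustration and not part of casm learn: the helper names `make_lambda_grid` and `select_lambda` are hypothetical, and the CV scores are made-up numbers standing in for whatever your fitting code reports for each candidate value.

```python
def make_lambda_grid(min_exp=-5, max_exp=-1):
    """Log-spaced candidates, one per decade (1e-5 ... 1e-1 by default)."""
    return [10.0 ** e for e in range(min_exp, max_exp + 1)]


def select_lambda(grid, cv_scores):
    """Pick the candidate with the lowest CV score.

    Returns (best_lambda, at_upper_edge). at_upper_edge is True when the
    winner is the largest value tried, which suggests widening the range.
    """
    best = min(range(len(grid)), key=lambda i: cv_scores[i])
    return grid[best], best == len(grid) - 1


grid = make_lambda_grid()
# Hypothetical CV scores, one per candidate; replace with your own.
cv_scores = [0.030, 0.022, 0.018, 0.015, 0.012]

best_lambda, at_upper_edge = select_lambda(grid, cv_scores)
if at_upper_edge:
    print("Best value is the largest one tried; extend the range upward.")
```

With these made-up scores the minimum sits on the last grid point, so the range would need to be extended before trusting the result. Checking the ECI for overfitting still has to be done separately.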

Can you please explain what are these tags actually? like prec(<clex_hull_dist(casm_learn_input,comp)

Maybe this issue will help you? #67

even after using high weights i am not getting the correct ground state. any solution?

I have had success augmenting my data with these hull distance correlations: https://github.com/Van-der-Ven-Group/thermocore/blob/53daacf16e7fe36a62d0d47f7c4f0cc571696f5d/thermocore/geometry/hull.py#L310

You will have to fit outside of casm learn, though.

darjaved commented 6 months ago

I am asking about the Monte Carlo.

xivh commented 6 months ago

If you plot formation energy per primitive cell against the parametric composition axis, the maximum and minimum slopes are starting points for your chemical potential boundaries. If you are integrating across chemical potential at fixed temperature, you will want to select a chemical potential that is large enough that you have a pure compound at your starting point. Here is a reference about the Monte Carlo in CASM:

https://arxiv.org/abs/2309.11761
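As a rough sketch of the slope estimate described above, the following Python builds the lower convex hull of formation energy against a single parametric composition and reads the segment slopes off it. The data points, the function names, and the `pad` margin are all hypothetical and chosen only for illustration; this is not CASM code, and it assumes one composition axis.

```python
def lower_hull(points):
    """Lower convex hull of (x, E) points via a monotone-chain sweep."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()  # last vertex is on or above the chord to p
            else:
                break
        hull.append(p)
    return hull


def hull_slopes(hull):
    """Slope dE/dx of each hull segment, left to right."""
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(hull, hull[1:])]


def chem_pot_range(points, pad=0.2):
    """Min and max hull slope, widened by pad so both end members appear."""
    slopes = hull_slopes(lower_hull(points))
    return min(slopes) - pad, max(slopes) + pad


# Hypothetical (composition, formation energy per prim) data.
points = [(0.0, 0.0), (0.25, -0.2), (0.5, -0.3), (0.75, -0.1), (1.0, 0.0)]

hull = lower_hull(points)
slopes = hull_slopes(hull)
mu_min, mu_max = chem_pot_range(points)
print("hull:", hull)
print("slopes:", slopes)
print("suggested range:", mu_min, "to", mu_max)
```

The padding is there because the steepest and shallowest slopes only mark where the end members start to compete with their neighbours; going somewhat beyond them is what should give a pure compound at the start of the sweep. How much padding is needed is a judgment call and may depend on temperature.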

prisms-center / CASMcode

How to choose the chemical potential range for the MC simulations #355