Closed skk74 closed 2 years ago
This is definitely concerning. I think we should avoid storing any normalized property values in the database and instead perform normalization on an as-needed basis.
Currently, the normalized DFT energy is stored in a field specified by "relaxed_energy". I propose that we store the un-normalized energy instead, calling it "total_energy", while retaining parsing support for databases that contain a "relaxed_energy" field.
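As a rough illustration of the backward-compatible parsing proposed here, the sketch below reads an un-normalized energy from a parsed record. The function name `read_total_energy`, the plain-dict record, and the `source_volume` argument are hypothetical and are not part of the casm API. It also assumes that a legacy "relaxed_energy" value was normalized by the volume of the supercell that was actually calculated.

```python
def read_total_energy(props, source_volume):
    """Return the un-normalized energy from a parsed record (hypothetical helper).

    props         -- dict holding either "total_energy" or legacy "relaxed_energy"
    source_volume -- number of primitive cells in the supercell that was calculated
    """
    if "total_energy" in props:
        # Proposed field: already un-normalized, use it directly.
        return props["total_energy"]
    if "relaxed_energy" in props:
        # Legacy field: assumed to be normalized per primitive cell,
        # so undo the normalization with the originating supercell volume.
        return props["relaxed_energy"] * source_volume
    raise KeyError("record has neither 'total_energy' nor 'relaxed_energy'")
```

With the total energy in hand, normalization can then be done on an as-needed basis using the volume of whichever configuration the data ends up attached to.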
I am raising this issue because the problem is rare and often overlooked, but luckily @farnazcs was able to provide some hard evidence. In a project that allows vacancies, certain calculations can relax such that one configuration maps onto another configuration of a different volume. For example:
1 structure has mapped onto Configuration SCEL13_13_1_1_0_9_9/7, which already existed in your project
Because no calculation data exists for configuration SCEL13_13_1_1_0_9_9/7, it will acquire the data from "/home/fkaboudvand/work/binary_Li_Sn/NOVdW_primitiveBCC/training_data/SCEL11_11_1_1_0_10_3/8/calctype.default/properties.calc.json"
If this configuration of a different volume has not been calculated, then the data from the original calculation is stored in the uncalculated configuration as its own data. This causes a problem because the parsed data (i.e. the data stored in config_list.json) is already normalized by the supercell volume. You can see evidence of this in the update report: the relaxed energy is listed as -2.43182, a value calculated in the Configuration::read_calc_properties() routine. When the new configuration (SCEL13_13... in this example) copies the data over to itself (internally in config_list.json), it does not change the stored value despite the change in supercell volume from 11 to 13. Therefore, all calculations based on the relaxed energy of this SCEL13... configuration are now incorrect due to the improper normalization. If this SCEL13 configuration is used, this will alter formation energy plots, fits, and essentially everything else casm is meant to operate with.

Typically this isn't a problem because of an unofficial user protocol: when users in the avdv group see that a configuration has relaxed to another configuration that has not yet been calculated, we calculate it in order to get the "native" starting configuration's data. (This is also because any correlations will most likely match this configuration better.) Once the SCEL13... calculation is run from scratch, as long as it doesn't relax to another configuration, the improperly normalized data is removed and the casm project returns to normal.
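To make the size of the error concrete, here is a small arithmetic sketch using the numbers from this example. It assumes that the stored value -2.43182 is the total energy divided by the original supercell volume (11), and that the correct value for the mapped configuration is the same total energy divided by the new volume (13). The helper name `per_primitive_cell` is hypothetical.

```python
def per_primitive_cell(total_energy, volume):
    """Normalize a supercell total energy by the number of primitive cells."""
    return total_energy / volume

stored = -2.43182    # value copied verbatim into the SCEL13 configuration
source_volume = 11   # supercell volume of the calculation that was actually run
target_volume = 13   # supercell volume of the configuration it mapped onto

# Undo the original normalization, then normalize by the new volume.
total = stored * source_volume
correct = per_primitive_cell(total, target_volume)

# Difference between what is stored and what should be stored.
error = stored - correct
print(f"stored={stored:.5f} correct={correct:.5f} error={error:.5f}")
```

Under these assumptions the stored value is off by the factor 13/11, which is large compared with typical formation energy differences.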