Closed ltalirz closed 1 year ago
I assume the problem is the asserted results: the tolerances are very tight, especially for the reduced accuracy. Strangely, it always succeeds locally for me. I will make a new PR and test with less strict tolerances.
as spotted by @chrisjsewell: the test fails due to rounding errors in the xyz file that gets created https://github.com/aiidateam/aiida-cp2k/blob/7cf6d7645cce64af113225434a89eb681b27bc58/aiida_cp2k/calculations/__init__.py#L386
Oh, I see.
The issue is that the machine epsilon of 64-bit floating point numbers is e=2.2e-16, i.e. printing floating point numbers to 16 digits (as this function does) enters into territory outside the precision to which the numbers are actually represented in the computer.
You basically have two choices here of how to address the problem: A) eliminate the randomness B) reduce the precision to which the atomic positions are printed
B) is the easier route - printing the numbers only to, say, 10 digits would probably solve the issue immediately.
It probably has negligible effect on calculations in practice, but it would mean that there will be a slight difference in the atomic positions between a structure read from cp2k output, and a new input structure written by AiiDA.
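To illustrate option B, here is a minimal sketch. It assumes the positions are written with a fixed-point format like the `{x:25.16f}` mentioned in this thread; the helper `format_position` and the sample values are made up for illustration and are not taken from aiida-cp2k.

```python
import sys


def format_position(position, digits):
    """Format x, y, z in fixed-point notation with `digits` decimals.

    Hypothetical helper, only mimicking the style of the XYZ writer.
    """
    return "".join(f"{x:25.{digits}f}" for x in position)


# Two values that are mathematically equal but differ in the last bits,
# standing in for the same position computed on two different machines.
local_position = (1.1 + 2.2, 0.0, 0.0)
ci_position = (3.3, 0.0, 0.0)

print(sys.float_info.epsilon)  # machine epsilon of a 64-bit float

# Compare the formatted strings at 16 and at 10 decimals.
same_at_16 = format_position(local_position, 16) == format_position(ci_position, 16)
same_at_10 = format_position(local_position, 10) == format_position(ci_position, 10)
print(same_at_16, same_at_10)
```

With 16 decimals the two strings differ, so anything hashed from the file contents differs too; with 10 decimals they come out identical.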
As for the randomness, there are in principle two possible sources:
atoms.get_positions()
My guess would be it is the first. I'm not sure it will be straightforward to fix, as it might depend on details like whether numpy is using the MKL or another BLAS library for certain transformations of the cell, etc.
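As a generic illustration of why the underlying library could matter (not a claim about what ASE or numpy actually do here): floating point addition is not associative, so two implementations that sum the same terms in a different order can disagree in the last bits.

```python
# The same three terms, added in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The results are not bit-identical, although they agree to ~1e-16.
print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))
```

Differences of this size are invisible at 10 printed digits but show up at 16.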
Actually, the differences are really big (see attached). Can this be caused by rounding alone? And if so, reducing the precision to 10 digits would not solve it, right?
what are the two screenshots you are showing? How did you create them?
@chrisjsewell ?
If these two XYZ files were produced by calling get_positions on the same StructureData, then there is indeed a bug, not a rounding error.
Note there are two different things possibly at play here:
the top (Zn) diff could indeed just be a rounding error difference when creating the string {x:25.16f}
the bottom (COO) diff is to do with how the CO2 molecule is "attached" to the ZnMOF74 within the workchain; so that's more broadly related to the workchain and its Python code, e.g. the aiida_structure_merge calcfunction, version of ASE etc., i.e. potentially "compounded" rounding errors
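One way to tell the two cases apart is to parse both XYZ files and look at the largest coordinate deviation. The sketch below assumes simple `symbol x y z` lines; the function names, the sample coordinates and the 1e-10 threshold are all illustrative and not taken from the actual test output.

```python
def parse_xyz_positions(text):
    """Return a list of (symbol, (x, y, z)) from lines of the form 'El x y z'."""
    atoms = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip atom-count and comment lines
        try:
            coords = tuple(float(value) for value in fields[1:])
        except ValueError:
            continue  # not a coordinate line
        atoms.append((fields[0], coords))
    return atoms


def max_deviation(text_a, text_b):
    """Largest absolute coordinate difference between two XYZ texts."""
    atoms_a = parse_xyz_positions(text_a)
    atoms_b = parse_xyz_positions(text_b)
    if [symbol for symbol, _ in atoms_a] != [symbol for symbol, _ in atoms_b]:
        raise ValueError("the two files do not list the same atoms in the same order")
    return max(
        (abs(p - q)
         for (_, pos_a), (_, pos_b) in zip(atoms_a, atoms_b)
         for p, q in zip(pos_a, pos_b)),
        default=0.0,
    )


# Made-up sample data: Zn differs in the last printed digit, O by 1e-3.
local_xyz = "Zn 3.3000000000000003 0.0 0.0\nO 1.250 0.0 0.0"
ci_xyz = "Zn 3.2999999999999998 0.0 0.0\nO 1.251 0.0 0.0"

ROUNDING_THRESHOLD = 1e-10  # assumed cut-off between "rounding" and "real" differences
print(max_deviation(local_xyz, ci_xyz) > ROUNDING_THRESHOLD)
```

A deviation near 1e-16 points to print-level rounding, while anything many orders of magnitude larger suggests the positions were genuinely computed differently.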
what are the two screenshots you are showing? How did you create them?
From #103; run locally and looking at CI output
I think we can close this now @ltalirz ?
Sure!
The only thing I'd note is that #108 is a very "dumb" fix, specific to the one failing test (it just maps its input hash to another one). If you add other tests that include this "merging" of a molecule onto a MOF, they will probably fail as well. So just bear that in mind.
As noted above, the differences in the atomic positions of the molecule, for CI vs local, are relatively big, so if you do have time at some point, it might be good to look into why that is.
oops thats me, using the wrong account lol
great, thanks for the heads-up @chrisjsewell
Looking at the recent CI builds, it appears that at least one of the tests can fail at random (out of the last 4 builds, run_binding_energy_co2_mof74 failed twice and succeeded twice without any changes that should be relevant to it). A quick look at the test does not tell me what exactly the problem is; this would require local debugging.
Error message on CI: