lsmo-epfl / zeopp-lsmo

The LSMO fork of zeo++
http://www.zeoplusplus.org

inconsistent results between Mac and Linux #18

Open kjappelbaum opened 1 year ago

kjappelbaum commented 1 year ago

As far as I understand, this is one of the reasons the test suite for the conda recipe is failing on OSX (https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=628582&view=logs&j=58ac6aab-c4bc-5de2-2894-98e408cc8ec9&t=933f325c-924e-533d-4d95-e93b5843ce8b)

For instance, the ASA / A^2 in the reference is 60.7713, but on Mac it is 59.1876.

I'm not really sure where the difference comes from; however, we would already be in a saner position with a test suite that does not depend on exact rounding of the output. Perhaps we should just have some simple parsers and then use pytest.approx (see the sketch below).
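
A minimal sketch of what such a parser plus pytest.approx comparison could look like. The parsing logic, file names, and tolerance are illustrative assumptions, not the actual test-suite code:

    import pytest

    def parse_keyed_floats(path):
        # Collect "key: value" pairs from a zeo++ output file into a dict.
        # Rough sketch: the real .sa/.vol formats may need more careful handling.
        results = {}
        with open(path) as handle:
            tokens = handle.read().split()
        for key, value in zip(tokens, tokens[1:]):
            if key.endswith(":"):
                try:
                    results[key.rstrip(":")] = float(value)
                except ValueError:
                    pass  # skip non-numeric fields such as file names
        return results

    def test_surface_area():
        sa = parse_keyed_floats("EDI.sa")          # hypothetical output file
        sa_ref = parse_keyed_floats("EDI.sa.ref")  # hypothetical reference file
        for key, ref_value in sa_ref.items():
            # compare with a relative tolerance instead of exact string equality
            assert sa[key] == pytest.approx(ref_value, rel=1e-3)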

kjappelbaum commented 1 year ago

I can reproduce the error in the test suite when I compile manually on an M1 Mac:

>       assert sa == sa_ref
E       AssertionError: assert {'ASA_A^2': 5...^2/cm^3', ...} == {'ASA_A^2': 6...^2/cm^3', ...}
E         Omitting 20 identical items, use -vv to show
E         Differing items:
E         {'ASA_m^2/g': 1186.46} != {'ASA_m^2/g': 1218.21}
E         {'ASA_m^2/cm^3': 1924.9} != {'ASA_m^2/cm^3': 1976.4}
E         {'Channel_surface_area_A^2': 59.1876} != {'Channel_surface_area_A^2': 60.7713}
E         {'ASA_A^2': 59.1876} != {'ASA_A^2': 60.7713}
E         Use -v to get more diff

The difference is more than 2%, which is hard to ignore.

kjappelbaum commented 1 year ago

And the volpo results are even worse:

        for key in volpo:
>           assert volpo[key] == pytest.approx(volpo_ref[key], rel=0.05)
E           assert 23.787 == 22.6493 ± 1.1e+00
E             comparison failed
E             Obtained: 23.787
E             Expected: 22.6493 ± 1.1e+00

kjappelbaum commented 1 year ago

It is, however, really hard for me to pin down the problem. The only lead I have is to try different versions of Eigen.

kjappelbaum commented 1 year ago

The PSD bin counts are also off.

ltalirz commented 1 year ago

Do the algorithms involve any kind of (pseudo-)randomness?

If the seed / random number generator differs between the two platforms, that could be another reason for the discrepancy. In that case, one should allow for an appropriate uncertainty in the test result (which is now trivial to add since you migrated to pytest).
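
For Monte-Carlo-sampled quantities such as the PSD histogram, exact bin counts will never agree if the RNG or seed differs, so a bin-wise comparison with an explicit tolerance might be more robust. A rough sketch, with placeholder tolerance values that would need tuning:

    import numpy as np

    def psd_bins_match(bins, bins_ref, rel=0.05, atol=5.0):
        # Compare two pore-size-distribution histograms bin by bin,
        # allowing for statistical noise from the Monte Carlo sampling.
        bins = np.asarray(bins, dtype=float)
        bins_ref = np.asarray(bins_ref, dtype=float)
        assert bins.shape == bins_ref.shape
        # np.allclose checks |a - b| <= atol + rtol * |b| for every bin
        return np.allclose(bins, bins_ref, rtol=rel, atol=atol)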

kjappelbaum commented 1 year ago

Thanks! I didn't check this so far, but it is certainly important to note this in the docs as well. Currently, I use pytest.approx with tolerances that pass on my Mac, but a discrepancy of more than 5% seemed a bit large to me.

ltalirz commented 1 year ago

Does the discrepancy reduce when you increase the number of samples?

kjappelbaum commented 1 year ago

Good point, and thanks for the feedback and ideas.

I haven't tested systematically so far. At least for the PSD I didn't get the bins to match even with 5000000 samples (but I'm sure that algorithm involves randomness and should be evaluated more carefully, e.g. by making plots). For the SA, things seem to improve with more samples; that is also the only check that is still failing in the conda recipe. A convergence-check sketch follows below.
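
A rough sketch of how such a convergence check could be automated, assuming the usual network -sa <chan_radius> <probe_radius> <num_samples> <file> invocation and a parser like the one sketched above; the structure file, probe radius, and sample counts are placeholders:

    import subprocess

    def asa_vs_samples(structure="EDI.cssr", radius=1.86,
                       counts=(2000, 20000, 200000)):
        # Rerun the -sa calculation with increasing Monte Carlo sample counts
        # and record ASA_A^2 each time, to see whether the Mac/Linux gap shrinks.
        results = {}
        for n in counts:
            subprocess.run(
                ["./network", "-sa", str(radius), str(radius), str(n), structure],
                check=True,
            )
            out = parse_keyed_floats(structure.replace(".cssr", ".sa"))
            results[n] = out.get("ASA_A^2")
        return results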

ltalirz commented 1 year ago

For the PSD, it might be worth checking that it is not related to https://github.com/lsmo-epfl/zeopp-lsmo/issues/5:

zeo++/psd.cc:822:15: error: Uninitialized variable: volFraction [uninitvar]
          if (volFraction >= TOL) {

Edit: OK, this looks like a false alarm: https://github.com/lsmo-epfl/zeopp-lsmo/blob/d641dbf8867b07745c5efaa39dffdbcb8cf19b40/zeo%2B%2B/psd.cc#L814-L821