Closed: turbach closed this issue 4 years ago
Here is what Option 1's np.isclose(actual, expected, atol=FIT_ATOL, rtol=FIT_RTOL) comparisons might look like for fitgrid 0.4.8 in a mkconda environment that includes pinned python 3.6, mkpy 0.1.6, numpy 1.16.4, fitgrid 0.4.8 and its supporting r-* friends, plus jupyter, r-tidyverse, r-studio, and a few other goodies.
(/tmp/mkconda/run_env) [turbach@mkgpu1 fitgrid]$ conda list "(r-lme|r-matrix|numpy|pandas|mkl|.*blas.*|^python\b|fitgrid|pymer|mkpy)"
# packages in environment at /tmp/mkconda/run_env:
#
# Name Version Build Channel
blas 1.0 mkl
fitgrid 0.4.8 0_g728d17b_0 kutaslab
mkl 2019.4 243
mkl-service 2.3.0 py36he904b0f_0
mkl_fft 1.0.15 py36ha843d7b_0
mkl_random 1.1.0 py36hd6b4f25_0
mkpy 0.1.6 0_g0d5cce5 kutaslab
numpy 1.16.4 py36h7e9f1db_0
numpy-base 1.16.4 py36hde5b4d6_0
pandas 0.25.3 py36he6710b0_0
pymer4 0.6.0 py36_0 kutaslab
python 3.6.9 h265db76_0
python-dateutil 2.8.1 py_0
r-lme4 1.1_17 r351h29659fb_0
r-lmertest 3.0_1 r351h6115d3f_0
r-matrix 1.2_14 r351h96ca727_0
(/tmp/mkconda/run_env) [turbach@mkgpu1 fitgrid]$ pytest tests/test_utils_summary.py::test_summarize
========================================================== test session starts ===========================================================
platform linux -- Python 3.6.9, pytest-5.3.2, py-1.8.0, pluggy-0.13.1
rootdir: /mnt/cube/home/turbach/TPU_Projects/fitgrid
collected 1 item
tests/test_utils_summary.py . [100%]
============================================================ warnings summary ============================================================
tests/test_utils_summary.py::test_summarize
/tmp/mkconda/run_env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)
tests/test_utils_summary.py::test_summarize
/mnt/cube/home/turbach/TPU_Projects/fitgrid/tests/test_utils_summary.py:164: UserWarning: lmer has_warning values have changed
warnings.warn(f'{modler} has_warning values have changed')
tests/test_utils_summary.py::test_summarize
/mnt/cube/home/turbach/TPU_Projects/fitgrid/tests/test_utils_summary.py:196: UserWarning:
------------------------------------------------------------
fitted vals out of tolerance: 0.001 + 0.001 * expected
lmer 1 + (continuous | categorical) (Intercept) Estimate
[[ True True]
[False True]]
channel0 channel1
val expected fitted expected fitted
Time model beta key
0 1 + (continuous | categorical) (Intercept) Estimate 14.274358 14.274375 -2.765258 -2.765064
1 1 + (continuous | categorical) (Intercept) Estimate -0.248746 -0.252334 -8.092379 -8.092379
------------------------------------------------------------
warnings.warn(msg)
tests/test_utils_summary.py::test_summarize
/mnt/cube/home/turbach/TPU_Projects/fitgrid/tests/test_utils_summary.py:196: UserWarning:
------------------------------------------------------------
fitted vals out of tolerance: 0.001 + 0.001 * expected
lmer 1 + continuous + (continuous | categorical) (Intercept) DF
[[ True True]
[False True]]
channel0 channel1
val expected fitted expected fitted
Time model beta key
0 1 + continuous + (continuous | categorical) (Intercept) DF 0.959485 0.959447 0.962329 0.962947
1 1 + continuous + (continuous | categorical) (Intercept) DF 3.125167 3.130235 3.999987 4.000000
------------------------------------------------------------
warnings.warn(msg)
-- Docs: https://docs.pytest.org/en/latest/warnings.html
===================================================== 1 passed, 4 warnings in 56.03s =====================================================
(/tmp/mkconda/run_env) [turbach@mkgpu1 fitgrid]$
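For reference, the warnings in the run above are consistent with a warn-only np.isclose check along these lines. This is a sketch reconstructed from the output, not the actual test code: FIT_ATOL = FIT_RTOL = 0.001 and the compare_fitted helper are inferred from the "fitted vals out of tolerance: 0.001 + 0.001 * expected" message.

import warnings

import numpy as np

FIT_ATOL, FIT_RTOL = 0.001, 0.001  # assumption, inferred from the warning text above

def compare_fitted(actual, expected):
    # np.isclose tests |actual - expected| <= atol + rtol * |expected|, which
    # matches the "0.001 + 0.001 * expected" phrasing in the warnings above.
    close = np.isclose(actual, expected, atol=FIT_ATOL, rtol=FIT_RTOL)
    if not close.all():
        warnings.warn(
            f"fitted vals out of tolerance: {FIT_ATOL} + {FIT_RTOL} * expected\n{close}"
        )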
It seems that current versions of r-lme4 and r-matrix on the anaconda default channel return slightly different values than versions a few releases back.
This makes fitgrid 0.4.8 fail on tests that check LMER run-time return values against hard-coded expected values in tests/test_utils_lmer.py and tests/test_utils_summary.py (examples below). Pinning minimum r-lme4 and r-matrix versions is not an option because conda environment dependency resolution for other packages (as in mkconda) may block the latest releases.
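To make the failure concrete, here is a minimal sketch (not fitgrid's actual test code) of why an exact check breaks while a tolerance check survives. The numbers are the Time 0 expected and fitted values from the warning output above; the 0.002 relative tolerance anticipates Option 1 below.

import numpy as np

# Hard-coded expected values from an earlier r-lme4/r-matrix release and the
# values a later release returns (illustrative numbers from the output above).
expected = np.array([14.274358, -2.765258])
actual = np.array([14.274375, -2.765064])

print(np.array_equal(actual, expected))                   # False: exact test fails
print(np.allclose(actual, expected, atol=0, rtol=0.002))  # True: rtol test passes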
Options
Option 1 (easy): Allow slight discrepancies in LMER return values via rtol in np.allclose(..., rtol=FIT_RTOL). Pilot testing indicates that a relative tolerance of two tenths of one percent, np.allclose(actual, expected, atol=0, rtol=0.002), allows the tests to pass. This is tight enough to catch gross changes in what LMER returns, and the allowed differences are small enough that modeling results remain comparable across LMER releases.
Option 2 (too much work): Keep the correctness tests exact and maintain a map of expected test results for each LMER release.
Option 3 (compromise): Keep the correctness tests exact for the latest anaconda LMER release available at the time of the fitgrid release. Fall back to a relative-tolerance test with a warning if the exact test fails, and fail only if the tolerance test also fails (see the sketch after the options).
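A sketch of how Option 3's exact-first, tolerance-fallback logic might look. The check_lmer_values helper and the rtol=0.002 default are illustrative assumptions, not fitgrid API:

import warnings

import numpy as np

def check_lmer_values(actual, expected, rtol=0.002):
    # Exact check against values hard-coded for the pinned LMER release.
    if np.array_equal(actual, expected):
        return
    # Fall back to a relative-tolerance check and warn instead of failing.
    if np.allclose(actual, expected, atol=0, rtol=rtol):
        warnings.warn(
            f"LMER values differ from the pinned release but are within rtol={rtol}"
        )
        return
    # Fail only when the tolerance check also fails.
    raise AssertionError(f"LMER values out of tolerance (rtol={rtol})")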
Example 1
fitgrid 0.4.8 correctness tests pass with r-lme4 1.1_21 and r-matrix 1.2_17.
Example 2
fitgrid 0.4.8 correctness tests fail with the previous r-lme4 1.1_17 and r-matrix 1.2_14.
Cleaning up the pytest output shows the actual vs. expected discrepancies in the 4th decimal place: