Closed: turbach closed this issue 4 years ago
Here is what Option 1's np.isclose(actual, expected, atol=FIT_ATOL, rtol=FIT_RTOL) comparisons might look like for fitgrid 0.4.8 in a mkconda environment that includes pinned python 3.6, mkpy 0.1.6, numpy 1.16.4, fitgrid 0.4.8 and its supporting r-* friends, plus jupyter, r-tidyverse, r-studio, and a few other goodies.
(/tmp/mkconda/run_env) [turbach@mkgpu1 fitgrid]$ conda list "(r-lme|r-matrix|numpy|pandas|mkl|.*blas.*|^python\b|fitgrid|pymer|mkpy)"
# packages in environment at /tmp/mkconda/run_env:
#
# Name Version Build Channel
blas 1.0 mkl
fitgrid 0.4.8 0_g728d17b_0 kutaslab
mkl 2019.4 243
mkl-service 2.3.0 py36he904b0f_0
mkl_fft 1.0.15 py36ha843d7b_0
mkl_random 1.1.0 py36hd6b4f25_0
mkpy 0.1.6 0_g0d5cce5 kutaslab
numpy 1.16.4 py36h7e9f1db_0
numpy-base 1.16.4 py36hde5b4d6_0
pandas 0.25.3 py36he6710b0_0
pymer4 0.6.0 py36_0 kutaslab
python 3.6.9 h265db76_0
python-dateutil 2.8.1 py_0
r-lme4 1.1_17 r351h29659fb_0
r-lmertest 3.0_1 r351h6115d3f_0
r-matrix 1.2_14 r351h96ca727_0
(/tmp/mkconda/run_env) [turbach@mkgpu1 fitgrid]$ pytest tests/test_utils_summary.py::test_summarize
========================================================== test session starts ===========================================================
platform linux -- Python 3.6.9, pytest-5.3.2, py-1.8.0, pluggy-0.13.1
rootdir: /mnt/cube/home/turbach/TPU_Projects/fitgrid
collected 1 item
tests/test_utils_summary.py . [100%]
============================================================ warnings summary ============================================================
tests/test_utils_summary.py::test_summarize
/tmp/mkconda/run_env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)
tests/test_utils_summary.py::test_summarize
/mnt/cube/home/turbach/TPU_Projects/fitgrid/tests/test_utils_summary.py:164: UserWarning: lmer has_warning values have changed
warnings.warn(f'{modler} has_warning values have changed')
tests/test_utils_summary.py::test_summarize
/mnt/cube/home/turbach/TPU_Projects/fitgrid/tests/test_utils_summary.py:196: UserWarning:
------------------------------------------------------------
fitted vals out of tolerance: 0.001 + 0.001 * expected
lmer 1 + (continuous | categorical) (Intercept) Estimate
[[ True True]
[False True]]
channel0 channel1
val expected fitted expected fitted
Time model beta key
0 1 + (continuous | categorical) (Intercept) Estimate 14.274358 14.274375 -2.765258 -2.765064
1 1 + (continuous | categorical) (Intercept) Estimate -0.248746 -0.252334 -8.092379 -8.092379
------------------------------------------------------------
warnings.warn(msg)
tests/test_utils_summary.py::test_summarize
/mnt/cube/home/turbach/TPU_Projects/fitgrid/tests/test_utils_summary.py:196: UserWarning:
------------------------------------------------------------
fitted vals out of tolerance: 0.001 + 0.001 * expected
lmer 1 + continuous + (continuous | categorical) (Intercept) DF
[[ True True]
[False True]]
channel0 channel1
val expected fitted expected fitted
Time model beta key
0 1 + continuous + (continuous | categorical) (Intercept) DF 0.959485 0.959447 0.962329 0.962947
1 1 + continuous + (continuous | categorical) (Intercept) DF 3.125167 3.130235 3.999987 4.000000
------------------------------------------------------------
warnings.warn(msg)
-- Docs: https://docs.pytest.org/en/latest/warnings.html
===================================================== 1 passed, 4 warnings in 56.03s =====================================================
(/tmp/mkconda/run_env) [turbach@mkgpu1 fitgrid]$
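For reference, the warnings in the run above are consistent with a warn-only np.isclose check along these lines. This is a sketch reconstructed from the output, not the actual test code: FIT_ATOL = FIT_RTOL = 0.001 and the compare_fitted helper are inferred from the "fitted vals out of tolerance: 0.001 + 0.001 * expected" message.

import warnings

import numpy as np

FIT_ATOL, FIT_RTOL = 0.001, 0.001  # assumption, inferred from the warning text above

def compare_fitted(actual, expected):
    # np.isclose tests |actual - expected| <= atol + rtol * |expected|, which
    # matches the "0.001 + 0.001 * expected" phrasing in the warnings above.
    close = np.isclose(actual, expected, atol=FIT_ATOL, rtol=FIT_RTOL)
    if not close.all():
        warnings.warn(
            f"fitted vals out of tolerance: {FIT_ATOL} + {FIT_RTOL} * expected\n{close}"
        )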
It seems that current versions of r-lme4 and r-matrix on the anaconda default channel return slightly different values than versions a few releases back.
This makes fitgrid 0.4.8 fail on tests that check LMER run-time return values against hard-coded expected values in tests/test_utils_lmer.py and tests/test_utils_summary.py (examples below). Pinning minimum r-lme4 and r-matrix versions is not an option because conda environment dependency resolution for other packages (as in mkconda) may block the latest releases.
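To make the failure concrete, here is a minimal sketch (not fitgrid's actual test code) of why an exact check breaks while a tolerance check survives. The numbers are the Time 0 expected and fitted values from the warning output above; the 0.002 relative tolerance anticipates Option 1 below.

import numpy as np

# Hard-coded expected values from an earlier r-lme4/r-matrix release and the
# values a later release returns (illustrative numbers from the output above).
expected = np.array([14.274358, -2.765258])
actual = np.array([14.274375, -2.765064])

print(np.array_equal(actual, expected))                   # False: exact test fails
print(np.allclose(actual, expected, atol=0, rtol=0.002))  # True: rtol test passes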
Options
Option 1 (easy): Allow slight discrepancies in LMER return values via rtol in np.allclose(..., rtol=FIT_RTOL). Pilot testing indicates that a relative tolerance of two tenths of one percent, np.allclose(actual, expected, atol=0, rtol=0.002), allows the tests to pass. This is tight enough to catch gross changes in what LMER returns, and the allowed differences are small enough that modeling results remain comparable across LMER releases.
Option 2 (too much work): Keep the correctness tests exact and maintain a map of expected test results for each LMER release.
Option 3 (compromise): Keep the correctness tests exact for the latest anaconda LMER release available at the time of the fitgrid release. Fall back to a relative-tolerance test with a warning if the exact test fails, and fail only if the tolerance test also fails (see the sketch after the options).
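A sketch of how Option 3's exact-first, tolerance-fallback logic might look. The check_lmer_values helper and the rtol=0.002 default are illustrative assumptions, not fitgrid API:

import warnings

import numpy as np

def check_lmer_values(actual, expected, rtol=0.002):
    # Exact check against values hard-coded for the pinned LMER release.
    if np.array_equal(actual, expected):
        return
    # Fall back to a relative-tolerance check and warn instead of failing.
    if np.allclose(actual, expected, atol=0, rtol=rtol):
        warnings.warn(
            f"LMER values differ from the pinned release but are within rtol={rtol}"
        )
        return
    # Fail only when the tolerance check also fails.
    raise AssertionError(f"LMER values out of tolerance (rtol={rtol})")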
Example 1
fitgrid 0.4.8 correctness tests pass with r-lme4 1.1_21 and r-matrix 1.2_17.
Example 2
fitgrid 0.4.8 correctness tests fail with the previous r-lme4 1.1_17 and r-matrix 1.2_14.
Cleaning up the pytest output shows the actual vs. expected discrepancies in the 4th decimal place: