vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0

CRAN: Additional issues #282

Closed jarioksa closed 6 years ago

jarioksa commented 6 years ago

In addition to normal CRAN tests, there are now some special tests labelled as "Additional issues". Vegan 2.5-1 has reports on four of these Additional issues. See:

https://cran.rstudio.com/web/checks/check_results_vegan.html

I haven't paid much attention to these Additional tests, but it seems that they can be a stumbling block in CRAN submission.

Three of the tests concern differences in results with alternative BLAS/Lapack implementations (ATLAS, MKL, OpenBLAS). Basically these are small numerical differences (like giving a numerical zero as 10^-33 or 10^-32) and sign changes in eigenvectors. However, the problem is that the tests currently compare the exact printed character representation, and so fail on sign changes and on numerical differences that are within tolerance.

One test uses R built without long doubles, which seems to give some further BLAS/Lapack differences, but also changes some nullmodel simulations. Inspecting this issue needs recompiling R-devel with non-standard options and then rebuilding and installing all packages needed in testing vegan (including the bloody #279). The nullmodel summary statistics are not very different, but it would be good to understand what is going on.

It may be that these Additional issues are not fatal and we may get through. However, this means explaining things to CRAN, and it is never certain that our points will be accepted this time. Communication with CRAN is always a risk and should be avoided.

The easiest solution is to remove the tests, like we did in the 2.4 series for the very same reason. The tests would still be in the master branch of vegan for our own use, but we would not bother CRAN with them. Alternatively, the tests must be written more robustly, but this is difficult with the current R test engine, which just compares test output to a reference output.
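
As a sketch, a more robust check could neutralize the arbitrary eigenvector signs before comparing within a tolerance; the reference matrix here is made up for illustration:

## hypothetical example: compare score matrices against a stored
## reference, ignoring arbitrary signs and last-bit numerical noise
ref <- matrix(c(0.71, -0.71, 0.71, 0.71), 2, 2)  # made-up reference scores
new <- -ref + 1e-12                              # sign flip plus tiny noise
signs <- sign(colSums(ref * new))                # per-column sign alignment
stopifnot(isTRUE(all.equal(ref, sweep(new, 2, signs, "*"),
                           tolerance = 1e-8)))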

psolymos commented 6 years ago

I had Additional issues in the past for my opticut package, related to dummy examples leading to a perfect fit in lm (and thus infinite log-likelihoods). As a result I was notified by BDR to fix it:

You are unlikely to be able to reproduce these, but they do indicate issues with your package or its tests. See the discussion on testing in §1.6 of 'Writing R Extensions' ... and

  • in some cases tolerances are too small.
  • where singular matrices are reported, take a look at the conditioning of that matrix computed on a platform you have access to.
  • if theory says a matrix is symmetric, symmetrize it ( 0.5*(A+t(A)) )
  • do not use identical() nor == on numerical results without exceptional cause.
  • singular vectors and eigenvectors have arbitrary signs.

Please correct as much as possible and submit a correction before Feb 12 to safely retain the package on CRAN.
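
For instance, the advice against identical() and == amounts to something like this sketch (with made-up values):

x <- 0.1 + 0.2
identical(x, 0.3)   # FALSE: the values differ in the last bit
## all.equal() with an explicit tolerance accepts such differences
isTRUE(all.equal(x, 0.3, tolerance = sqrt(.Machine$double.eps)))  # TRUE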

It seems like a wise precaution to remove the responsible tests.

gavinsimpson commented 6 years ago

I don't know the amount of work involved, and therefore it may be expedient to remove them to get 2.5-2 out the door, but I am not generally in favour of removing tests from the released version (if we can help it).

(I currently have my R compiled against OpenBLAS, so I may run into issues with these tests just when we are developing and checking the package.)

Ideally, we'd do as we have done elsewhere and check for differences in results with a lower tolerance: when checking printed output from examples, R Core/CRAN have long suggested using fewer digits precisely because of these minor platform differences. I guess we should be doing the same here for these tests.

gavinsimpson commented 6 years ago

@jarioksa

...but this is difficult with the current R engine that just compares test output to a reference output.

If that's all we do with these tests, then adding op <- options(digits = 6) (say) at the start of the script and a reset (options(op)) at the end should suffice; adjust 6 as needed to get the tolerance you want.

This is what we need to do for linear algebra-related examples and their reference material for example.
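
For instance, a test script could look like this (the eigenvalue computation is just a stand-in):

## limit printed precision so that last-bit BLAS/Lapack differences
## do not show up in the reference output
op <- options(digits = 6)
## ... test code whose printed output is compared to the reference,
## e.g. printing eigenvalues of a stand-in computation:
print(eigen(matrix(c(2, 1, 1, 3), 2, 2))$values)
## restore the user's options at the end of the script
options(op)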

jarioksa commented 6 years ago

@gavinsimpson : it seems that we should not remove all tests, but they can be improved. However, almost all differences came from cca-object-tests.R, which really are not tests: I added these when I was developing the new ordConstrained infrastructure shared by all [partial] [constrained] ordination methods, and I just wanted to see that the result objects remained about the same and the basic "cca" methods still worked. So this was to check invariance across consecutive commits on the same system. The "tests" are really bad as tests: they show things like the head of ordination scores, eigenvalues etc., which are guaranteed to differ between BLAS/Lapack implementations (signs of eigenvectors, least significant bits of results which become visible when the values are actually zero but differ numerically). Removing these tests (and perhaps transferring some relevant ones to vegan-tests.R) solves most of the problems.

The vegan-tests.R can be made more robust. There are two cases where we expected zeros within a tolerance, but just dumped the about-zero values, and so got a message that in one case zero was estimated as 10^-32 and in another as 10^-33. Changing these tests to really test that we are within sqrt(.Machine$double.eps) of zero will solve the problem. Finally, if we turn off tracing of iterations in metaMDS, we get rid of last-bit variation in the Procrustes sum of squares reports.
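
The zero check could be a sketch like this, where zval stands in for whatever the test computes:

## instead of printing an about-zero value (which may show up as
## 1e-32 on one BLAS and 1e-33 on another), test it against a tolerance
zval <- 1e-33   # stand-in for a computed about-zero value
stopifnot(abs(zval) < sqrt(.Machine$double.eps))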

The only remaining problem is the long-double case, which needs scrutiny of src/nestedness.c and re-compilation of R. As to the knitr issue (#279), I'll comment on it there.

I don't know what they will do with this now: apart from the long-double issue, all of these are about BLAS/Lapack, which is known to be inconsistent, and the additional tests put the responsibility of handling BLAS/Lapack quirks on package developers.

jarioksa commented 6 years ago

I have been working with the tests & "additional issues". Currently the vegan-tests.R are more soundly written so that we do not check for exact zeros but use a tolerance, and these tests passed checks with two versions of BLAS/Lapack (built-in R BLAS and OpenBLAS).

The cca-object-tests.R were removed. Most of the output volume was intended for monitoring the stability of the results when re-factoring constrained ordination and cannot be kept meaningfully. However, some tests could be resurrected: at least tests that the support methods and extraction functions work on ordination objects, that special formulae work, that handling of missing values and subsetting works, and that permutation tests work for different cases. We could also check that some cases that should be identical indeed are identical: rda, dbrda and capscale with Euclidean distances; dbrda and capscale on metric distances (like the square root of Jaccard); unconstrained cca, orthogonal decorana and wcmdscale with row weights and Euclidean distances of Chi-square transformed data.
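
As a sketch of one such identity check, assuming the varespec data set shipped with vegan (rda on the raw data and capscale on Euclidean distances should agree up to numerical noise):

## compare the leading eigenvalues with a tolerance rather than
## comparing printed output
library(vegan)
data(varespec)
m1 <- rda(varespec)
m2 <- capscale(dist(varespec) ~ 1)
k <- 5   # compare only the first few eigenvalues
stopifnot(isTRUE(all.equal(as.numeric(eigenvals(m1))[1:k],
                           as.numeric(eigenvals(m2))[1:k],
                           tolerance = 1e-6)))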

The "no long double" (noLD) test need more work, and R must be compiled without long double support. The reasons for test differences is also more difficult to track: we do not use long double in vegan and the source of differences is either in non-vegan code or hidden in macros. I guess BLAS/Lapack changes will explain some differences, and these should be gone with fixed tests. However, there are differences in ecological null models in oecosimu-tests which should be identified. It is tedious to identify the differences based on the diff output in CRAN, but it seems that the differences come from permatfull tests

and permatswap tests

but I may have overlooked some cases. However, as soon as I have a noLD R, I'll know better.

psolymos commented 6 years ago

@jarioksa the investigation of the permat* tests might also prompt us/me to follow up on #159 . We have some functionality in these functions (stratified null models, some graphics and summaries for checking/comparing results) that would be nice to have in the new infrastructure. But once we have that, the permat* functions will be deprecated.

jarioksa commented 6 years ago

We do not use long doubles anywhere in our C functions, and therefore it needs a bit more work to find out why we get inconsistent test results when vegan is tested --without-long-double. My quick inspection hinted that we get these inconsistent results in the permat* functions at least in the cases where the make.commsim() model used stats::rmultinom(), and it seems that the file src/nmath/rmultinom.c in the R distribution indeed defines LDOUBLE p_tot = 0.; (line 54 of that file in R-devel). If so, this is not in our control, but is an R implementation issue, and all we need to do is explain it. However, this still needs verification.

jarioksa commented 6 years ago

I can confirm unstable oecosimu-tests.R results when R is built with ./configure --disable-long-double (NB, this is the correct option switch: CRAN has it wrong). All the differing test results come from the permatfull and permatswap tests and concern the following null models: r00_ind, r0_both, c0_ind, c0_both, swsh_both_r and swsh_both_c. All these tests allocate individuals with stats::rmultinom(), which is based on C code using LDOUBLE.

I think we have no reason to do anything with our code, since the instability seems to be in the compiled C code for stats::rmultinom() deep in R (@psolymos ?)

There were also differences in cca-object-tests.R in the CRAN report https://www.stats.ox.ac.uk/pub/bdr/noLD/vegan.out, but these were all fixed with the updated BLAS/Lapack tests.

psolymos commented 6 years ago

@jarioksa : the oecosimu-test.R was set up to make sure the nullmodel-related overhaul did not break things. It is not fully representative of all the nullmodel algorithms we now have in vegan, and thus cannot really fulfill its original purpose.

Most of the tests focus on fill and marginal statistics. We have an example on the help page ?commsim which is much better at looking at all the nullmodel algorithms defined in make.commsim:

## from ?commsim; requires the vegan package
library(vegan)

## check which marginal properties of the original matrix x are
## preserved in a simulated matrix y
diagfun <- function(x, y) {
    c(sum = sum(y) == sum(x),
        fill = sum(y > 0) == sum(x > 0),
        rowSums = all(rowSums(y) == rowSums(x)),
        colSums = all(colSums(y) == colSums(x)),
        rowFreq = all(rowSums(y > 0) == rowSums(x > 0)),
        colFreq = all(colSums(y > 0) == colSums(x > 0)))
}
## simulate n matrices with null model 'meth' and summarize which
## diagnostics hold across the simulations
evalfun <- function(meth, x, n) {
    m <- nullmodel(x, meth)
    y <- simulate(m, nsim=n)
    out <- rowMeans(sapply(1:dim(y)[3],
        function(i) diagfun(attr(y, "data"), y[,,i])))
    z <- as.numeric(c(attr(y, "binary"), attr(y, "isSeq"),
        attr(y, "mode") == "double"))
    names(z) <- c("binary", "isSeq", "double")
    c(z, out)
}
## a random community matrix and all algorithms known to make.commsim
x <- matrix(rbinom(10*12, 1, 0.5)*rpois(10*12, 3), 12, 10)
algos <- make.commsim()
a <- t(sapply(algos, evalfun, x=x, n=10))
print(as.table(ifelse(a==1,1,0)), zero.print = ".")

This example is, IMO, sufficient to replace most of the tests in oecosimu-test.R, e.g. by running example("commsim", "vegan").
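
A sketch of how a test file could invoke it, with a fixed seed for reproducibility on a given platform:

## run the ?commsim example from a test file; the seed makes the
## printed table comparable to a stored reference on one platform
library(vegan)
set.seed(4711)
example("commsim", package = "vegan")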

jarioksa commented 6 years ago

@psolymos : I did this in commit 00bfa8d17748b2b1567a1685c9a0a88398094df8, which indeed passes the tests also without long doubles in R.

The new test is essentially more superficial and relaxed. The purpose of the old test was to see that the simulated matrices remained unchanged, and to check that we can guarantee exact replication of simulation results across vegan versions. It seems that we cannot test this, because we cannot guarantee exact replication across R architectures and configurations. Moreover, we already ditched that idea in the 2.5-1 release, where the binary swap models changed from previous vegan releases: we no longer generate so many useless random numbers in C, for faster simulation.

We may consider adding some tests that also check that simulations are constant (although these would fail with no-long-double checks in CRAN). These could use oecosimu with a suitable statistic.
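
A minimal sketch of such a test, using the dune data set and the nestedchecker statistic that ship with vegan (the commented-out reference comparison is a placeholder):

## with a fixed seed, the simulated statistics should be reproducible
## on a given platform
library(vegan)
data(dune)
set.seed(4711)
sim <- oecosimu(dune, nestedchecker, "r0", nsimul = 99)
## compare the simulated statistics against a stored reference within
## tolerance; ref is a hypothetical stored object, not a real result
## stopifnot(isTRUE(all.equal(sim$oecosimu$simulated, ref, tolerance = 1e-6)))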

Because we no longer guarantee replication of simulation results across releases, we should also remove the "r0_old" null model that was added to provide compatibility with previous releases.

psolymos commented 6 years ago

@jarioksa : nice, thanks! I agree with removing "r0_old", which is non-sequential (no thinning to consider), so its distribution should be unaffected by the version. That is another reason why testing exactness of single matrices is not a necessity, unless we are testing changes to the interface and want to ensure that low-level functions produce the same arrays.

jarioksa commented 6 years ago

All "Additional issues" tests pass now. We may consider more extensive oecosimu-tests later.