kaneplusplus / bigmemory

126 stars 24 forks source link

CRAN Removal? #107

Closed patperry closed 3 years ago

patperry commented 3 years ago

I just got a not from the CRAN maintainers saying that bigmemory and all of its direct dependencies will be removed on October 20, due to failing checks https://cran.r-project.org/web/checks/check_results_bigmemory.html

Are you aware of this? Any plans to fix the checks?

agrueneberg commented 3 years ago

The failed check appears to be related to the following two mwhich tests, but only on newer Apple hardware (the ARM-based M1 SoC): https://github.com/kaneplusplus/bigmemory/blob/5fd3a0b157c6b9a62dec05ac1a76158b07bd5539/tests/testthat/test_mwhich.R#L34-L35

agrueneberg commented 3 years ago

It looks like there is MacOS builder that supports M1, akin to WinBuilder: https://mac.r-project.org/macbuilder/submit.html @kaneplusplus, how about uploading the package there to rule out a false positive? [edit: I managed to reproduce the bug there]

cdeterman commented 3 years ago

Would getting a clean pass on that MacOS builder be sufficient for CRAN to overlook the failed tests? I’m not familiar with this builder.

gmweaver commented 3 years ago

Seems like the tests are not seeing the update of [1, 1] to NA. I was able to reproduce the errors from CRAN check (shown below) by removing https://github.com/kaneplusplus/bigmemory/blob/master/tests/testthat/test_mwhich.R#L33 so that it is using the original x with no NA at [1, 1].

Failure (test_mwhich.R:35:5): mwhich recognizes NA
mwhich(x, 1:2, NA, "eq", "OR") not identical to 1.
Lengths differ: 0 is not 1

Failure (test_mwhich.R:36:5): mwhich recognizes NA
mwhich(x, 1:2, NA, "neq", "AND") not identical to as.numeric(2:10).
Lengths differ: 10 is not 9

Would moving to a new variable or at least from global to local make a difference? Something like:

test_that("mwhich recognizes NA", {
    x_na <- as.big.matrix(matrix(1:30, 10, 3))
    expect_warning(x_na[1,1] <- NA)
    expect_identical(mwhich(x, 1:2, NA, 'eq', 'OR'), 1)
    expect_identical(mwhich(x, 1:2, NA, 'neq', 'AND'), as.numeric(2:10))
})
patperry commented 3 years ago

Likely the UBSan issues are the culprit. There is some diagnostic information at https://www.stats.ox.ac.uk/pub/bdr/memtests/clang-UBSAN/bigmemory/tests/testthat.Rout and https://www.stats.ox.ac.uk/pub/bdr/memtests/clang-UBSAN/bigmemory/bigmemory-Ex.Rout.

This section seems relevant:

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior bigmemory.cpp:397:14 in 
bigmemory.cpp:403:31: runtime error: nan is outside the range of representable values of type 'int'

It looks like the to_int_checked function in bigmemory.cpp is relying on undefined behavior.

agrueneberg commented 3 years ago

Would getting a clean pass on that MacOS builder be sufficient for CRAN to overlook the failed tests? I’m not familiar with this builder.

Probably not, but at least we have the means to test potential bug fixes in case no one here has an M1 Mac.

gmweaver commented 3 years ago

Perhaps related to how Rcpp handles NA between NumericVector and IntegerVector (https://teuder.github.io/rcpp4everyone_en/240_na_nan_inf.html).

to_int_checked may need to be changed to something like:

// [[Rcpp::export]]
SEXP to_int_checked(SEXP x) {
  if (TYPEOF(x) == INTSXP) return x;
  NumericVector nv(x);
  int i, n = nv.size();
  IntegerVector res(n);
  for (i = 0; i < n; i++) {
    if (NumericVector::is_na(nv[i])) {
      res[i] = NA_INTEGER;  
    }
    else {
      res[i] = nv[i];
      if (nv[i] != res[i]) {
        warning("Value changed when converting to integer type.");
        break;
      }
    }
  }
  for (; i < n; i++) res[i] = NumericVector::is_na(nv[i]) ? NA_INTEGER : nv[i];
  return res;
}

Verified that the change above results in the correct definition of NA for integers which is just the min possible value (-2147483648) and no longer raises a warning.

patperry commented 3 years ago

@gmweaver the behavior is also undefined if the magnitude of the value is too large. We also need to require the following:

nv[i] < (double)INT_MAX + 1 && nv[i] > -(double)INT_MAX - 1

(technically nv[i] == INT_MIN is defined behavior, but we'd want to exlucde it since INT_MIN is NA_INTEGER)

agrueneberg commented 3 years ago

I think @gmweaver and @patperry are on the right track. to_int_checked is used internally when assigning values to a bigmemory object of type integer:

> x <- as.big.matrix(matrix(1:30, 10, 3))
> x[1, 1] <- NA
Warning message:
In to_int_checked(value) : Value changed when converting to integer type.
agrueneberg commented 3 years ago

Here is the code that R uses internally for integer-to-real conversion in Rf_coerceVector:

int attribute_hidden
IntegerFromReal(double x, int *warn)
{
    if (ISNAN(x))
    return NA_INTEGER;
    else if (x >= INT_MAX+1. || x <= INT_MIN ) {
    *warn |= WARN_INT_NA;
    return NA_INTEGER;
    }
    return (int) x;
}
agrueneberg commented 3 years ago

FYI:

f <- Rcpp::cppFunction('void f() {
  Rprintf("%d\\n", (int) NA_REAL);
}')

On x86 (gcc) the result is -2147483648 (corresponding to NA_INTEGER), on ARM (gcc) the result is 0.

asardaes commented 2 years ago

May I ask what happened with the threat of archival? I don't see any new releases on CRAN, but the package remains.

kaneplusplus commented 2 years ago

I'm not sure. I submitted the latest version at the beginning of December, there were 2 notes that I responded to via email, and I have not received a response or acknowledgement. I hope it is eventually accepted, however, in the short-term, I would recommend the github version.