r-lib / cpp11

cpp11 helps you to interact with R objects using C++ code.
https://cpp11.r-lib.org/
Other
193 stars 46 forks source link

cpp11 use of std::isnan or std::isinf returns random values when it shouldn't #307

Closed pachadotdev closed 1 year ago

pachadotdev commented 1 year ago

I have a function migrated from Rcpp to cpp11 that should take a vector, say x <- c(1,8,NA,2), and return c(F,F,T,F) with T for each value that is NA or NaN. This function is a simple for loop that uses std::. If I run the function two or more times, I get 2 or more different outputs that shouldn't be random. I described it here https://github.com/pachadotdev/fixest2/issues/27#issuecomment-1426900420.

Btw, I am working on fixest2, which is fixest adapted to use cpp11 instead of Rcpp

The steps to obtain the error from bash/shell are

git clone --depth 1 https://github.com/pachadotdev/fixest2.git
cd fixest2
git checkout cpp11_wip
R

the exact error with the output of a vector function can be seen here https://github.com/pachadotdev/fixest2/issues/39#issuecomment-1427116435

also in R...

first run:

> devtools::load_all()
ℹ Loading fixest2
> 
> est_slopes = feols(Ozone ~ Solar.R + Wind | Day + Month[Temp], airquality)
  [1]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [18] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
 [35]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE
 [52]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
 [69]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
 [86] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE
[103]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
[120] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[137] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE

second run:

> devtools::load_all()
ℹ Loading fixest2
> 
> est_slopes = feols(Ozone ~ Solar.R + Wind | Day + Month[Temp], airquality)
  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [18] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
 [35]  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
 [52]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
 [69] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
 [86] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[103]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
[120] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE
[137]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

the real T/F vector should be:

> is.na(airquality$Ozone)
  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [18] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
 [35]  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
 [52]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
 [69] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
 [86] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[103]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
[120] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[137] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
pachadotdev commented 1 year ago

here's a MWE

> is_na_or_inf(c(1,NA,3,Inf,5))
[1] TRUE TRUE TRUE TRUE TRUE
> is_na_or_inf(c(1,NA,3,Inf,5))
[1] FALSE  TRUE FALSE  TRUE  TRUE
> is_na_or_inf(c(1,NA,3,Inf,5))
[1] TRUE TRUE TRUE TRUE TRUE

and the corresponding C++ code (called with the "source" button is

#include <cpp11.hpp>
#include <cpp11/doubles.hpp>
#include <cpp11/logicals.hpp>

using namespace cpp11;

[[cpp11::register]] logicals is_na_or_inf(SEXP x)
{
    int nobs = Rf_length(x);
    double *px = REAL(x);
    bool anyNAInf = true;

    writable::logicals is_na_inf(anyNAInf ? nobs : 1);

    for (int i = 0; i < nobs; ++i)
    {
        double x_tmp = px[i];
        if (std::isnan(x_tmp))
        {
            is_na_inf[i] = true;
        }
        else if (std::isinf(x_tmp))
        {
            is_na_inf[i] = true;
        }
    }

    return is_na_inf;
}
jimhester commented 1 year ago

Unless I am missing something, you never initialize the output vector to false or explicitly set the value to false if it is not a NA or an Inf. cpp11 doesn't do any initialization for you, so you are seeing random memory effects due to reading uninitialized memory.