mannau / h5

Interface to the HDF5 Library
Other
70 stars 22 forks source link

R CMD check fails w/dev Rcpp due to embedded NUL in string #73

Open kevinushey opened 6 years ago

kevinushey commented 6 years ago

An incoming PR to Rcpp (https://github.com/RcppCore/Rcpp/pull/917) will ensure that Rcpp throws an exception if one attempts to convert an Rcpp::String to a SEXP, if that string contains embedded NUL bytes. This behavior is in line with e.g. R's API rawToChar(), which will also err similarly if your raw vector contains NUL bytes:

> rawToChar(as.raw(c(64, 0, 64)))
Error in rawToChar(as.raw(c(64, 0, 64))) : embedded nul in string: '@\0@'

In our revdep checks, it looks like this causes an issue with h5 -- when compiled against the aforementioned version of Rcpp, I see:

> file <- h5file("test.h5", mode = "a")
> h5attr(file, "attr", size = 16) <- "hello"
> h5attr(file, "attr")
Error in ReadAttribute(.Object@pointer, .Object@dim) : 
  Embedded NUL in string. 

I believe this comes from the usages of Rcpp::String here:

https://github.com/mannau/h5/blob/09382bb2a276d2d81a7a869f6ffd5f95d362f7e2/src/Helpers.cpp#L406

https://github.com/mannau/h5/blob/09382bb2a276d2d81a7a869f6ffd5f95d362f7e2/src/Helpers.cpp#L419

I think there are a couple options to get around this:

  1. Manually truncate stsize based on the location of the first NUL byte in the string used during conversion;

  2. Use Rf_mkChar() to create the SEXP, allowing this API to automagically discover where the first embedded NUL byte (if any) exists.

We're hoping to merge the PR to master and have it become part of the next Rcpp release. Would you be open to a PR helping to accommodate this change?

Thanks! Sorry for the trouble.

kevinushey commented 6 years ago

IIUC, this is specifically caused here:

https://github.com/mannau/h5/blob/09382bb2a276d2d81a7a869f6ffd5f95d362f7e2/src/Helpers.cpp#L406

there's an off-by-one error, as the std::string object gets constructed with the trailing NUL byte used as part of the data.