grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5
59 stars 22 forks source link

Address integer overflow in H5Screate_simple() #140

Closed hpages closed 4 months ago

hpages commented 5 months ago

Hi Mike,

When trying to create a 1D dataset with an extension > .Machine$integer.max, I get the following warning:

library(rhdf5)
h5createFile("test.h5")
h5createDataset("test.h5", "test", dims=0, maxdims=3e9)
# Warning message:
# In H5Screate_simple(dims, maxdims) :
#   NAs introduced by coercion to integer range

The warning is coming from H5Screate_simple():

sid <- H5Screate_simple(dims=0, maxdims=3e9)
# Warning message:
# In H5Screate_simple(dims = 0, maxdims = 3e+09) :
#   NAs introduced by coercion to integer range

sid
# HDF5 DATASPACE
#         rank 1
#         size 0
#      maxsize 18446744071562067968

This PR addresses this by coercing the supplied dims or maxdims vectors to numeric instead of integer, and by turning the numeric values to hsize_t values at the C-level.

With this change:

sid <- H5Screate_simple(dims=0, maxdims=3e9)
sid
# HDF5 DATASPACE 
#         rank 1
#         size 0
#      maxsize 3e+09

H.

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (4b84fb0) 77.51% compared to head (4a7893f) 77.51%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## devel #140 +/- ## ======================================= Coverage 77.51% 77.51% ======================================= Files 36 36 Lines 2006 2006 ======================================= Hits 1555 1555 Misses 451 451 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

grimbough commented 5 months ago

Thanks @hpages

I probably need to make a similar change to H5Sset_extent_simple and come up with a solution for the H5S_UNLIMITED issue. I'll merge it once I've done that.

hpages commented 5 months ago

For the H5S_UNLIMITED issue, you could adopt the convention that any negative dim stands for H5S_UNLIMITED. Then replace:

if (maxdims[i] == -1)
    maxdims[i] = H5S_UNLIMITED;

with:

if (REAL(_maxdims)[i] < 0)
    maxdims[i] = H5S_UNLIMITED;

I don't like doing REAL(_maxdims)[i] == -1 because who knows how reliable == between 2 floating point representations can be. I feel that it might cause some surprises on some platforms.

Another way to go would be to use NA_real_ to mean H5S_UNLIMITED (if it's not already used to mean something else). Then:

if (R_IsNA(REAL(_maxdims)[i]))
    maxdims[i] = H5S_UNLIMITED;