grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5

`H5Dread` fails for zero-length 32-bit unsigned integer datasets #134

Closed: LTLA closed this issue 8 months ago

LTLA commented 8 months ago
```r
library(rhdf5)
tmp <- tempfile(fileext=".h5")

# Write a zero-length dataset of 32-bit unsigned integers
fhandle <- H5Fcreate(tmp, "H5F_ACC_TRUNC")
shandle <- H5Screate_simple(0)
dhandle <- H5Dcreate(fhandle, "blah", dtype_id="H5T_NATIVE_UINT32", h5space=shandle)
H5Sclose(shandle)
H5Dclose(dhandle)
H5Fclose(fhandle)

# Reopen the file and try to read the empty dataset back
fhandle <- H5Fopen(tmp, "H5F_ACC_RDONLY")
dhandle <- H5Dopen(fhandle, "blah")
H5Dread(dhandle)
## Error in H5Dread(dhandle) : 
##  Not enough memory to read data! Try to read a subset of data by specifying the index or count parameter.
```

Oddly enough, it works if `dtype_id` is set to `H5T_NATIVE_UINT16` or `H5T_NATIVE_INT32`, or if the dataspace has a non-zero extent. Truly a mystery.

Session information

```
R Under development (unstable) (2023-11-10 r85507)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rhdf5_2.47.0

loaded via a namespace (and not attached):
[1] compiler_4.4.0      rhdf5filters_1.15.1 Rhdf5lib_1.25.0
```
grimbough commented 8 months ago

I think the discrepancy here is that we treat uint16 and int32 differently from uint32, since the first two can be mapped directly onto R integers, whereas the last goes through the same processing as int64 etc.

With the former, there's an `allocVector(INTSXP, n)` call to allocate the output vector, and we just assume it's fine. With the latter, it instead uses `R_alloc`, and then there's a test for a non-zero return value. It looks like `R_alloc` with length 0 returns `NULL`, which fails that test and triggers the same error as if the memory allocation had failed for some other reason.

I'll look into how that should really be handled. It doesn't seem like you should get the same return value from `R_alloc` in these two scenarios, so the test is probably what's wrong.