grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5

Informative error when chunk size exceeds data size #97

Closed ekernf01 closed 2 years ago

ekernf01 commented 2 years ago

When the chunk size exceeds the data size in any dimension, h5createDataset() fails with the same non-specific error mentioned in https://github.com/grimbough/rhdf5/issues/32. It would be helpful to have a separate, more informative message in this situation.

rhdf5::h5createDataset(rhdf5::H5Fcreate(tempfile()), dataset = "foo", dims = c(10,10), chunk = c(10, 11))
grimbough commented 2 years ago

Thanks for the report. I've amended the behaviour so that it will automatically adjust the offending chunk dimension to match the maximum for the dataset and print a warning that it has done this. This is available in version 2.37.4.

library(rhdf5)
f1 <- tempfile()
h5createFile(f1)
rhdf5::h5createDataset(f1, dataset = "foo", dims = c(10,10), chunk = c(10, 11))
#> Warning: One or more chunk dimensions exceeded the maximum for the dataset.
#> These have been automatically set to the maximum.
#> The new chunk dimensions are: c(10,10)
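
If you want to double-check, something like the snippet below should confirm the chunk dimensions that were actually stored, using the low-level API (a sketch; the printed values are what I'd expect rather than captured output):

fid <- H5Fopen(f1)
did <- H5Dopen(fid, "foo")
pid <- H5Dget_create_plist(did)
H5Pget_chunk(pid)   # chunk dimensions recorded in the dataset's creation property list
#> [1] 10 10
H5Pclose(pid)
H5Dclose(did)
H5Fclose(fid)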

One minor thing. Maybe you only did this for a nice short example, but I wouldn't recommend calling H5Fcreate() inside h5createDataset() like this. The H5 functions return a handle to an HDF5 object, and if you don't assign that to an R variable on creation you can't close it. I think with this example code you'll end up with a permanently open file, and other functions will complain if you try to do anything else with it. You'd have to run h5closeAll(), which works but is sometimes a bit heavy-handed. Something like the pattern below avoids that.
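
A minimal sketch of the pattern I'd suggest instead (the dataset name and chunk sizes here are just placeholders):

f2 <- tempfile()
fid <- H5Fcreate(f2)   # keep the handle in a variable so it can be closed
h5createDataset(fid, dataset = "foo", dims = c(10,10), chunk = c(5,5))
H5Fclose(fid)          # close this specific file handle; no h5closeAll() needed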

ekernf01 commented 2 years ago

Thank you, and thanks for the tip! I will watch out for that in the future.