grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5
59 stars 22 forks source link

Missing exports() breaks dependencies. #98

Closed dansmith01 closed 2 years ago

dansmith01 commented 2 years ago

CRAN just notified me they'll be pulling my rbiom package due to: Error: 'h5writeAttribute.character' is not an exported object from 'namespace:rhdf5'

I found it very helpful to manually call the appropriate class function for h5writeAttribute to ensure numeric variables are properly stored in h5 as integers or floats. However, it looks like PR #87 removed a lot of those exported functions.

Any chance those functions can be exported again and a new version pushed to BioConductor before CRAN removes my package on 2021-11-17? I'll attach a pull request. If this isn't feasible, please let me know asap so I can adapt my package's code to compensate instead.

dansmith01 commented 2 years ago

Added PR #99 and #100 for consideration.

bthieurmel commented 2 years ago

Hi,

Same for h5writeDataset function : Missing or unexported object: ‘rhdf5::h5writeDataset.array’ with CRAN "ultimatum" on 2021-11-17.

Thanks.

grimbough commented 2 years ago

Sorry this has broken your packages. It's annoying CRAN doesn't also use the developmental version of Bioconductor. This change was made months ago to try and find these problems before it was released.

I'm a bit conflicted here. The function pre-date my time as maintainer of rhdf5, but I've always assumed they were intended as S3 methods because of the .integer etc suffixes, and it feels wrong to export them as regular functions. It looks like in the past they were exported as both regular functions and S3 methods https://github.com/grimbough/rhdf5/blob/30d6cb74740be043a38f52714c4fd329ced23a76/NAMESPACE#L316-L366

I've also had other users point out that they rely on the S3 method export e.g. #89 and I suspect #99 & #100 will break it their usecase again. I don't know if I can force Roxygen to provide both types of export.

Ultimately I think the whole thing is a bit over engineered, because all the h5writeDataset and h5writeAttribute variants end up using the .array version of the function, and the choice of of HDF5 datatype is decided inside that function based on the type of the R object you're passing. I'm not sure you're actually gaining anything by using the explicit versions of the functions. I think this code demonstrates how the R type overrides the choice of method:

library(rhdf5)

some_numerics <- as.numeric(1:10)
is(some_numerics)
#> [1] "numeric" "vector"

## create a file and try to write our numeric as an integer
tf <- tempfile()
fid <- H5Fcreate(name = tf)
rhdf5:::h5writeDataset.integer( some_numerics, h5loc = fid, name = "still_a_numeric" )
rhdf5:::h5writeDataset.integer( as.integer(some_numerics), h5loc = fid, name = "now_an_integer" )
H5Fclose(fid)

## check the datatype of the new datasets
h5ls(tf)
#>   group            name       otype  dclass dim
#> 0     /  now_an_integer H5I_DATASET INTEGER  10
#> 1     / still_a_numeric H5I_DATASET   FLOAT  10

If you have an example where this isn't the behaviour I'm happy to keep looking at this, but I think it might be cleaner (and not actually change how anything works) for your code to use h5writeDataset() and h5writeAttribute() without the suffixes.

dansmith01 commented 2 years ago

@grimbough good point. I went ahead and closed the PRs and will adapt the code on my end. By the way, thank you for maintaining rhdf5!