grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5

other compression libs #34

Closed niekverw closed 4 years ago

niekverw commented 5 years ago

Not really an issue, but I was wondering whether it is possible to use other compression libraries, such as Snappy or bzip2?

http://danielhnyk.cz/comparison-of-compression-libs-on-hdf-in-pandas/

grimbough commented 4 years ago

I've been doing some work on this and have made some additional compression filters available in the rhdf5filters package.

grimbough commented 4 years ago

As of rhdf5 version 2.33.1, you can pass the filter argument to h5createDataset() to specify that one of the optional plugins provided by rhdf5filters should be used when writing the data chunks, e.g.:

library(rhdf5)

h5createFile("ex_createDataset.h5")
#> [1] TRUE

h5createDataset("ex_createDataset.h5", dataset = "A", 
                dims = c(5,8), chunk = c(5,1), 
                filter = "BZIP2", level = 6)
#> [1] TRUE

h5write(matrix(1:40, nrow = 5, ncol = 8), 
        file = "ex_createDataset.h5", name = "A")

Reading datasets compressed with any of the supported filters should be transparent if rhdf5filters is installed:

h5dump("ex_createDataset.h5")
#> $A
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    1    6   11   16   21   26   31   36
#> [2,]    2    7   12   17   22   27   32   37
#> [3,]    3    8   13   18   23   28   33   38
#> [4,]    4    9   14   19   24   29   34   39
#> [5,]    5   10   15   20   25   30   35   40
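
To see how the filter choice affects the file, the example above can be repeated with several filters and the resulting file sizes compared. This is only a rough sketch: write_with_filter() is a hypothetical helper, not part of rhdf5, and the filter names "NONE" and "GZIP" are assumed from the rhdf5 documentation (only "BZIP2" appears in this thread). For a matrix this small, the sizes will mostly reflect HDF5 overhead rather than compression performance.

```r
library(rhdf5)

# Hypothetical helper (not part of rhdf5): write `x` to a temporary HDF5 file
# using the given compression filter and return the file size in bytes.
write_with_filter <- function(x, filter) {
  path <- tempfile(fileext = ".h5")
  h5createFile(path)
  # One chunk per column, as in the example above.
  h5createDataset(path, dataset = "A",
                  dims = dim(x), chunk = c(nrow(x), 1),
                  storage.mode = "integer",
                  filter = filter, level = 6)
  h5write(x, file = path, name = "A")
  # Check that the data reads back unchanged, whatever filter was used.
  stopifnot(all(h5read(path, "A") == x))
  file.size(path)
}

x <- matrix(1:40, nrow = 5, ncol = 8)

# Filter names other than "BZIP2" are assumptions; see ?h5createDataset.
filters <- c("NONE", "GZIP", "BZIP2")
sizes <- vapply(filters, function(f) write_with_filter(x, f), numeric(1))
print(sizes)
```

With a larger, more realistic dataset, the same loop gives a quick way to pick a filter and level before committing to one.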