grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5

Support variable length, UTF-8 encoded scalar strings in attributes. #80

LTLA closed 3 years ago

LTLA commented 3 years ago

Closes #79. To illustrate:

# in R:
library(rhdf5)

file <- "whee.h5"
unlink(file)
handle <- H5Fcreate(file)

rhdf5::h5createGroup(handle, "WHEEE")
ghandle <- H5Gopen(handle, "WHEEE")

h5writeAttribute("AARON_WAS_HERE", ghandle, "last-visitor", cset="UTF8", variableLength=TRUE, scalar=TRUE)

H5Gclose(ghandle)
H5Fclose(handle)

Running h5dump whee.h5 on the result gives:

HDF5 "whee.h5" {
GROUP "/" {
   GROUP "WHEEE" {
      ATTRIBUTE "last-visitor" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_UTF8;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "AARON_WAS_HERE"
         }
      }
   }
}
}

and reading the attribute back with h5py:

import h5py
X = h5py.File("whee.h5", "r")  # open read-only; older h5py versions defaulted to append mode
dict(X["WHEEE"].attrs)
## {'last-visitor': 'AARON_WAS_HERE'}
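The dict above only shows the decoded value. To check the storage type as well (variable length, UTF-8, scalar dataspace), something like the following sketch should work from the h5py side. It is self-contained, so it writes its own stand-in file with h5py rather than reading the one produced by the R code; the temporary path is an assumption for illustration, and it relies on h5py's documented behaviour of storing a Python str as a variable-length UTF-8 string.

```python
import os
import tempfile

import h5py

# Hypothetical stand-in for the whee.h5 file written by the R snippet.
path = os.path.join(tempfile.mkdtemp(), "whee.h5")

with h5py.File(path, "w") as f:
    group = f.create_group("WHEEE")
    # Assigning a Python str creates a scalar, variable-length UTF-8 attribute.
    group.attrs["last-visitor"] = "AARON_WAS_HERE"

with h5py.File(path, "r") as f:
    attrs = f["WHEEE"].attrs
    value = attrs["last-visitor"]

    # Low-level handle, used to inspect the stored type rather than the value.
    attr_id = attrs.get_id("last-visitor")
    info = h5py.check_string_dtype(attr_id.dtype)
    shape = attr_id.shape

print(value)          # decoded string
print(info.encoding)  # character set of the stored string type
print(info.length)    # None means variable length
print(shape)          # an empty tuple means a scalar dataspace
```

Pointing `path` at the file written by rhdf5 and dropping the first `with` block should give the same checks against the R output, assuming the attribute was written as in the example above.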

Tests and documentation for all the new functionality are left as an exercise for someone else.

LTLA commented 3 years ago

What say you @grimbough?

grimbough commented 3 years ago

Thanks @LTLA. I'll take a look at this over Easter.

LTLA commented 3 years ago

Ended up just adding the tests and docs myself. I don't mind it being merged as-is, but given that you're going to look at it anyway, you might as well extrapolate the extra features to string datasets rather than just attributes. To recap, these are:

LTLA commented 3 years ago

Nudge.

LTLA commented 3 years ago

Hello? Ground control to @grimbough?

LTLA commented 3 years ago

image