Closed mtmorgan closed 1 year ago
The enum-attributes branch now has some experimental support for this.
> h5readAttributes(h5ad_file, "/obs/CellType")
H5Tsize: 1
$`encoding-type`
[1] "categorical"
$`encoding-version`
[1] "0.2.0"
$ordered
[1] "FALSE"
At the moment, if it encounters an ENUM attribute it translates that into an R character vector; a bit like as.character()
applied to a factor
. In this case you could then apply as.logical()
to get the desired behaviour.
Longer term, it might be better to define an enum class or something, so you don't lose the actual values. Would be interested in your thoughts.
Thanks @grimbough ; I added a couple of comments to the commit.
I actually lean toward an enum with values 'TRUE' / 'FALSE' / (?NA) being coerced to logical; this is consistent with h5py's treatment of Boolean https://docs.h5py.org/en/stable/faq.html.
It seems like a close reading of https://support.hdfgroup.org/HDF5/doc/H5.user/DatatypesEnum.html would encourage a formal class to represent enum -- there are some subtleties that a casual user might get wrong. Not sure what the demand for this is at the moment...
Thanks @grimbough ; I added a couple of comments to the commit.
Thanks for that. I think I've addressed them now.
I actually lean toward an enum with values 'TRUE' / 'FALSE' / (?NA) being coerced to logical; this is consistent with h5py's treatment of Boolean https://docs.h5py.org/en/stable/faq.html.
I agree. It certainly feels like h5readAttributes()
, which is supposed to more "user friendly", should probably do something similar to h5py, so I've now implemented that. If the enum levels are a subset of c("FALSE", "TRUE", "NA")
then the attribute values will be converted to logical before being returned.
For now the lower-level H5Aread()
will return the character vector of enum labels as before, although ideally this would actually support the hypotheical H5_ENUM
class.
It seems like a close reading of https://support.hdfgroup.org/HDF5/doc/H5.user/DatatypesEnum.html would encourage a formal class to represent enum -- there are some subtleties that a casual user might get wrong. Not sure what the demand for this is at the moment...
I suspect demand isn't great, although I wasn't aware of this issue either until you raised it in the hackathon document. Perhaps there are other gaps in support that no one has mentioned to me, but would be nice to fill :shrug:
Hi @grimbough I'm not sure what happened to this, I can't find the branch anymore and I don't think it's been merged into 'devel'?
Sorry @mtmorgan I renamed the branch to https://github.com/grimbough/rhdf5/tree/scverse-hackathon since I thought there might be more things that came up during the hackathon, but I ended up only interacting with the SpatialData/Zarr group for the day I was there.
Thanks @grimbough ; I see
devtools::load_all()
## Error in add_classes_to_exports(ns = nsenv, package = package, exports = exports, :
## in package ‘rhdf5’ classes H5Enum were specified for export but not defined
Thanks, an unintended legacy of starting that hypothetical enum class and then not getting very far. It should load for you now.
I'm not sure if I mentioned it anywhere else, but I also added support to h5read()
for AnnData nullable arrays. There's an example at: https://github.com/grimbough/rhdf5/blob/e59a50ae1fb287755a1ad71a74524cbbdc022f58/tests/testthat/test_h5read.R#L160-L171
If you're actively experimenting with any these features I'll keep this scverse-hackathon branch for now, and merge into devel after you've done some testing and found the bugs.
Hi @grimbough ! Thanks for making these changes!
I'm currently looking into getting the anndataR package submitted to Bioconductor, and to that end I'm tracking down any outstanding issues.
I just installed the scverse-hackathon
branch I can confirm that the changes solve a lot of our issues!
Could these changes be merged into the devel version?
Hi @rcannood ,
Thanks for the nudge. I'd totally lost track of whether these had already been merged. The scverse-hackathon
branch has now been merged into devel
and everything should be available from v 2.45.2.
Let me know if you run into any other problems, probably by opening another issue.
Using the same too-complicated example from here:
results in
or once the file is in hand...
The problem is the
ordered
attribute, which is encoded as an ENUM, and the code path leading to it. https://github.com/grimbough/rhdf5/blob/bddaf01256ef2ba5c663c12589ff165565bad7f8/src/H5A.c#L328-L337One perhaps non-rhdf5 issue is that the python 'convention' is to treat an attribute with keys 'TRUE', 'FALSE' as a logical vector; hdf5r does this too, but perhaps it doesn't belong at this level.