Closed lazappi closed 3 months ago
@lazappi Does it make sense to merge #138 into this PR?
Or, I can merge #138 into main first -- since it doesn't seem to be failing at this moment in time. Then we can merge main back into this PR.
This PR breaks the following test:
── Failure ('test-HDF5-write.R:91:3'): Writing H5AD categoricals works ─────────
hdf5_path_exists(h5ad_file, "/categorical/ordered") is not TRUE
This makes sense, since ordered
is no longer a path but an attribute.
I think the remaining error here is because of the {rhdf5} problem. I'm not really sure what to do about that.
Going to close and reopen the PR because the diff in the GitHub UI is wrong.
When I manually run write_h5ad_element(categorical, h5ad_file, "categorical")
in test-HDF5-write.R
, I get:
write_h5ad_element(categorical, h5ad_file, "categorical")
Warning message:
In value[[3L]](cond) :
Could not write element 'categorical' of type 'factor':
datatype logical not yet implemented. Try 'double', 'integer', or 'character'.
Just for the record, this is related to grimbough/rhdf5#136
===========================
When I open the H5File, I do see that there is a categorical there.
> h5ad_file <- rhdf5::H5Fopen(h5ad_file)
> h5ad_file
HDF5 FILE
name /
filename
name otype dclass dim
0 categorical H5I_GROUP
So maybe one question is: if something like write_h5ad_categorical
fails, should we:
categorical
fieldSo maybe one question is: if something like
write_h5ad_categorical
fails, should we:
- throw an error
- generate a warning but at least remove the
categorical
field- generate a warning but the data might be incomplete
I'm pretty sure any categorical data will show this error. We can't have an incomplete field because then Python anndata won't be able to read the file. The only solution I can think of is to try and fix this in {rhdf5} and/or add a function for writing logical attributes here. Temporarily maybe the best we can do is add a warning to write_h5ad_categorical()
saying this is not yet implemented and do nothing. It's a fairly essential part of functionally though.
Sorry, I was trying to have two coversations at once -- yes the categorical will fail if we can't write the ordered
attribute.
However, in any situation where a write fails, it would probably be best to remove the field and show a warning. WDYT? (No need to already implement this, let's discuss first)
The only solution I can think of is to try and fix this in {rhdf5} and/or add a function for writing logical attributes here.
Yeah. I tried to have a look at rhdf5
. Looking at the implementations in h5py
and hdf5r
, it seems we have to create an 8-bit enum with {0: FALSE, 1: TRUE}
, but I'm not familiar enough with the rhdf5
codebase to make these changes. I tried, but I didn't get to something that worked :P
However, in any situation where a write fails, it would probably be best to remove the field and show a warning. WDYT? (No need to already implement this, let's discuss first)
I think that's reasonable in general (it's what {zellkonverter} does). Maybe we want an argument to control whether people get an error or a warning (for cases when you need to be sure things are written)?
Yeah. I tried to have a look at
rhdf5
. Looking at the implementations inh5py
andhdf5r
, it seems we have to create an 8-bit enum with{0: FALSE, 1: TRUE}
, but I'm not familiar enough with therhdf5
codebase to make these changes. I tried, but I didn't get to something that worked :P
Yeah, I think it's beyond me as well. I guess we should try to push for that issue to be resolved?
@rcannood I think this should work now with the current devel version of {rhdf5} if you want to take a look at it. I suspect the CI will fail though.
Thanks for merging! We might want to bump the {rhdf5} version in DESCRIPTION
to make it clearer what is required. I think I didn't do that already to avoid people needing to install Bioconductor devel.
Good idea, thanks!
ordered
as an attribute rather than a datasetcategories
as a stringcodes
as 0-indexed integer to withNA
s set to-1
Fixes #154