scverse / anndataR

AnnData interoperability in R
https://anndatar.data-intuitive.com
Other
57 stars 8 forks source link

Fixes to `write_h5ad_categorical()` #155

Closed lazappi closed 3 months ago

lazappi commented 10 months ago

Fixes #154

rcannood commented 10 months ago

@lazappi Does it make sense to merge #138 into this PR?

rcannood commented 10 months ago

Or, I can merge #138 into main first -- since it doesn't seem to be failing at this moment in time. Then we can merge main back into this PR.

rcannood commented 10 months ago

This PR breaks the following test:

── Failure ('test-HDF5-write.R:91:3'): Writing H5AD categoricals works ─────────
hdf5_path_exists(h5ad_file, "/categorical/ordered") is not TRUE

This makes sense, since ordered is no longer a path but an attribute.

lazappi commented 10 months ago

I think the remaining error here is because of the {rhdf5} problem. I'm not really sure what to do about that.

rcannood commented 10 months ago

Going to close and reopen the PR because the diff in the GitHub UI is wrong.

rcannood commented 10 months ago

When I manually run write_h5ad_element(categorical, h5ad_file, "categorical") in test-HDF5-write.R, I get:

write_h5ad_element(categorical, h5ad_file, "categorical")
Warning message:
In value[[3L]](cond) :
  Could not write element 'categorical' of type 'factor':
datatype logical not yet implemented. Try 'double', 'integer', or 'character'.

Just for the record, this is related to grimbough/rhdf5#136

===========================

When I open the H5File, I do see that there is a categorical there.

> h5ad_file <- rhdf5::H5Fopen(h5ad_file)
> h5ad_file
HDF5 FILE 
        name /
    filename 

         name     otype dclass dim
0 categorical H5I_GROUP   

So maybe one question is: if something like write_h5ad_categorical fails, should we:

lazappi commented 10 months ago

So maybe one question is: if something like write_h5ad_categorical fails, should we:

  • throw an error
  • generate a warning but at least remove the categorical field
  • generate a warning but the data might be incomplete

I'm pretty sure any categorical data will show this error. We can't have an incomplete field because then Python anndata won't be able to read the file. The only solution I can think of is to try and fix this in {rhdf5} and/or add a function for writing logical attributes here. Temporarily maybe the best we can do is add a warning to write_h5ad_categorical() saying this is not yet implemented and do nothing. It's a fairly essential part of functionally though.

rcannood commented 10 months ago

Sorry, I was trying to have two coversations at once -- yes the categorical will fail if we can't write the ordered attribute.

However, in any situation where a write fails, it would probably be best to remove the field and show a warning. WDYT? (No need to already implement this, let's discuss first)

The only solution I can think of is to try and fix this in {rhdf5} and/or add a function for writing logical attributes here.

Yeah. I tried to have a look at rhdf5. Looking at the implementations in h5py and hdf5r , it seems we have to create an 8-bit enum with {0: FALSE, 1: TRUE}, but I'm not familiar enough with the rhdf5 codebase to make these changes. I tried, but I didn't get to something that worked :P

lazappi commented 10 months ago

However, in any situation where a write fails, it would probably be best to remove the field and show a warning. WDYT? (No need to already implement this, let's discuss first)

I think that's reasonable in general (it's what {zellkonverter} does). Maybe we want an argument to control whether people get an error or a warning (for cases when you need to be sure things are written)?

Yeah. I tried to have a look at rhdf5. Looking at the implementations in h5py and hdf5r , it seems we have to create an 8-bit enum with {0: FALSE, 1: TRUE}, but I'm not familiar enough with the rhdf5 codebase to make these changes. I tried, but I didn't get to something that worked :P

Yeah, I think it's beyond me as well. I guess we should try to push for that issue to be resolved?

lazappi commented 6 months ago

@rcannood I think this should work now with the current devel version of {rhdf5} if you want to take a look at it. I suspect the CI will fail though.

lazappi commented 3 months ago

Thanks for merging! We might want to bump the {rhdf5} version in DESCRIPTION to make it clearer what is required. I think I didn't do that already to avoid people needing to install Bioconductor devel.

rcannood commented 3 months ago

Good idea, thanks!