ilia-kats / MuData

MuData-compatible storage for bioconductor's MultiAssayExperiment
https://ilia-kats.github.io/MuData/

NaN in categorical column fails reading #6

Closed: votti closed this issue 1 year ago

votti commented 1 year ago

I have a MuData/AnnData dataset exported with anndata==0.7.8.

When I try to read it, I get the following error while reading var:

Error in factor(as.integer(values), labels = labels_items): invalid 'labels'; length 4732 should be 1 or 4733
Traceback:

1. readH5AD(file)
2. read_modality(h5, backed)
3. read_with_index(h5autoclose(view & "var"))
4. read_dataframe(dataset)
5. lapply(columnorder, function(name) {
 .     col <- group & name
 .     values <- read_attribute(col)
 .     if (H5Aexists(col, "categories")) {
 .         attr <- H5Aopen(col, "categories")
 .         labels <- H5Aread(attr)
 .         if (!is(labels, "H5Ref")) {
 .             warning("found categories attribute for column ", 
 .                 name, ", but it is not a reference")
 .         }
 .         else {
 .             labels <- H5Rdereference(labels, h5loc = col)
 .             labels_items <- H5Dread(labels)
 .             n_labels <- length(unique(values))
 .             if (length(labels_items) > n_labels) {
 .                 labels_items <- labels_items[seq_len(n_labels)]
 .             }
 .             values <- factor(as.integer(values), labels = labels_items)
 .             H5Dclose(labels)
 .         }
 .         H5Aclose(attr)
 .     }
 .     H5Dclose(col)
 .     values
 . })
6. FUN(X[[i]], ...)
7. factor(as.integer(values), labels = labels_items)
8. stop(gettextf("invalid 'labels'; length %d should be 1 or %d", 
 .     nlab, length(levels)), domain = NA)

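I found the cause: the column contains NA values, which are stored as -1 in the categorical codes but have no matching label in the categories. The codes therefore contain 4733 distinct values (the 4732 real categories plus -1), while only 4732 labels are available, so factor() rejects the labels.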
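To illustrate, here is a minimal sketch in plain R of the failure and of one possible way to handle it. It assumes 0-based codes with -1 marking missing values, as described above; the helper name decode_categorical and the toy data are made up for this example and are not part of the package.

```r
# Toy data: 0-based category codes, with -1 standing in for a missing value.
codes  <- c(0L, 1L, -1L, 2L, 1L)
labels <- c("a", "b", "c")

# Same pattern as in the traceback: factor() derives its levels from the
# distinct codes (-1, 0, 1, 2), so there is one more level than labels.
err <- tryCatch(
  factor(codes, labels = labels),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical helper: turn negative codes into NA and pass the levels
# explicitly, so that the labels always line up with the codes 0..(n - 1).
decode_categorical <- function(codes, labels) {
  codes <- as.integer(codes)
  codes[codes < 0L] <- NA_integer_
  factor(codes, levels = seq_along(labels) - 1L, labels = labels)
}

decoded <- decode_categorical(codes, labels)
print(decoded)
```

Passing levels explicitly also keeps categories that never occur in the data, so the labels would not need to be truncated to the number of distinct codes.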

Would you be interested in a PR with a fix?