theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

Problem with missing values in AnnData 0.8.0 #87

Closed by jackkamm 1 year ago

jackkamm commented 1 year ago

Follow-up to https://github.com/theislab/zellkonverter/pull/86

When krumsiek11_augmented_v0-8.h5ad from that PR is read in with the default Python reader:

sce <- readH5AD(system.file("extdata", "krumsiek11_augmented_v0-8.h5ad", package = "zellkonverter"))

The following data, which contain missing values, are not read in properly: dummy_int2, dummy_bool2, and dummy_category.

For example, here is how dummy_int2 looks:

> metadata(sce)$dummy_int2

<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

> class(metadata(sce)$dummy_int2)

[1] "pandas.core.arrays.integer.IntegerArray"
[2] "pandas.core.arrays.numeric.NumericArray"
[3] "pandas.core.arrays.masked.BaseMaskedArray"
[4] "pandas.core.arraylike.OpsMixin"
[5] "pandas.core.arrays.base.ExtensionArray"
[6] "python.builtin.object"

So it appears to be a reference to a Python object (a pandas IntegerArray) rather than the expected R integer vector.

dummy_bool2 is much the same as dummy_int2 (except it is printed as a <BooleanArray> instead of <IntegerArray>).
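
For reference, here is a minimal Python sketch of what these objects are on the pandas side, and one way they could be turned into plain NumPy arrays. The variable names and example values are made up for illustration, and this is not necessarily how zellkonverter or reticulate would handle the conversion.

```python
import numpy as np
import pandas as pd

# Nullable integer and boolean arrays, mirroring the printed output above.
# (Example values are illustrative, not read from the .h5ad file.)
int_arr = pd.array([1, 2, None], dtype="Int64")
bool_arr = pd.array([True, False, None], dtype="boolean")

# Both are masked extension arrays: isna() exposes which entries are missing.
int_mask = int_arr.isna()

# Convert to ordinary NumPy arrays with an explicit stand-in for missing values.
# Integers become floats so that NaN can mark the gap.
int_as_float = int_arr.to_numpy(dtype="float64", na_value=np.nan)
# Booleans become an object array holding True / False / None.
bool_as_object = bool_arr.to_numpy(dtype=object, na_value=None)

print(int_as_float)
print(bool_as_object)
```

Plain NumPy arrays like these are the kind of object that generic converters usually know how to handle, whereas the masked extension arrays are not.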

dummy_category is a bit different from dummy_bool2 and dummy_int2 -- the reader simply skips over it with this warning:

Warning messages:
1: Conversion failed for the item dummy_category in uns with the following error and has been skipped
Conversion error message: "AttributeError: 'Categorical' object has no attribute 'get_values' "

and metadata(sce)$dummy_category is NULL.
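
The AttributeError indicates that the conversion code calls `get_values()`, which the installed pandas `Categorical` does not provide. As a hedged sketch (illustrative values, not the actual fix), the same information is available from the `codes` and `categories` attributes, where a code of -1 marks a missing value:

```python
import pandas as pd

# A categorical with one missing entry (illustrative values only).
cat = pd.Categorical(["a", None, "b", "a"])

# Integer codes index into the categories; -1 marks a missing value.
codes = cat.codes.tolist()
levels = cat.categories.tolist()

# Rebuild the plain values without relying on get_values().
values = [None if code == -1 else levels[code] for code in codes]

print(codes)
print(levels)
print(values)
```

This layout is close to an R factor, which also stores integer codes plus a vector of levels, although R codes are 1-based and use NA rather than -1 for missing entries.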

Note, however, that colData(sce)$dummy_num2 is handled correctly: numeric vectors with missing values do not seem to be a problem, only the factors, ints, and logicals.
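
One plausible explanation for the difference (an assumption, not something confirmed in this thread) is that floating-point data can hold missing values as NaN inside an ordinary NumPy array, while integers have no in-band missing value and so need a separate pandas extension type. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Floats: the missing value is just NaN inside a regular float64 array.
float_with_nan = np.array([1.5, np.nan, 3.0])

# Integers with a gap: by default pandas falls back to float64 with NaN...
int_default = pd.Series([1, 2, None])

# ...unless the nullable "Int64" dtype is requested, which is backed by a
# masked IntegerArray instead of a plain NumPy array.
int_nullable = pd.Series([1, 2, None], dtype="Int64")

print(float_with_nan.dtype)
print(int_default.dtype)
print(int_nullable.dtype, type(int_nullable.array).__name__)
```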

lazappi commented 1 year ago

Thanks! This is probably because {reticulate} doesn't do any special conversion of those types, so we may need to handle them specially (although I'm not entirely sure why...).