Hey @DriesSchaumont,
Thanks for noticing this change of behaviour with pandas 2.0 and providing a great example to test it.
I've started addressing it in https://github.com/scverse/mudata/pull/43 for the boolean + nan value combination that you highlighted. So far I'm taking advantage of nullable boolean arrays.
In case you have any thoughts on what behaviour you would find most intuitive and/or how we can potentially generalise this decision making beyond just the `bool -> boolean` conversion for nullable boolean arrays, I'd be interested to discuss it!
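For readers unfamiliar with nullable boolean arrays, here is a minimal sketch of the `bool -> boolean` idea using plain pandas. It is not the code from the pull request; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Object column mixing booleans with a missing value.
mixed = pd.Series([True, False, np.nan], dtype=object)

# Cast to pandas' nullable "boolean" extension dtype: True/False are kept
# and the missing entry is represented as pd.NA.
nullable = mixed.astype("boolean")

print(nullable.dtype)
print(nullable.isna().tolist())
```

The point of the cast is that the column gets a single, well-defined dtype instead of `object`.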
By the way, already with pandas `1.5.2` and mudata `0.2.3`, `float` + `bool` is coerced to an `object` (same as `bool` + `float`).
And a short update is that mudata `0.3.0` will try to be more careful with using nullable boolean arrays to avoid potential issues like https://github.com/scverse/muon/issues/111 (e.g. by using `bool` when there is no `NA` in the column in the end).
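The "use `bool` when there is no `NA` in the end" rule could look roughly like the sketch below. `downcast_boolean` is a hypothetical helper written for this illustration, not a mudata function, and the actual 0.3.0 logic may differ.

```python
import pandas as pd


def downcast_boolean(col: pd.Series) -> pd.Series:
    """Hypothetical helper: nullable boolean -> plain bool if no NA remains."""
    if isinstance(col.dtype, pd.BooleanDtype) and not col.isna().any():
        return col.astype(bool)
    # Leave every other column (including boolean columns with NA) untouched.
    return col


complete = pd.Series([True, False, True], dtype="boolean")
with_gap = pd.Series([True, None, False], dtype="boolean")

print(downcast_boolean(complete).dtype)  # plain NumPy bool
print(downcast_boolean(with_gap).dtype)  # stays nullable boolean
```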
Describe the bug
With pandas 2.0.0, the concat behavior has changed when concatenating a boolean and a numeric dtype. The resulting dtype used to be a numeric dtype, which can be written by mudata. However, this has been changed to `object`, which results in `TypeError: Can't implicitly convert non-string objects to strings`. The behavior of `bool` + `nan` is also different from the behaviour of `str` + `nan`, the latter causing no problems.
Warning in pandas 1.5.3:
To Reproduce
With pandas 2.0.0:
With pandas 1.5.3:
I think this can be tracked down to this concat: https://github.com/scverse/mudata/blob/da2de81261db76368da0a712cf819df3abb53fb7/mudata/_core/mudata.py#L543-L548
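To check what a given pandas installation does, a small standalone sketch like the following can help. This is not the original reproduction and not the mudata code linked above; the series names are made up. The resulting dtype is printed rather than assumed because it depends on the pandas version.

```python
import numpy as np
import pandas as pd

# A boolean column and a float column (with a missing value) to concatenate.
flags = pd.Series([True, False], name="flag")
values = pd.Series([np.nan, 1.5], name="flag")

combined = pd.concat([flags, values], ignore_index=True)

# The dtype reported here differs between pandas versions, so inspect it
# together with the version string.
print(pd.__version__, combined.dtype)
```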
Expected behaviour
I would not expect a change in behavior.
System
Additional context
Could be related to https://github.com/scverse/anndata/issues/679 but the issue being reported here is a behavior change so I would flag this as a separate bug (either way the discrepancy between `str` + `nan` and `bool` + `nan` should be resolved).