Python and R SOMA APIs using TileDB’s cloud-native format. Ideal for single-cell data at any scale.
[python] TileDB-SOMA-Py can do better null-filtering with implicit Pandas DataFrame categories #2860
Open
johnkerl opened 2 months ago
Split out from #2858.
Here is a repro script: https://gist.github.com/johnkerl/68c978ac9d3e774749f77e704fd718d3
Here is a readback script: https://gist.github.com/johnkerl/20e0ad08701f5913f90be706ecd99b01
Notes:

- The input data is `["", "B cell", "T cell", None, pd.NA, math.nan]`.
- If `pd.Categorical` levels are supplied as `["B cell", "T cell"]`, then all is well.
- If `pd.Categorical` levels are not supplied, we get six levels: `"B cell"`, `"T cell"`, `""`, `"None"`, `"<NA>"`, and `"nan"`.
- `None`, `pd.NA`, and `math.nan` should be filtered from any provided levels. It makes no sense to do anything else -- ? `None` (missing) and `NA` (not missing, and known to be not applicable)
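
The notes above can be illustrated with a small pandas-only sketch. It does not call TileDB-SOMA and is not taken from the linked gists. It assumes the extra labels come from nulls being stringified somewhere on the write path, which is a guess about the mechanism. `non_null_levels` is a hypothetical helper name, showing one way the levels could be filtered before the categorical is built.

```python
import math

import pandas as pd

values = ["", "B cell", "T cell", None, pd.NA, math.nan]

# Assumption: if nulls are stringified, each one becomes a literal label,
# which would account for the three unwanted levels.
stringified = [str(v) for v in values]


def non_null_levels(items):
    """Hypothetical helper: unique non-null values in first-seen order."""
    levels = []
    for item in items:
        if pd.isna(item):  # true for None, pd.NA, and math.nan
            continue
        if item not in levels:
            levels.append(item)
    return levels


levels = non_null_levels(values)

# With explicit levels, pandas marks every value outside them as missing.
column = pd.Categorical(values, categories=levels)

print(stringified)
print(levels)
print(column.codes.tolist())
```

With explicit levels, pandas assigns code `-1` to any value that is not in the categories, so the three nulls stay missing instead of turning into labels. Whether the empty string should also be treated as null is a separate question that this sketch leaves alone.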