Open LuckyMD opened 6 years ago
Yes, this is related to the fact that sanitize_anndata cannot be meaningfully applied to a view of AnnData. You're right that one should also account for this case... I'll give it a thought. At least there should be a proper error hinting people to call sc.utils.sanitize_anndata when trying the call you mention.
Thank you very much for pointing this out. :smile: It should have happened also before version 1.1, though.
I have something that might be related:
ad = ad[ad.obs['cell type'] != 'nan'].copy()
assert np.all(ad.obs['cell type'] != 'nan')
sc.utils.sanitize_anndata(ad)
assert np.all(ad.obs['cell type'] != 'nan')
This fails in the second assert:
AssertionError Traceback (most recent call last)
<ipython-input-103-2f44e51fdcae> in <module>
8 assert np.all(ad.obs['cell type'] != 'nan')
9 sc.utils.sanitize_anndata(ad)
---> 10 assert np.all(ad.obs['cell type'] != 'nan')
11
12
AssertionError:
It's really black magic, any ideas?
PS: The nans are really strings, not proper NaNs.
@gokceneraslan are there actually nans in there? Could be related to https://github.com/theislab/anndata/issues/141.
Yes there are, and this is how I realized it. I saw them in the plots and wondered why they show up after removing them.
Oh, you mean real NaNs. No, there are none.
I'm having this issue when I read in and merge multiple AnnData objects with concat. I can't run any of the plotting functions because I get this error. I tried to convert all object/string obs columns to categorical (except obs names), but I can't get around it at all.
I get quite a strange scanpy error, which appears a bit stochastic... This has happened for the first time in version 1.1.
I am trying to get a scatter plot of a subsetted anndata object like this:
p4 = sc.pl.scatter(adata[adata.obs['n_counts']<10000 ,:], 'n_counts', 'n_genes', color='mt_frac')
When I do this the first time round, I get this error message about categorical variables from sanitize_anndata (none of which are actually used in the call).
Then, I comment out the respective line of code, run the whole thing again, and it works. And when I uncomment the line it works fine again.
When I comment the line for the first time, I get a couple of lines displayed in the output saying:
or something like that...
My theory is that sanitize_anndata() detects that these variables should be categorical variables and tries to convert them into categoricals. As this sc.pl.scatter call is the first time sanitize_anndata() is called after the variables are read in, this is the first time this conversion would take place. However, I am calling the sc.pl.scatter() on a subsetted anndata object, so it somehow cannot do the conversion. Once I call sc.pl.scatter on a non-subsetted anndata object once, the conversion can take place and I can subsequently call sc.pl.scatter also on a subsetted anndata object.
If this is true, I can see why this is happening. However, I feel this behaviour will be quite puzzling to a typical user. Maybe sanitize_anndata() should be called before plotting (probably hard to implement), or the plotting functions should have a parameter to plot only a subset of the data. That way sanitize_anndata() can be called on the whole anndata object every time, as there is no longer a reason to pass a view of the object. You could then test whether a view is being passed to sanitize_anndata(), and then say "please don't pass subsetted anndata objects to plotting functions" or something like that.
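A rough sketch of that last check, using a toy class rather than the real AnnData API. ToyTable, subset, and sanitize are made-up names for illustration only, and this is not how scanpy implements anything.

```python
class ToyTable:
    """Hypothetical stand-in for an annotated data object (not the AnnData API)."""

    def __init__(self, columns, is_view=False):
        self.columns = columns  # dict: column name -> list of values
        self.is_view = is_view  # True when the object came from subsetting

    def subset(self, keep):
        """Return the rows where keep is True, flagged as a view."""
        rows = [i for i, flag in enumerate(keep) if flag]
        cols = {name: [vals[i] for i in rows] for name, vals in self.columns.items()}
        return ToyTable(cols, is_view=True)


def sanitize(table):
    """Collect the categories of every all-string column; refuse to run on views."""
    if table.is_view:
        raise ValueError(
            "sanitize() was given a subset (view); call it on the full "
            "object first, or pass a copy of the subset."
        )
    return {
        name: sorted(set(vals))
        for name, vals in table.columns.items()
        if all(isinstance(v, str) for v in vals)
    }


full = ToyTable({"cell_type": ["T", "B", "T"], "n_counts": [500, 20000, 800]})
categories = sanitize(full)  # fine: called on the full object
view = full.subset([c < 10000 for c in full.columns["n_counts"]])
# sanitize(view) would raise ValueError with the hint above
```

The point is only that the failure becomes an explicit, readable message at the place where the view is detected, instead of an unrelated-looking error about categorical variables.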