Open oligomyeggo opened 3 years ago
I figured it out. I forgot that I should (or at least, I think I should?) set adata_sub.raw = adata_sub. Adding this step before running a new embedding and clustering seems to have fixed my issue.
I guess a follow-up question would be: is this an acceptable approach? I stored my full data set in raw after log-normalizing my data (so, adata.raw = adata) during initial preprocessing. Since adata_sub is just a subset of adata, I am guessing it is OK to set adata_sub.raw = adata_sub?
Apologies for opening this issue based on what was an oversight on my part.
Hello, I am working with an adata object (adata.shape produces (8648, 18074)) that I have subset to only include 990 genes of interest (and only include cells that express my genes of interest), with the hopes of clustering cells based on expression of my genes of interest (I got this idea from issue #510). After I subset my adata object, I confirmed that the shape of adata_sub is as expected (adata_sub.shape produces (6603, 990)). However, after running a new embedding and clustering on adata_sub, I have noticed that I can plot genes that shouldn't be in adata_sub (but were in adata), and that when I run sc.tl.rank_genes_groups my results aren't restricted to my 990 genes of interest. I am guessing that I subsetted my data incorrectly (though, why would I have the correct shape?).

Minimal code sample (that we can copy&paste without having any data)
When I use sc.pl.umap(adata_sub) to plot expression of a gene that is not one of my genes of interest, it is still plotted (I would expect an error telling me that the gene is not found in my adata_sub object). Similarly, sc.tl.rank_genes_groups(adata_sub, groupby='leiden_sub', key_added='rank_genes_sub', method='wilcoxon') returns top ranked genes that are not (or should not be) in my adata_sub object.

Thank you for any help/clarification as to what's going on!
Versions