Closed (wangjiawen2013 closed this issue 6 years ago)
You can already do the first by passing color=genename to pl.diffmap.
I don't have much experience with mnn_correct, but if cell cycle is a problem, you can definitely still regress it out, for instance on a per-batch level.
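To illustrate what "per-batch" regression could look like, here is a minimal NumPy sketch. It is not scanpy's regress_out implementation; the function name regress_out_per_batch, the toy data, and the single cc_score covariate are all assumptions made up for this example. It fits an ordinary least-squares model (intercept plus covariate) to each gene within each batch and keeps the residuals.

```python
import numpy as np

def regress_out_per_batch(X, covariate, batches):
    """Remove the linear effect of `covariate` from every gene, batch by batch.

    Hypothetical helper for illustration only (plain least squares).
    """
    X = np.asarray(X, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    batches = np.asarray(batches)
    residuals = np.empty_like(X)
    for b in np.unique(batches):
        idx = batches == b
        # Design matrix for the cells of this batch: intercept + covariate.
        design = np.column_stack([np.ones(idx.sum()), covariate[idx]])
        coef, *_ = np.linalg.lstsq(design, X[idx], rcond=None)
        # Residuals are what remains after removing the fitted linear trend.
        residuals[idx] = X[idx] - design @ coef
    return residuals

# Toy data (assumed): 200 cells, 5 genes, two batches, one cell cycle score.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 5
batches = np.repeat(["a", "b"], n_cells // 2)
cc_score = rng.normal(size=n_cells)
X = rng.normal(size=(n_cells, n_genes)) + 2.0 * cc_score[:, None]

X_corrected = regress_out_per_batch(X, cc_score, batches)
```

Because the model includes an intercept, the residuals are also centered per gene within each batch, so this sketch removes batch-specific mean shifts along with the covariate.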
Ah, sorry, maybe this wasn't clear. For that to work, you need to set the .raw attribute of AnnData at some point.
adata.raw = adata  # at the point during preprocessing at which you wish to store a copy for visualization and differential testing
You can then set use_raw=False in several functions if you want to access .X instead.
The scanpy documentation for MNN correction (http://scanpy.readthedocs.io/en/latest/api/scanpy.api.pp.mnn_correct.html) says: "Be reminded that it is not advised to use the corrected data matrices for differential expression testing." However, Laleh Haghverdi (who proposed the MNN correction strategy, https://www.nature.com/articles/nbt.4091) writes "MNN correction improves differential expression analyses, After batch correction is performed, the corrected expression values can be used in routine downstream analyses such as clustering prior to differential gene expression identification" in the Nature Biotech paper. So I am a little confused. We have compared several correction methods, such as regress_out, combat, MNN and MultiCCA (used by Seurat); in our results, MNN and CCA performed better than regress_out and combat.
MNN and CCA are of great use when analyzing multiple single-cell libraries that have been merged together, because each library may be affected by batch effects.
Hi. Maybe I can help a little as well.
Typically, batch correction or data integration methods are used to obtain a good clustering of the data. For differential testing, however, it is still unclear whether the corrected data can or should be used (no batch correction method is perfect, and any of them may overcorrect).
The standard strategy would be to correct for batch, and any other covariates that you are not interested in for the clustering process. Once you have the clusters, it is standard practice to go back to the raw data and use a differential testing algorithm that allows you to account for batch and other technical covariates in the model (e.g. MAST).
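To make "account for batch in the model" concrete, here is a toy sketch with plain least squares. This is not MAST (which is an R package with a different statistical model), just a linear model on simulated data to show the principle. All variable names, effect sizes, and the data itself are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 400 cells, two clusters, two batches, with the
# batches unevenly distributed across the clusters.
cluster = np.repeat([0, 1], 200)
batch = np.concatenate([
    np.repeat([0, 1], [160, 40]),   # cluster 0: mostly batch 0
    np.repeat([0, 1], [40, 160]),   # cluster 1: mostly batch 1
])

true_cluster_effect = 1.0
true_batch_effect = 2.0
expr = (true_cluster_effect * cluster
        + true_batch_effect * batch
        + rng.normal(scale=0.5, size=cluster.size))

# Naive estimate: difference in cluster means, ignoring batch.
naive_effect = expr[cluster == 1].mean() - expr[cluster == 0].mean()

# Adjusted estimate: expression ~ intercept + cluster + batch.
design = np.column_stack([np.ones(cluster.size), cluster, batch])
coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
adjusted_effect = coef[1]

print(f"naive: {naive_effect:.2f}, adjusted: {adjusted_effect:.2f}")
```

In this simulation the naive difference absorbs part of the batch effect because cluster and batch are confounded, while the coefficient from the model with a batch term stays close to the simulated cluster effect.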
@falexwolf In your Bioinformatics paper "destiny: diffusion maps for large-scale single-cell data in R", you show how to determine the optimal Gaussian kernel width and how to plot the eigenvalues of the first 100 diffusion components. Could you tell us how to do this with scanpy?
@wangjiawen2013 that would be my paper, and I don’t think scanpy stores the eigenvalues after computing the diffusion map.
It should be stored in adata.uns['diffmap_evals'] according to https://github.com/theislab/scanpy/blob/master/scanpy/tools/dpt.py#L17
Yes, the eigenvalues are stored.
There is no need to choose a kernel width in Scanpy. Everything is done automatically. The only parameters are the number of neighbors and the kernel type (method in pp.neighbors).
Can you extend the scanpy functions so that I can show gene expression levels on the plot generated by sc.pl.diffmap, just like monocle2 does?
And at which step should I run the MNN batch effect correction? Is it still necessary to regress out some variables (n_counts, percent_mito, cell cycle, etc.) when I run MNN?