Open bjstewart1 opened 4 years ago
Hi, thanks for the suggestion! Are you referring to this function? It sounds a bit like `ingest` but with multiple datasets. Pinging @Koncopd to see what his take on this is.
Yes, that's the function.
I think it is doing something similar to `ingest`.
I think this sort of batch-balanced PCA could be a useful addition where batches are very uneven in terms of number of cells.
`ingest` uses PCA only from a reference batch, so it is a bit different.
Does this `multiBatchPCA` work well?
Like you say, the difference between this and `ingest` is joint PCA calculation vs. asymmetric batch integration.
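To make the contrast concrete, here is a minimal NumPy sketch of the asymmetric idea: principal axes are estimated from a reference batch only, and a query batch is then projected onto them. The names `fit_reference_pca` and `project` are hypothetical, centring the query with the reference mean is an assumption, and this is not scanpy's `ingest` implementation, only an illustration of reference-only PCA.

```python
import numpy as np

def fit_reference_pca(ref, n_comps=2):
    """Estimate the PCA mean and loadings from the reference batch only (hypothetical helper)."""
    ref = np.asarray(ref, dtype=float)
    mean = ref.mean(axis=0)
    # Rows of vt are the principal axes of the centred reference data.
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt[:n_comps].T

def project(X, mean, loadings):
    """Project cells onto the reference axes, centring with the reference mean (an assumption)."""
    return (np.asarray(X, dtype=float) - mean) @ loadings

# Toy data: the reference varies only along gene 0, the query only along gene 1.
ref = np.array([[-2.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
query = np.array([[0.0, 5.0], [0.0, -5.0]])

mean, loadings = fit_reference_pca(ref, n_comps=1)
query_scores = project(query, mean, loadings)
```

In this toy case the query's variation lies entirely outside the reference axes, so `query_scores` is numerically zero: structure that exists only in the query is invisible to a reference-only PCA.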
This function is the first step in the `fastMNN` function, which I have found in some cases yields very sensible batch correction results. It would be awesome to see `multiBatchPCA` +/- `fastMNN` available in scanpy. I am aware of the Python implementation of `mnncorrect`, but I think this still operates on expression values rather than a PCA representation (correct me if I am wrong).
Without going all the way to batch correction, `multiBatchPCA` is useful where different experiments have very different numbers of cells.
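As a rough illustration of the batch-balancing idea, the sketch below weights cells so that each batch contributes equally to the centre and to the covariance, regardless of how many cells it has. It assumes a dense cells-by-genes array; `batch_balanced_pca` is a hypothetical name, and this is a simplified NumPy sketch, not batchelor's `multiBatchPCA` implementation or an existing scanpy function.

```python
import numpy as np

def batch_balanced_pca(X, batches, n_comps=2):
    """PCA in which every batch carries equal weight (hypothetical helper).

    X: dense array, cells x genes. batches: one label per cell.
    """
    X = np.asarray(X, dtype=float)
    labels, inverse, counts = np.unique(
        np.asarray(batches).ravel(), return_inverse=True, return_counts=True
    )
    # Centre on the unweighted average of the per-batch means.
    batch_means = np.stack([X[inverse == i].mean(axis=0) for i in range(len(labels))])
    centered = X - batch_means.mean(axis=0)
    # Scale each cell by 1/sqrt(n_batch) so that each batch contributes equally
    # to the covariance that the SVD decomposes.
    weights = 1.0 / np.sqrt(counts[inverse])
    _, _, vt = np.linalg.svd(centered * weights[:, None], full_matrices=False)
    rotation = vt[:n_comps].T
    # Project the unweighted, centred cells onto the balanced axes.
    return centered @ rotation, rotation

rng = np.random.default_rng(0)
# The large batch varies along gene 0; the small batch varies along gene 1.
big = rng.normal(size=(1000, 3)) * [1.0, 0.1, 0.1]
small = rng.normal(size=(20, 3)) * [0.1, 3.0, 0.1]
X = np.vstack([big, small])
batches = np.array(["big"] * 1000 + ["small"] * 20)

scores, rotation = batch_balanced_pca(X, batches, n_comps=2)
```

On this toy data the first balanced axis aligns with gene 1, the signal carried by the 20-cell batch, whereas ordinary PCA on the pooled cells is dominated by gene 0 from the 1000-cell batch.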
Hi all,
I am trying to use Scanpy to integrate multiple scRNA-seq samples (~20) so that I can look at RNA velocity with scVelo. I want to use MNN because I previously got good batch effect removal with it in monocle.
Is it true, as stated above, that the current implementation of mnncorrect with Scanpy only operates on expression values? I have run through a Scanpy MNN tutorial provided by NBI Sweden. The results are improved, but it doesn't appear to work as well as in monocle: some separation by batch remains.
I'm wondering what the difference might be: whether it is due to the difference in PCA (multi-batch) or to the actual MNN / batch effect removal step. Alternatively, I could use the corrected expression matrix and add the UMAP coordinates/clusters from monocle, although I wonder if this is advisable.
If you have any info please let me know, or tell me if I should raise a separate issue.
What's the status of this, @Koncopd @Mirkazemi?
Soon (I hope).
Hi @r-reeves,
Maybe this is indeed a separate issue. `mnnpy` does work on the gene expression matrix, and not on a low-dimensional embedding like `FastMNN` (which is what I assume you might have been using?). You could try Scanorama, a method similar to FastMNN that uses a faster algorithm and, instead of iteratively merging batches, an approach its authors call "panoramic stitching". It has performed quite well in our benchmark of data integration methods, and it is in the scanpy ecosystem, so it should work seamlessly in a Scanpy workflow.
All of this being said, you will only get an integrated graph structure with this for scvelo, which may help a little, but won't remove the batch effect for RNA velocity calculation. scvelo doesn't currently have any batch removal in its pipeline as it is quite difficult to add as it works directly from the normalized count data and fits a model to these. @VolkerBergen has been thinking a bit about how to perform batch correction in an scvelo model, maybe he could chime in, or you could post an issue in the scvelo repo.
Hi @LuckyMD Thank you for the fast reply. Yes to FastMNN, as I understand from using align_cds – when you specify discretely what you want to remove e.g. sample-sample variation it calls FastMNN from batchelor. Thanks for the recommendation – I will check out Scanorama, been meaning to read the review on integration techniques.
> you will only get an integrated graph structure with this for scvelo, which may help a little, but won't remove the batch effect for RNA velocity calculation. scvelo doesn't currently have any batch removal in its pipeline as it is quite difficult to add as it works directly from the normalized count data and fits a model to these.
Ahh okay, I misunderstood the process then – my understanding was that some of the mnn correction would be carried over when performing velocity analysis. I will check out the scvelo forum for info on comparing samples.
Thank you.
`sc.tools`?
It may be useful to adopt a PCA option similar to `multiBatchPCA` in the R batchelor package. This is a useful approach where there are imbalances in batch size and PCA is conducted across a merged experiment. It is pretty slow in R. From their documentation: