kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/

Maintaining Batch Correction in Joint Embedding #311

Open meiiemg opened 3 weeks ago

meiiemg commented 3 weeks ago

Hi,

Thank you very much for developing the excellent SnapATAC2 tool!

I am working on integrating single-cell multiome data (ATAC + Gene Expression) from multiple samples and would like to compute a joint embedding using snap.tl.multi_spectral. However, I'm facing challenges with batch correction.

I have performed batch correction for both ATAC and Gene Expression data following the tutorials (https://kzhang.org/SnapATAC2/tutorials/integration.html). Despite this, the batch effect reappears after calculating the joint embedding with snap.tl.multi_spectral([rna, atac], features=None)[1].

Is there a way to utilize specific .obsm values, such as "X_spectral_mnn", in the snap.tl.multi_spectral function to maintain the batch correction effect?

My goal is to achieve a common UMAP and clustering for ATAC and Gene Expression data similar to Seurat's Weighted Nearest Neighbor (WNN) Analysis, understanding that SnapATAC2 uses spectral analysis rather than PCA.

Thank you in advance for your assistance.

kaizhang commented 9 hours ago

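You should apply batch correction to the joint embedding created by multi_spectral, rather than to each modality beforehand. multi_spectral works directly on the raw count matrices, which are not batch corrected, so it does not use corrected embeddings stored in .obsm such as "X_spectral_mnn".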
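
A minimal sketch of that order of operations, not an official recipe. It assumes `rna` and `atac` hold the same cells in the same order, that the sample labels live in `rna.obs["sample"]` (a placeholder column name), and that the installed SnapATAC2 version accepts `key_added` in `snap.pp.mnc_correct`. The helper name and the "X_joint" / "X_joint_mnn" keys are made up for illustration.

```python
import numpy as np


def batch_corrected_joint_embedding(rna, atac, batch_key="sample"):
    """Compute a joint spectral embedding, then batch-correct that embedding.

    Hypothetical helper: `batch_key` and the .obsm keys used here are
    illustrative names, not SnapATAC2 conventions.
    """
    import snapatac2 as snap  # imported here so the helper can be defined without it

    # multi_spectral returns (eigenvalues, eigenvectors). The eigenvectors are
    # the joint embedding, computed from the raw (uncorrected) count matrices.
    _, embedding = snap.tl.multi_spectral([rna, atac], features=None)
    rna.obsm["X_joint"] = np.asarray(embedding)

    # Correct the joint embedding itself rather than each modality beforehand.
    snap.pp.mnc_correct(
        rna, batch=batch_key, use_rep="X_joint", key_added="X_joint_mnn"
    )

    # Build the neighbor graph, clusters, and UMAP on the corrected embedding.
    snap.pp.knn(rna, use_rep="X_joint_mnn")
    snap.tl.leiden(rna)
    snap.tl.umap(rna, use_rep="X_joint_mnn")

    # Share the corrected embedding with the ATAC object (same cells, same order).
    atac.obsm["X_joint_mnn"] = rna.obsm["X_joint_mnn"]
    return "X_joint_mnn"
```

Calling `batch_corrected_joint_embedding(rna, atac, batch_key="sample")` would leave the clusters and UMAP on `rna`. `snap.pp.harmony` or `snap.pp.scanorama_integrate` could be swapped in for `mnc_correct` in the same position, since the point is only that the correction runs after `multi_spectral`.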