stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/

Re: Joint scATAC-seq and scRNA-seq analysis #247

sylestiel closed this issue 3 years ago

sylestiel commented 3 years ago

Hi Tim,

load processed data matrices for each assay

rna <- Read10X("/home/stuartt/data/snare-seq/GSE126074_AdBrainCortex_rna/", gene.column = 1)

For the line above, which file should I read in if I have Cell Ranger output? This is for integrating scATAC-seq and scRNA-seq data.

Also, under what circumstances do we include the first dimension? At times the script drops it. Is there a norm we should follow?

Thanks!

timoast commented 3 years ago

> For the line above, which file should I read in if I have Cell Ranger output? This is for integrating scATAC-seq and scRNA-seq data.

For all analyses we need a count matrix. There are several ways to obtain one: from cellranger, from another pre-processing pipeline, or within Signac itself via the FeatureMatrix() function. If you have processed RNA or ATAC data using cellranger or cellranger-atac, I recommend reading in the .h5 file using the Seurat::Read10X_h5() function, as shown in the vignettes. If you have a sparse matrix in Matrix Market format, you can read it into R using Seurat::Read10X().
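A minimal sketch of the two loading routes described above, assuming Seurat is installed (plus hdf5r for the .h5 route). The helper name read_10x_counts and the paths are hypothetical. The file names in the comments are the ones Cell Ranger typically writes, so check your own output directory.

```r
# Sketch: load a 10x count matrix from either an .h5 file or a
# Matrix Market directory. read_10x_counts is a made-up helper name.
read_10x_counts <- function(path) {
  if (!file.exists(path)) {
    stop("Path not found: ", path)
  }
  if (dir.exists(path)) {
    # Directory holding matrix.mtx, barcodes.tsv and features.tsv/genes.tsv
    Seurat::Read10X(data.dir = path)
  } else {
    # HDF5 file, e.g. filtered_feature_bc_matrix.h5 (cellranger) or
    # filtered_peak_bc_matrix.h5 (cellranger-atac)
    Seurat::Read10X_h5(filename = path)
  }
}

# Hypothetical usage (placeholder paths):
# rna_counts  <- read_10x_counts("path/to/rna/filtered_feature_bc_matrix.h5")
# atac_counts <- read_10x_counts("path/to/atac/filtered_peak_bc_matrix.h5")
```

When the output contains several feature types, both Seurat functions return a list of matrices rather than a single matrix, so you may need to pick the element you want before building an assay.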

> Also, under what circumstances do we include the first dimension? At times the script drops it. Is there a norm we should follow?

In most cases for scATAC-seq we exclude the first LSI component as it's typically highly correlated with sequencing depth. You can assess this correlation using the DepthCor() function (as shown in the vignettes). For scRNA-seq data we don't typically observe such a strong relationship between the first PC and sequencing depth, and so usually retain the first PC in downstream analyses.
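To make the depth check concrete, here is a base-R sketch on simulated data. It is a simplified stand-in for what DepthCor() reports (the correlation between each reduced dimension and total counts per cell), not Signac's implementation. The simulation parameters, the log-scaled TF-IDF formula, and all variable names are assumptions chosen for illustration.

```r
set.seed(1)
n_peaks <- 1000
n_cells <- 200

# Made-up per-cell depth factors and per-peak accessibility rates
depth <- exp(rnorm(n_cells, mean = 0, sd = 0.7))
peak_rate <- rgamma(n_peaks, shape = 2, rate = 10)

# Simulated peak-by-cell count matrix (rows = peaks, columns = cells)
counts <- matrix(rpois(n_peaks * n_cells, outer(peak_rate, depth)),
                 nrow = n_peaks)
# Drop empty peaks and cells so the normalization is well defined
counts <- counts[rowSums(counts) > 0, colSums(counts) > 0, drop = FALSE]

# TF-IDF: term frequency per cell, inverse document frequency per peak
total_counts <- colSums(counts)
tf <- sweep(counts, 2, total_counts, "/")
idf <- ncol(counts) / rowSums(counts)
tfidf <- log1p(tf * idf * 1e4)

# Truncated SVD; cell embeddings are the right singular vectors scaled by d
n_dims <- 10
sv <- svd(tfidf, nu = 0, nv = n_dims)
embeddings <- sv$v %*% diag(sv$d[seq_len(n_dims)])

# Correlation of each component with total counts per cell
depth_cor <- as.vector(cor(embeddings, total_counts))
print(round(depth_cor, 2))

# If component 1 tracks depth, start downstream steps from component 2
dims_to_use <- 2:n_dims
```

On a real Signac object the equivalent steps would be DepthCor(object) to inspect the correlations, followed by passing dims = 2:30 (or whatever range you choose) with reduction = "lsi" to the downstream functions.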

mreza-ef commented 1 year ago

Please correct me if I am wrong, but for PCA and LSI we use scaled data that is already normalized for depth. However, DepthCor() uses unnormalized count data, which always shows a large correlation on the plot. Should I use the DepthCor() plot to choose the right dimensions for UMAP and clustering?

Thanks.