features, gene activity, promoter and peaks

wangjiawen2013 commented 1 year ago

Hi, various matrices are used in the SnapATAC2 tutorial:

Features: creates a cell-by-bin matrix containing insertion counts across genome-wide 500-bp bins.

Gene activity: generates a cell-by-gene activity matrix by counting the Tn5 insertions in gene body regions.

Peaks: an important goal of single-cell ATAC-seq analysis is to identify genomic regions that are enriched with Tn5 insertions, or "open chromatin (also called euchromatin)" regions. snap.tl.call_peaks first calls peaks for individual groups and then merges overlapping peaks to create a list of fixed-width, non-overlapping peaks.
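To make these definitions concrete, here is a minimal sketch in plain NumPy. It is not SnapATAC2's implementation: the chromosome length, insertion positions, and gene names are all made up for illustration, and the only point is to show that the bin matrix and the gene activity matrix count the same insertions over different sets of intervals.

```python
import numpy as np

# Toy data: (cell index, genomic position) pairs for Tn5 insertions on one
# hypothetical 2,000-bp chromosome. All values are invented for illustration.
insertions = [(0, 120), (0, 480), (0, 1510), (1, 130), (1, 900), (2, 1490), (2, 1999)]
n_cells, chrom_len, bin_size = 3, 2000, 500

# Cell-by-bin matrix: count insertions in fixed-width genome-wide bins.
n_bins = -(-chrom_len // bin_size)  # ceiling division
bin_matrix = np.zeros((n_cells, n_bins), dtype=int)
for cell, pos in insertions:
    bin_matrix[cell, pos // bin_size] += 1

# Cell-by-gene matrix: count insertions that fall inside each gene body.
# Gene names and half-open [start, end) coordinates are hypothetical.
genes = {"geneA": (100, 600), "geneB": (1400, 1600)}
gene_matrix = np.zeros((n_cells, len(genes)), dtype=int)
for cell, pos in insertions:
    for j, (start, end) in enumerate(genes.values()):
        if start <= pos < end:
            gene_matrix[cell, j] += 1

print(bin_matrix)
print(gene_matrix)
```

A cell-by-peak matrix would be built the same way as the gene matrix in this sketch, with called peak intervals in place of gene bodies.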

These are different from the pipelines of cellranger-atac and Signac. Cellranger-atac calls peaks first and then performs dimension reduction and clustering. Signac uses a cell-by-peak matrix for downstream analysis. In my opinion, promoter regions play an important role in regulating gene expression, while SnapATAC2 doesn't use them. So what is the difference between these matrices, and which one should I use for downstream analysis?

sunshx-bioinfo commented 1 year ago

Hi,

It's difficult to infer cell information, such as gene expression and promoter activity, from sparse scATAC-seq data, so different matrices are generated for different aims. I would also like to know how to choose a suitable matrix among these three for different aims in downstream analysis.

Thanks!

kaizhang commented 1 year ago

For dimension reduction and clustering analysis, bin/tile matrices usually work best, as these matrices contain both promoter and distal enhancer signals. The problem with using a peak matrix for clustering analysis is that you need cell type labels to call peaks in each cell type. If you call peaks using pseudo-bulk, then you are going to miss peaks from rare cell types.
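The pseudo-bulk point can be illustrated with a toy calculation. The sketch below is an assumption-laden stand-in, not a real peak caller: it treats a region as a "peak" when the mean insertions per cell in a group exceed a fixed threshold, whereas real peak callers use a statistical background model. The cell counts, insertion counts, and threshold are all made up.

```python
import numpy as np

# Hypothetical toy setup: 1,000 cells, 10 of which belong to a rare cell type.
n_cells, n_rare = 1000, 10
labels = np.array(["rare"] * n_rare + ["common"] * (n_cells - n_rare))

# Insertion counts per cell in two candidate regions (invented numbers):
# column 0 is open in every cell, column 1 is open only in the rare cells.
counts = np.zeros((n_cells, 2), dtype=int)
counts[:, 0] = 2
counts[labels == "rare", 1] = 2

# Stand-in for a peak caller (an assumption, not how real peak callers
# work): a region is a peak if its mean insertions per cell exceed this.
threshold = 0.5

def call_peaks(group_counts):
    """Return a boolean vector marking which regions pass the threshold."""
    return group_counts.mean(axis=0) > threshold

pseudo_bulk_peaks = call_peaks(counts)                  # all cells pooled
rare_peaks = call_peaks(counts[labels == "rare"])       # rare cell type only

print("pseudo-bulk:", pseudo_bulk_peaks)
print("rare group: ", rare_peaks)
```

In this toy example the rare-specific region is diluted to a mean of 0.02 insertions per cell in the pooled data and is missed, while it is recovered when the rare cells are treated as their own group. Grouping the cells that way requires labels, which is why clustering on a bin/tile matrix comes first.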

Gene matrices are usually used for cell type annotation and integration with scRNA-seq data.

kaizhang / SnapATAC2

features, gene activity, promoter and peaks #164