lhe17 / nebula

Can NEBULA be applied to DE for single cell ATAC sequencing data? #11

Closed MANZHAOHUI closed 1 year ago

MANZHAOHUI commented 1 year ago

Thank you.

lhe17 commented 1 year ago

Hi Man,

Thank you for your good question.

I have not benchmarked NEBULA on scATAC-seq data, although I think applying it to scATAC-seq data is quite straightforward. However, some adjustment is needed before replacing the gene count matrix with a peak count matrix.

Here are my suggestions; I hope they help. The count matrix from cellranger might not be appropriate for NEBULA because it contains read counts instead of fragment counts. A fragment has a read at each end, and both reads are likely to fall within the same peak, so the cellranger read count matrix contains a disproportionately large number of even counts. The matrix therefore needs to be adjusted before it is fed to NEBULA.
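One rough way to see whether a peak matrix holds read counts rather than fragment counts is to look at the parity of its nonzero entries. The sketch below is only an illustration: the function name `even_fraction` and the toy matrix are hypothetical, and a real peak matrix would be a large sparse matrix rather than a nested list.

```python
def even_fraction(counts):
    """Return the fraction of nonzero entries that are even.

    `counts` is a peaks-by-cells matrix given as a list of rows.
    A value far above 0.5 suggests paired read counts, not fragment counts.
    """
    nonzero = [c for row in counts for c in row if c > 0]
    if not nonzero:
        return 0.0
    return sum(1 for c in nonzero if c % 2 == 0) / len(nonzero)


# Toy read-count matrix (hypothetical values): 5 of the 6 nonzero entries are even.
reads = [
    [0, 2, 0, 4],
    [2, 0, 1, 2],
    [0, 0, 2, 0],
]
print(even_fraction(reads))
```

No threshold is implied here; the number is only meant as a quick sanity check before choosing an adjustment.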

A simple approach is to round odd counts up to the next even count and then divide the matrix by 2, as described in a recent manuscript (https://www.biorxiv.org/content/10.1101/2022.05.04.490536v1). Or, for a more accurate result, you could use another package to compute the fragment count matrix, e.g., the function FeatureMatrix in Signac (https://stuartlab.org/signac/articles/pbmc_vignette.html).
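The simple rounding approach can be sketched as follows. Rounding an odd count up to the next even count and halving it is the same as taking the ceiling of count / 2, which for non-negative integers is `(c + 1) // 2`. The function name `reads_to_fragments` and the nested-list input are assumptions for illustration; in practice the same arithmetic would be applied to the sparse count matrix before passing it to NEBULA.

```python
def reads_to_fragments(counts):
    """Approximate fragment counts from read counts.

    Each odd count is rounded up to the next even count and then halved,
    i.e. ceil(c / 2), computed with integer arithmetic as (c + 1) // 2.
    `counts` is a matrix of non-negative integers given as a list of rows.
    """
    return [[(c + 1) // 2 for c in row] for row in counts]


# Read counts 0..4 map to fragment counts 0, 1, 1, 2, 2.
print(reads_to_fragments([[0, 1, 2, 3, 4]]))
```

Zeros stay zero, so the sparsity pattern of the matrix does not change.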

The same manuscript (https://www.biorxiv.org/content/10.1101/2022.05.04.490536v1) also shows that the fragment counts follow a Poisson distribution. So, you could also try the Poisson mixed model in NEBULA, which is much faster than the NBGMM, if you analyze cells within each cell type. One potential problem I can imagine is low statistical power for the vast majority of peaks, because scATAC-seq data has many more features than scRNA-seq data and is thus sparser for each peak.
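For a crude look at whether a peak's fragment counts are roughly Poisson within one cell type, you could compare the variance to the mean, since the two are equal for a Poisson distribution. This is only a sketch under strong assumptions: it ignores differences in sequencing depth between cells and any subject-level random effects, so it is not a substitute for fitting the model. The function name `dispersion_index` and the example counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(values):
    """Return sample variance divided by mean for one peak's counts.

    Values near 1 are consistent with a Poisson distribution; values well
    above 1 point to overdispersion. Requires at least two observations.
    Returns NaN when the mean is zero.
    """
    m = mean(values)
    if m == 0:
        return float("nan")
    return variance(values) / m


# Hypothetical fragment counts for one peak across eight cells of one cell type.
peak_counts = [0, 1, 0, 2, 1, 0, 1, 3]
print(dispersion_index(peak_counts))
```

With very sparse peaks the estimate is noisy, so it is more informative to look at it across many peaks than for a single one.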

Best regards, Liang

On Thu, Dec 8, 2022 at 4:39 PM ZHAOHUI MAN @.***> wrote:

Thank you.


MANZHAOHUI commented 1 year ago

Hi Liang, thank you for the thorough explanation. FYI, I've tried NEBULA HL on my single-cell RNA-seq data for DE analysis, and it produced results that make much better sense than MAST's in terms of GO and pathway analysis. I will definitely try it on the ATAC data following your advice. Best regards, Zhaohui Man