WGBS; ATAC-seq; ChIP-seq; data from same sample source

Chao-Guo-hub commented 1 week ago

This is exactly the tool I have been dreaming of. However, I still have a few questions to ask:

Is it suitable for WGBS data? Using the same input format, can I use ATAC-seq, ChIP-seq, or even Hi-C peak matrices as input for prediction? Have you tried this, and how accurate was it? If I use scRNA-seq, bulk RNA-seq, bulk WGBS, and bulk ATAC-seq from the same sample source, will the accuracy be higher?

yuabrahamliu commented 5 days ago

Sorry for the late reply.

1) It can be used on WGBS data.

2) Theoretically, you can deconvolve these non-RNA data types using scRNA-seq as the reference and a paired bulk RNA/non-RNA dataset as the intermediary between them. However, I have only tried bulk RNA-methylation and bulk RNA-ATAC pairs and have no experience with other data types.

3) Accuracy depends largely on the cell types you want to deconvolve. Very similar cell types give relatively low accuracy, while more distinct cell types give higher accuracy.
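
To illustrate why similar cell types are harder, here is a minimal sketch of generic reference-based deconvolution by non-negative least squares (scipy's `nnls`). It is not scDeconv's algorithm; the `deconvolve`, `simulate`, and `mean_error` helpers and the gamma-distributed toy profiles are hypothetical. When two cell-type profiles are nearly identical, the columns of the signature matrix are almost collinear, so small noise in the bulk profile shifts the estimated proportions a lot.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(signature: np.ndarray, bulk: np.ndarray) -> np.ndarray:
    """Estimate proportions p >= 0 with bulk ≈ signature @ p, rescaled to sum to 1.

    signature: (n_features, n_cell_types) reference profile of each cell type.
    bulk: (n_features,) observed bulk profile.
    """
    coef, _rnorm = nnls(signature, bulk)
    total = coef.sum()
    return coef / total if total > 0 else coef


def simulate(similarity: float, seed: int, n_features: int = 200, noise: float = 0.5):
    """Hypothetical two-cell-type mixture; `similarity` is the weight of a shared profile."""
    rng = np.random.default_rng(seed)
    shared = rng.gamma(2.0, 1.0, n_features)
    a = similarity * shared + (1 - similarity) * rng.gamma(2.0, 1.0, n_features)
    b = similarity * shared + (1 - similarity) * rng.gamma(2.0, 1.0, n_features)
    signature = np.column_stack([a, b])
    true_p = np.array([0.3, 0.7])
    bulk = signature @ true_p + rng.normal(0.0, noise, n_features)
    return signature, bulk, true_p


def mean_error(similarity: float, n_reps: int = 100) -> float:
    """Mean absolute error of the estimated proportions over repeated simulations."""
    errors = []
    for seed in range(n_reps):
        signature, bulk, true_p = simulate(similarity, seed)
        errors.append(np.abs(deconvolve(signature, bulk) - true_p).mean())
    return float(np.mean(errors))


for similarity in (0.2, 0.95):
    print(f"profile similarity {similarity}: mean abs error {mean_error(similarity):.3f}")
```

The error is averaged over many simulated mixtures because a single simulation can be noisy; the printed values are toy numbers and are unrelated to the PCC figures discussed in this thread.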

Chao-Guo-hub commented 5 days ago

Thank you very much for your answer. Regarding your third point, if I understand correctly, subsequent bulk deconvolution would be easier if we annotated the scRNA-seq data only at the level of major cell types (those with the largest differences) rather than at the subtype level.

I also sent an email with the same question to your Gmail address. Perhaps we can discuss more details about our own data by email.

yuabrahamliu commented 4 days ago

For example, if you want to deconvolve B cells and T cells, the accuracy will be high, but if you want to deconvolve CD4 and CD8 T cells, the accuracy will be low. In a simulated experiment, scDeconv reached a Pearson correlation coefficient (PCC) with the true values above 0.9 for the former, but only slightly above 0.7 for the latter. This is not specific to scDeconv: at least in my experience, about 0.7 is the ceiling for all single-omic bulk RNA deconvolution methods on CD4 and CD8 T cells.
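
The thread does not say exactly how the PCC was computed; a common convention, assumed here, is to correlate the estimated and true proportions of each cell type across the simulated samples. A minimal numpy sketch (the `pcc_per_cell_type` helper and the example numbers are hypothetical):

```python
import numpy as np


def pcc_per_cell_type(estimated: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """Pearson correlation of estimated vs. true proportions, one value per cell type.

    Both arrays have shape (n_samples, n_cell_types); each cell type's
    correlation is computed across samples.
    """
    est = estimated - estimated.mean(axis=0)
    tru = truth - truth.mean(axis=0)
    numerator = (est * tru).sum(axis=0)
    denominator = np.sqrt((est ** 2).sum(axis=0) * (tru ** 2).sum(axis=0))
    return numerator / denominator


# Hypothetical proportions for 4 simulated samples and 2 cell types (e.g. B and T cells).
truth = np.array([[0.2, 0.8], [0.5, 0.5], [0.7, 0.3], [0.4, 0.6]])
estimated = np.array([[0.25, 0.75], [0.45, 0.55], [0.65, 0.35], [0.42, 0.58]])
print(pcc_per_cell_type(estimated, truth))
```

Because the two proportions in each row sum to 1, both cell types get the same PCC in this toy example.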

I didn't receive your email. You could write your question here. Thank you.

Chao-Guo-hub commented 4 days ago

Understood, and thanks again for your reply!

As for the details: I recently reviewed all the methodological papers on deconvolving bulk multi-omics data (both reference-based and reference-free), and a few particularly impressed me:

RNA-seq: Represented by CIBERSORTx (https://cibersortx.stanford.edu/)
DNA methylation: your scDeconv, and EpiSCORE as mentioned in your article (https://www.nature.com/articles/s41592-022-01412-7)
ATAC-seq: Cellformer (https://www.nature.com/articles/s41467-023-40611-4); DC3 (https://www.nature.com/articles/s41467-019-12547-1); DeconPeaker (https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.00392/full)
Hi-C: DeCOOC (https://onlinelibrary.wiley.com/doi/10.1002/advs.202301058); scGrapHiC (https://academic.oup.com/bioinformatics/article/40/Supplement_1/i490/7700876?login=true#475454788)
ChIP-seq: CHAS, the only one here that is not peer-reviewed (https://www.biorxiv.org/content/10.1101/2021.09.06.459142v2.full)

Although most of these methods claim to deconvolve cell-type composition from their own data type, single-cell epigenomic data are not yet widespread, whereas scRNA-seq costs are dropping rapidly. At this stage, a viable approach seems to be to rely on scRNA-seq to infer cellular composition while continuing to use bulk epigenomic data for regulatory analysis at sufficient resolution. Moreover, being able to deconvolve epigenomic data into known cell types and locate regulatory changes at the cellular level has significant implications for both medicine and agriculture. Because scDeconv makes use of paired information, I think it has great potential to develop into a framework for this kind of research, which excites me. The only difference from the setting you are considering is that my scRNA-seq reference comes from the same samples as the bulk data. Would this turn the task from numerical regression at the single-cell level into numerically solving the bulk data given known cell types and proportions? (I'm not sure my description is correct, as I'm not a computational scientist.)
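
One way to read this question: if the scRNA-seq and bulk data come from the same samples, the proportions are known, and the unknowns become cell-type-specific signals. Under a simple linear mixing assumption (each sample's bulk signal is the proportion-weighted sum of cell-type profiles), these can be estimated feature by feature with ordinary least squares across samples. This is a generic sketch, not scDeconv's method; `cell_type_profiles` and the simulated data are hypothetical.

```python
import numpy as np


def cell_type_profiles(proportions: np.ndarray, bulk: np.ndarray) -> np.ndarray:
    """Estimate cell-type-specific profiles S from bulk ≈ proportions @ S.

    proportions: (n_samples, n_cell_types), assumed known (e.g. from matched scRNA-seq).
    bulk: (n_samples, n_features), e.g. per-sample methylation levels or ATAC signal.
    Returns an array of shape (n_cell_types, n_features).
    """
    n_samples, n_types = proportions.shape
    if n_samples < n_types:
        raise ValueError("need at least as many samples as cell types")
    # One least-squares fit per feature, solved jointly for all columns of `bulk`.
    profiles, _residuals, _rank, _sv = np.linalg.lstsq(proportions, bulk, rcond=None)
    return profiles


# Hypothetical simulation: 3 cell types, 50 features, 20 samples.
rng = np.random.default_rng(1)
true_profiles = rng.uniform(0.0, 1.0, size=(3, 50))
proportions = rng.dirichlet(np.ones(3), size=20)
bulk = proportions @ true_profiles + rng.normal(0.0, 0.01, size=(20, 50))

estimated = cell_type_profiles(proportions, bulk)
print("max abs error:", np.abs(estimated - true_profiles).max())
```

The proportions must also vary across samples: if every sample has the same composition, the proportion matrix is rank-deficient and `np.linalg.lstsq` returns only a minimum-norm solution rather than the true profiles.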

Multi-omics data is sweeping through biological research, and scRNA-seq is becoming a routine method. We are currently working on such projects, which include the most comprehensive multi-omics datasets from specific biological samples (scRNA-seq, as mentioned above, and more), and I now aim to push the resolution of this analysis to the single-cell level. Would you be interested in collaborating on such a project while continuing to develop scDeconv?

I’d love to have more detailed conversations via email or instant messaging!

A brief introduction: I am Chao Guo (郭超), a PhD candidate at the University of Science and Technology of China. I'm currently studying epigenetics in Professor Ya-ping Zhang's (张亚平) team at the Kunming Institute of Zoology through a joint training program. My email addresses are guochao0403@outlook.com and gc0403@mail.ustc.edu.cn (perhaps my previous email was blocked). My phone & WeChat: 19855100340.

yuabrahamliu / scDeconv #1