pinellolab / dictys

Context specific and dynamic gene regulatory network reconstruction and analysis
GNU Affero General Public License v3.0

Doubt about a claim made in the paper #17

Closed: cakeinspace closed this issue 1 year ago

cakeinspace commented 1 year ago

Hey, thanks for the package. I was reading the Dictys paper on bioRxiv, and it says:

To reconstruct a context specific GRN for each group of cells, Dictys first infers TF binding sites in regulatory regions (i.e. promoters and enhancers) from TF footprints in pseudo-bulk or bulk chromatin accessibility data (Fig. 1ab, Methods). TF footprints are much shorter regions compared to chromatin accessibility peaks and can mitigate false positive binding sites.

Don't CellOracle and SCENIC do this as well? They identify promoter and enhancer regions from scATAC-seq data, use motif-scanning tools to find transcription factor binding sites and estimate how many fall in each regulatory region, and then use regression models to fit gene expression.

The paper says that Dictys can infer context-specific GRNs. The line I take issue with is: "In addition, neither SCENIC nor CellOracle accounts for cell-type specific TF binding in distal regulatory elements such as enhancers."

My understanding is that a motif-scanning algorithm would also give you the TF binding sites in these regions. Am I mistaken? (A small disclaimer: I am not affiliated with either SCENIC or CellOracle and am just doing a literature review.)

lingfeiwang commented 1 year ago

Hi cakeinspace,

Thank you for your questions and happy to discuss.

SCENIC and CellOracle do not perform TF footprinting or seek cell-type specific TF binding. SCENIC uses a TF binding database as input. CellOracle identifies peaks (not footprints) from population-level pseudo-bulk reads of scATAC-seq. Therefore, neither of them finds cell-type specific TF binding. In contrast, Dictys uses pseudo-bulk reads from the selected cell subset, chosen either by cell type or, for dynamic networks, by a moving window of cells.
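To make "pseudo-bulk reads from the selected cell subset" concrete, here is a minimal Python sketch, not Dictys's actual code. It assumes a hypothetical fragment format of (barcode, chrom, start, end) tuples, treats each fragment end as one transposase cut site (ignoring the usual Tn5 offset correction), and pools cut sites only from the chosen barcodes. The subset could be one cell type or one window of cells; the function name `pseudobulk_cut_counts` is illustrative.

```python
from collections import Counter


def pseudobulk_cut_counts(fragments, selected):
    """Pool cut sites from a chosen subset of cells into one pseudo-bulk profile.

    fragments: iterable of (barcode, chrom, start, end) tuples (hypothetical format)
    selected:  set of barcodes to pool (e.g. one cell type, or one window of cells)
    Returns a Counter mapping (chrom, position) -> number of cut sites.
    """
    counts = Counter()
    for barcode, chrom, start, end in fragments:
        if barcode not in selected:
            continue  # cells outside the subset contribute nothing
        # Each fragment end marks one insertion (cut) site; Tn5 offsets are ignored here.
        counts[(chrom, start)] += 1
        counts[(chrom, end)] += 1
    return counts


# Usage: pool only the "T" cells, leaving the "B" cell out entirely.
fragments = [("c1", "chr1", 100, 150), ("c2", "chr1", 100, 160), ("c3", "chr1", 100, 170)]
cell_types = {"c1": "T", "c2": "T", "c3": "B"}
t_cells = {bc for bc, ct in cell_types.items() if ct == "T"}
counts = pseudobulk_cut_counts(fragments, t_cells)
print(counts[("chr1", 100)])  # 2
```

Because the subset is just a set of barcodes, the same pooling step serves a cell-type network or one window of a dynamic network; the contrast with CellOracle's default workflow is whether this pooling happens over one subset or over all cells.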

Hope that answers your questions and happy to clarify further.

Lingfei

cakeinspace commented 1 year ago

CellOracle uses the regulatory region’s genomic DNA sequence and TF-binding motifs for this task. CellOracle identifies regulatory candidate genes by scanning for TF-binding motifs within the regulatory DNA sequences (promoter and enhancers) of open chromatin sites.

Base GRN assembly can be divided into two steps: (i) identification of promoter and enhancer regions using scATAC-seq data; and (ii) motif scanning of promoter and enhancer DNA sequences.

This is from the CellOracle paper. It seems like they do perform this step: they identify promoters and enhancers using HOMER and Cicero, and then identify binding sites using GimmeMotifs.

They also provide a dataset to use when we don't supply ATAC-seq data. But even in Dictys, TF binding site inference relies on the presence of an ATAC-seq dataset. What happens when we supply only RNA-seq data?

lingfeiwang commented 1 year ago

Hi cakeinspace,

It appears we are talking about different things. CellOracle does not seem to use TF footprinting/DNA footprinting. See https://en.wikipedia.org/wiki/DNA_footprinting and the references in the Dictys paper.

Regarding your new question, Dictys uses single-cell multi-omics data to infer GRNs. The application scenario is different from what you mentioned.

Lingfei

cakeinspace commented 1 year ago

Hey, thanks for your patience. I went over the Wellington paper, and from what I understand, it identifies TF footprints by:

  1. strand-specific enrichment of cut sites to define an appropriate search region;
  2. statistical testing of the number of cut sites within those regions, along with heuristic filtering such as greedy selection by p-value and base-pair overlap.
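The two steps above can be sketched in Python as follows. This is a simplified illustration of the idea, not Wellington's (or Dictys's) actual implementation. The assumptions: per-base cut counts are given separately for the + and - strands; under the null, cuts spread uniformly over shoulder plus footprint, so the shoulder's expected share is shoulder / (shoulder + width); and each strand gets a one-sided binomial tail test. The names `footprint_score` and `greedy_select` are hypothetical.

```python
from math import comb, log10


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def footprint_score(fwd, rev, start, width, shoulder):
    """Strand-aware footprint score for the candidate [start, start + width).

    fwd, rev: per-base cut counts on the + and - strands (same length)
    Returns log10(p_fwd) + log10(p_rev); more negative means stronger evidence.
    """
    if start - shoulder < 0 or start + width + shoulder > len(fwd):
        raise ValueError("candidate window extends past the profile")
    up = slice(start - shoulder, start)
    fp = slice(start, start + width)
    down = slice(start + width, start + width + shoulder)
    p = shoulder / (shoulder + width)  # shoulder's expected share under uniform cutting
    # + strand: a bound factor should push cuts into the upstream shoulder.
    k_f = sum(fwd[up])
    n_f = k_f + sum(fwd[fp])
    # - strand: cuts should pile up in the downstream shoulder instead.
    k_r = sum(rev[down])
    n_r = k_r + sum(rev[fp])
    floor = 1e-300  # avoid log10(0) if a tail probability underflows
    return log10(max(binom_sf(k_f, n_f, p), floor)) + log10(max(binom_sf(k_r, n_r, p), floor))


def greedy_select(candidates):
    """Keep the best-scoring candidates that do not overlap an already kept one.

    candidates: list of (score, start, end) with half-open intervals.
    """
    chosen = []
    for score, s, e in sorted(candidates):  # most negative (strongest) first
        if all(e <= cs or s >= ce for _, cs, ce in chosen):
            chosen.append((score, s, e))
    return chosen


# Usage: cuts flank a protected 10-bp core on both strands vs. uniform cutting.
fwd = [5] * 10 + [0] * 20
rev = [0] * 20 + [5] * 10
print(footprint_score(fwd, rev, 10, 10, 10) < footprint_score([1] * 30, [1] * 30, 10, 10, 10))  # True
```

Scoring each strand against its own shoulder is what makes the test strand-specific. The greedy pass then resolves overlapping candidates by keeping the most significant one, as in the filtering step described above.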

This identifies binding sites with fewer false positives. So the advantage over CellOracle, which uses ATAC-seq peaks, would be fewer false-positive binding sites from motif discovery, because motifs are searched in smaller, more precise regions.
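To show why a smaller search region can reduce false-positive binding sites, here is a minimal Python sketch of motif scanning restricted to given intervals. It is not CellOracle's or Dictys's code. The toy count matrix, uniform background, pseudocount, and score threshold are all hypothetical. The scan covers only the forward strand, expects an uppercase A/C/G/T sequence, and reports a hit only if the whole motif fits inside an interval.

```python
from math import log2


def pwm_log_odds(counts, background=0.25, pseudocount=0.5):
    """Turn per-position base counts into a log2-odds matrix against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({b: log2((column.get(b, 0) + pseudocount) / total / background) for b in "ACGT"})
    return matrix


def scan(seq, matrix, threshold, regions):
    """Return (position, score) for forward-strand hits lying fully inside any [start, end) region."""
    w = len(matrix)
    hits = []
    for r_start, r_end in regions:
        for i in range(r_start, r_end - w + 1):
            score = sum(matrix[j][seq[i + j]] for j in range(w))
            if score >= threshold:
                hits.append((i, score))
    return hits


# Usage: a toy "GATA" motif occurs twice in a peak but only once inside the footprint.
matrix = pwm_log_odds([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
seq = "TTGATATTTTTTGATATT"
print([i for i, _ in scan(seq, matrix, 7.0, [(0, len(seq))])])   # [2, 12]
print([i for i, _ in scan(seq, matrix, 7.0, [(10, len(seq))])])  # [12]
```

The motif match at position 2 is reported when the whole peak is scanned but not when scanning is limited to the footprint-sized interval. That is the false-positive reduction the comment above describes, under the assumption that the footprint marks where the factor is actually bound.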

Is this what you mean by context-specific GRN construction? Once again, thanks a lot for answering my questions.

lingfeiwang commented 1 year ago

Great, we are reaching some consensus.

Context-specific or cell-type specific GRN reconstruction is something different. For that, we need cell-type specific data, such as data from cells of one particular cell type. Dictys only needs cells of the given cell type for GRN reconstruction, whereas CellOracle needs data from all cells or cell types for chromatin accessibility processing. We believe an external database or file counts as cell-type specific only if that data is derived from, and intended for, the particular cell type. Therefore, SCENIC does not have cell-type specific TF binding information either.

cakeinspace commented 1 year ago

But then CellOracle could be made cell-type specific by creating pseudo-bulk ATAC-seq reads for a given cell type from the scATAC-seq data to identify promoter and enhancer regions. From what I understand, in Dictys, when we have DNase-seq data for a given cell type, we use it to identify TF footprints, and when we don't, we simply use the scATAC-seq data. Sorry for bugging you about this point; I promise this is the last point at which I am stuck. I haven't gone over the SCENIC paper yet, which is why most of my questions are about CellOracle.

lingfeiwang commented 1 year ago

That's totally fine.

Regarding CellOracle, here I always mean its software, paper, and tutorial as cited in our manuscript. As a user, you may repurpose it for anything you want, but such infinite possibilities are untested and beyond our discussion. For example, do they create pseudo-bulk ATAC-seq reads for a given cell type in their tutorial?

For Dictys, we use cells of a particular cell type from scATAC-seq.

cakeinspace commented 1 year ago

That's a valid point. Thanks a lot for all your clarifications. You have been incredibly helpful.

Warm regards,
Anurag