openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

Transcription factor activity estimation from scRNA-seq task #315

Closed PauBadiaM closed 1 month ago

PauBadiaM commented 3 years ago

Hi everyone! I'm Pau, a PhD student from saezlab. We are mostly interested in extracting mechanistic insights from (single-cell) omics data, and we would like to propose an OpenProblem around the estimation of Transcription Factor Activities from scRNA-seq. Here's a written summary; feedback and contributions are appreciated, thanks!

Transcription factor activity estimation from scRNA-seq

Transcription Factors (TFs) are key regulators of cell identity and fate. Hence, estimating their activities from scRNA-seq can provide mechanistic insights in many, if not most, scRNA-seq studies. In addition, TF-activity estimates can be used to summarize gene coordination events into a small and interpretable set of features.

The downstream transcriptional targets of a TF yield a much more robust estimate of its activity than the expression of the TF itself [1,2,3]. A TF-activity method (TFAM) thus requires a gene regulatory network (GRN) in combination with a statistical algorithm that summarizes the expression of the target genes into a single activity score. There are multiple TFAMs for bulk RNA-seq, and some specific to scRNA-seq data such as metaVIPER [4] or SCENIC [5], combining diverse GRNs and statistical methods. In previous work we benchmarked both bulk- and scRNA-specific methods on scRNA-seq and found that they seem to be robust to drop-outs and other features of scRNA-seq data [2], and that there are important differences across methods on various in silico and real data benchmarks.
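
To make the "GRN plus statistic" idea concrete, here is a minimal sketch of one of the simplest possible statistics, a signed weighted mean over a TF's targets. It is only an illustration and not the method of any particular package; the toy network, the gene names, the expression values and the function names are all made up for this example.

```python
# Hypothetical toy GRN: TF -> {target gene: mode of regulation}
# (+1.0 = activation, -1.0 = repression). Not taken from any real resource.
toy_grn = {
    "TF_A": {"g1": 1.0, "g2": 1.0, "g3": -1.0},
    "TF_B": {"g3": 1.0, "g4": 1.0},
}

# Hypothetical normalized expression values for a single cell.
toy_cell = {"g1": 2.0, "g2": 1.0, "g3": 0.0, "g4": 3.0}


def weighted_mean_activity(expression, targets):
    """Signed weighted mean of target expression for one TF.

    Targets that were not measured in the cell are skipped.
    Returns NaN if no usable target remains.
    """
    pairs = [(expression[g], w) for g, w in targets.items() if g in expression]
    total_weight = sum(abs(w) for _, w in pairs)
    if total_weight == 0:
        return float("nan")
    return sum(x * w for x, w in pairs) / total_weight


def estimate_activities(expression, grn):
    """Apply the statistic to every TF in the network."""
    return {tf: weighted_mean_activity(expression, targets)
            for tf, targets in grn.items()}


activities = estimate_activities(toy_cell, toy_grn)
print(activities)
```

Real TFAMs differ mainly in which statistic replaces `weighted_mean_activity` (for example enrichment scores or linear models) and in which network replaces `toy_grn`.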

While these results were already informative, we believe that a more systematic and comprehensive analysis is needed to truly determine the quality of the predicted TF activities in different contexts. In particular, we want to include recently developed methods not available in our first benchmark, systematically test combinations of GRNs and statistics, and test the methods in more contexts. For this we suggest leveraging DecoupleR, a package to benchmark TFAMs originally developed for bulk RNA-seq (as explained here).

There are two key challenges: (i) determining which GRN best recapitulates TF activities and (ii) determining the best algorithm. To address these challenges, we propose the following components:

Datasets

Expression data:

Gene regulatory networks:

Methods

For each dataset, TF activities will be computed using all possible combinations of gene regulatory networks and algorithms. The vast majority of methods are already implemented in DecoupleR.
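
As a rough sketch of what "all possible combinations" could look like in practice, the snippet below crosses every network with every statistic using `itertools.product`. It does not use DecoupleR; the networks, the two toy statistics and all names are hypothetical placeholders.

```python
from itertools import product

# Hypothetical normalized expression values for a single cell.
toy_cell = {"g1": 2.0, "g2": 1.0, "g3": 0.0, "g4": 3.0}

# Hypothetical networks: name -> {TF: {target gene: weight}}.
networks = {
    "net_small": {"TF_A": {"g1": 1.0, "g2": 1.0}},
    "net_signed": {"TF_A": {"g1": 1.0, "g3": -1.0, "g4": 1.0}},
}


def mean_of_targets(expression, targets):
    """Unweighted mean expression of the measured targets."""
    values = [expression[g] for g in targets if g in expression]
    return sum(values) / len(values) if values else float("nan")


def weighted_sum(expression, targets):
    """Sum of target expression multiplied by the edge weight."""
    return sum(expression[g] * w for g, w in targets.items() if g in expression)


# Hypothetical algorithms: name -> function(expression, targets) -> score.
algorithms = {"mean": mean_of_targets, "wsum": weighted_sum}


def run_all_combinations(expression, networks, algorithms):
    """Score every TF for each (network, algorithm) pair."""
    results = {}
    for (net_name, grn), (alg_name, alg) in product(networks.items(),
                                                   algorithms.items()):
        results[(net_name, alg_name)] = {
            tf: alg(expression, targets) for tf, targets in grn.items()
        }
    return results


results = run_all_combinations(toy_cell, networks, algorithms)
for combo, scores in results.items():
    print(combo, scores)
```

Keying the results by `(network, algorithm)` keeps each combination separate, so a metric can later be computed per pair and compared across the whole grid.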

Metrics

Bibliography

  1. Dugourd, A. & Saez-Rodriguez, J. Footprint-based functional analysis of multiomic data. Current Opinion in Systems Biology 15, 82–90 (2019).
  2. Holland, C. H. et al. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. Genome Biol. 21, 36 (2020).
  3. Alvarez, M. J. et al. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat. Genet. 48, 838–847 (2016).
  4. Ding, H. et al. Quantitative assessment of protein activity in orphan tissues and single cells using the metaVIPER algorithm. Nat. Commun. 9, 1471 (2018).
  5. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
  6. Dixit, A. et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 167, 1853-1866.e17 (2016).
  7. Genga, R. M. J. et al. Single-Cell RNA-Sequencing-Based CRISPRi Screening Resolves Molecular Drivers of Early Human Endoderm Development. Cell Rep. 27, 708-718.e10 (2019).
  8. Teschendorff, A. E. & Wang, N. Improved detection of tumor suppressor events in single-cell RNA-Seq data. bioRxiv (2020) doi:10.1101/2020.07.04.187781.
  9. Garcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. & Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29, 1363–1375 (2019).
  10. Keenan, A. B. et al. ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Res. 47, W212–W224 (2019).
  11. Liu, Z.-P., Wu, C., Miao, H. & Wu, H. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database (Oxford) 2015, (2015).

github-actions[bot] commented 1 month ago

This issue has been automatically closed because it has not had recent activity.