nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Integration of GRN inference algorithms using single cell multi-omics data to BEELINE #240

Closed yiqisu closed 2 months ago

yiqisu commented 3 months ago

Background

Gene regulatory networks (GRNs), which delineate the interactions between transcription factors (TFs) and their target genes, represent the mechanisms governing cell differentiation and disease development. Our benchmarking framework, BEELINE [1], rigorously assessed a dozen unsupervised algorithms for GRN inference in early 2020. In the four years since then, rapid advances in single-cell sequencing technologies have led to many new GRN inference algorithms. To accommodate newly released algorithms and datasets, we plan to introduce an enhanced framework, BEELINE 2.0, that offers users recommendations for selecting or developing GRN inference algorithms.

Reference: [1] Pratapa, A., Jalihal, A.P., Law, J.N. et al. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods 17, 147–154 (2020). https://doi.org/10.1038/s41592-019-0690-6

Goal

The objective of this project is to extend the BEELINE pipeline by incorporating novel GRN inference algorithms, particularly those developed for single-cell multi-omics data (scRNA-seq and scATAC-seq data). We are interested in evaluating both the performance of new GRN inference algorithms and how the integration of the two types of information contributes to GRN inference. The key steps in the project will include:

  1. Read the BEELINE paper to understand the benchmark.

  2. Install BEELINE or clone the repository and run it to replicate the results in the paper.

  3. Integrate two or all three GRN inference algorithms from the following list into BEELINE.

  4. Integrate the selected algorithms into the BEELINE extension pipeline with the following steps:

  5. The mentor will provide appropriate scRNA-seq and scATAC-seq datasets for evaluation.

  6. Evaluate the performance of the added algorithms with the provided metrics and datasets.
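
As a rough illustration of the evaluation in step 6, the sketch below scores a ranked edge list against a reference network using two measures in the spirit of the BEELINE paper: early precision (the fraction of true edges among the top-k predictions, with k set to the number of reference edges) and average precision, used here as a simple stand-in for the area under the precision-recall curve. The function names, the toy edges, and the (TF, target, weight) tuple layout are assumptions made for this example; they are not BEELINE's own API or file format, and BEELINE's evaluator may compute these values differently.

```python
# Minimal sketch, assuming predictions are (tf, target, weight) tuples and the
# reference network is a set of directed (tf, target) pairs. All names here are
# hypothetical and chosen for illustration only.

def rank_edges(weighted_edges):
    """Return (tf, target) pairs sorted by descending edge weight."""
    ordered = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    return [(tf, target) for tf, target, _ in ordered]

def early_precision(ranked_edges, true_edges):
    """Fraction of true edges among the top-k predictions, k = len(true_edges)."""
    k = len(true_edges)
    if k == 0:
        return 0.0
    hits = sum(1 for edge in ranked_edges[:k] if edge in true_edges)
    return hits / k

def average_precision(ranked_edges, true_edges):
    """Mean of precision@rank taken at each rank where a true edge is recovered."""
    if not true_edges:
        return 0.0
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank
    return total / len(true_edges)

# Toy data: two reference edges and four scored predictions.
true_edges = {("TF1", "G1"), ("TF2", "G2")}
predictions = [
    ("TF1", "G2", 0.7),
    ("TF1", "G1", 0.9),
    ("TF2", "G1", 0.1),
    ("TF2", "G2", 0.4),
]

ranked = rank_edges(predictions)
ep = early_precision(ranked, true_edges)
ap = average_precision(ranked, true_edges)
print(f"early precision = {ep:.3f}, average precision = {ap:.3f}")
```

Comparing these scores for an algorithm run on scRNA-seq data alone and on combined scRNA-seq and scATAC-seq data is one simple way to examine how much the second data type contributes.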

Difficulty Level: Medium

Prospective contributors should be familiar with, or open to learning, the BEELINE framework, gene regulatory network inference, and single-cell multi-omics data analysis.

Size and Length of Project

Skills

Essential skills: Python, R, Docker

Nice to have skills: PyTorch, Linux, knowledge of single-cell transcriptomics, single-cell chromatin accessibility, and gene regulatory networks

Public Repository

BEELINE https://github.com/Murali-group/Beeline

Potential Mentors

Yiqi Su (yiqisu@vt.edu)

KarthikDani commented 3 months ago

Hello @yiqisu @khanspers!

I find this project quite interesting and different. I'm going to be on an internship at the Indian Institute of Science in Computational and Experimental Biology. I'd like to work on this project, but I am worried about whether I can carry out the work on macOS (I've skimmed through the papers and found a requirement for Ubuntu 18)!

Please let me know what you think. Any suggestions are welcome!

yiqisu commented 3 months ago

Hi @KarthikDani,

Thanks for your interest in this project. There could be an issue with the Apple M1 chip; otherwise, macOS should be okay. In any case, our objective is to ensure compatibility with various systems.

Please let me know if you have any further questions.

Best, Yiqi

khanspers commented 2 months ago

This is an active GSoC 2024 project. Closing this project idea as it is no longer available to other contributors.