nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

INTEGRATE4Galaxy #237

Closed marcoxa closed 2 months ago

marcoxa commented 5 months ago

Background

Technological advancements in the determination and annotation of complete genomic sequences, together with the well-explored biochemistry of metabolic transformations, have facilitated the thorough genome-scale reconstruction of the metabolic network of many target organisms. The conversion of metabolic networks into a mathematical representation has established a framework upon which a mechanistic understanding of the metabolic genotype–phenotype relationship can be articulated. While a metabolic reconstruction is unique to the target organism, one can derive many different condition-specific models from a single reconstruction. Mapping ‘omics’ data onto these networks enables their analysis in the context of the curated knowledge about the target organism.

The DCB and the SYSBIO groups at the Università degli Studi di Milano-Bicocca have constructed a series of algorithms and tools to study the metabolic rewiring occurring in many physio-pathological conditions, such as cancer or embryo development. The algorithms seek to predict metabolic flux distributions from post-genomics data, including proteomics, transcriptomics, and metabolomics. The main tools embodying such studies leverage the cobrapy library to perform flux sampling and/or flux optimization to highlight the different behaviors of diseased cells w.r.t. normal cells. The "Metabolic Reaction Enrichment Analysis" (MaREA4Galaxy) tool computes a score for each relevant reaction and uses it to provide direct feedback to the user about whether a reaction is "up-regulated" or "down-regulated", according to a given set of external constraints. At the time of this writing, the MaREA tool produces a static, highlighted version of the differentially enriched pathways using a set of Matlab and Python programs.
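To make the idea of per-reaction "up-regulated" / "down-regulated" feedback concrete, here is a minimal sketch in plain Python. It is not the MaREA4Galaxy scoring algorithm: the function name `classify_reactions`, the log2 fold-change criterion, the threshold, and the reaction IDs and score values are all assumptions made up for illustration.

```python
import math


def classify_reactions(scores_a, scores_b, threshold=1.0, pseudocount=1e-9):
    """Label reactions by the log2 fold change of condition B over condition A.

    scores_a / scores_b map a reaction ID to a non-negative score.
    The threshold and pseudocount are illustrative defaults (assumptions).
    """
    labels = {}
    # Only reactions scored in both conditions can be compared.
    for rxn in sorted(set(scores_a) & set(scores_b)):
        ratio = (scores_b[rxn] + pseudocount) / (scores_a[rxn] + pseudocount)
        lfc = math.log2(ratio)
        if lfc >= threshold:
            labels[rxn] = "up-regulated"
        elif lfc <= -threshold:
            labels[rxn] = "down-regulated"
        else:
            labels[rxn] = "unchanged"
    return labels


# Hypothetical reaction scores for a normal and a diseased sample.
normal = {"HEX1": 2.0, "PFK": 4.0, "PYK": 3.0}
tumor = {"HEX1": 8.0, "PFK": 1.0, "PYK": 3.5}

labels = classify_reactions(normal, tumor)
for rxn, label in labels.items():
    print(rxn, label)
```

A real pipeline would derive the scores from expression data and the model's gene-reaction rules, and would likely apply a statistical test across samples rather than a fixed fold-change cutoff.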

Goal

The goal of the project is to design a Galaxy tool that goes beyond the simple data mapping enabled by MaREA4Galaxy and allows non-expert users to perform constraint-based enrichment analysis of metabolic pathways through the integration of various data types, including gene expression, proteomics, and metabolomics. This could be a self-contained GSoC project for a motivated programmer. The main goal will be to read in the original metabolic network (which is usually a "reduced" one, w.r.t. the cell's full-blown metabolic network).

Difficulty Level: Medium

The difficulty of the project is twofold. First, the developer will have to fully understand the main Python-based metabolic reaction enrichment pipelines developed by the group. Second, the developer will have to design a Galaxy tool that eases their use. The tool will take as input either a single dataset or heterogeneous post-genomics datasets and will provide an integrated view of metabolic alterations. Results will be delivered both in an easy-to-understand graphical fashion and in file formats compliant with other data analysis pipelines.

Size and Length of Project

Size

The project size is medium: 175 hours.

Length

The timeline is flexible; the project runs for 12 weeks.

Skills

Essential skills: Python

Nice-to-have skills: knowledge of the Galaxy platform and familiarity with constraint-based modeling.

Public Repository

The current repository is [MAREA4Galaxy](https://toolshed.g2.bx.psu.edu/repository?repository_id=a4e912ba1e9608a8 "MAREA Galaxy repository link")

Potential Mentors

RanitMukherjee commented 5 months ago

Hello, I am Ranit Mukherjee, currently pursuing my BTech in Computer Science. I am particularly interested in this project and would like to contribute to the same.

I have prior experience with Python and am getting familiar with the Galaxy platform. I have also just started exploring the MAREA4Galaxy tool.

Could you please point me to how I should go about understanding the metabolic reaction enrichment pipelines? Thanks.

harshagr70 commented 4 months ago

Hey @marcoxa, I would like to work on this project. I have prior experience with enrichment analysis plotting and tools in R, and I am really excited to implement the same in Python. I am also familiar with constraint-based modelling. Looking forward to working on this project. Thanks.

khanspers commented 4 months ago

NRNB has been accepted as a mentoring organization for GSoC 2024. The contributor application period is March 18 – April 2. Here are some useful links:

- GSoC contributor guide
- NRNB project proposal template
- Eligibility requirements
- Full program timeline

harshikagoyal14 commented 3 months ago

Hey @alexgraudenzi, I am interested in this project and wanted to discuss things in detail. Is there any Discord or Slack channel?

khanspers commented 3 months ago

Hi @harshikagoyal14, we don't have a Slack or Discord for NRNB projects. Please contact mentors by email.

khanspers commented 2 months ago

This is an active GSoC 2024 project. Closing this project idea as it is no longer available to other contributors.