usegalaxy-eu / project-ideas

A collection of project ideas suitable for Master and Bachelor students
MIT License

Port a pipeline for microbiome data analysis to Galaxy #31

Open bebatut opened 2 years ago


Supervisor: Bérénice Batut
For degree: Master project
Status: Open
Keywords: Galaxy, Tool, Workflow, Metagenomics

Global Biological/Research context

The microbiome is the collection of all microbes, such as bacteria, fungi and viruses, together with their genes, that live inside and outside our bodies and in all the environments surrounding us [1]. Researchers investigate microbiomes by analysing sequencing data [2]. Such analyses rely on sophisticated computational approaches: assembly, binning, taxonomic classification, functional profiling, etc. Analysing microbiome data makes it possible to answer the two main questions of most microbiome studies: which microorganisms are present, and what are they doing?

These analyses rely on bioinformatics tools and databases [3,4]. Only a few workflows for processing these data exist [5,6,7], and most of them are not openly available, not transparent, or not easy for researchers to use. To tackle this problem, the Freiburg Galaxy team, together with the microGalaxy community, uses Galaxy [8] to build workflows for analysing microbiome sequencing data.

Project context

MGnify offers an automated pipeline for the analysis and archiving of microbiome data to help determine the taxonomic diversity and functional & metabolic potential of environmental samples.

Although the pipeline is documented, it is not really usable outside of MGnify's own computing resources. We would like to make this pipeline available to Galaxy users.

Objectives of the project

Proposed agenda for the project

  1. Get familiar with the MGnify pipeline
  2. Run the MGnify pipeline
  3. Identify the steps and tools in the pipeline
  4. Integrate the missing tools into Galaxy using Planemo
  5. Build the pipeline in Galaxy by connecting the tools
  6. Benchmark the Galaxy pipeline against the original one
  7. Annotate the pipeline and submit it to Galaxy's Intergalactic Workflow Commission
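Step 6 leaves open how the two pipelines should be compared. One possible check, assuming both the original MGnify pipeline and the Galaxy port can export per-taxon read counts for the same sample, is to compute the Bray-Curtis dissimilarity between the two taxonomic profiles and to list the taxa reported by only one of them. The function names and example taxa below are hypothetical illustrations, not part of MGnify or Galaxy.

```python
from collections.abc import Mapping


def relative_abundance(counts: Mapping[str, float]) -> dict[str, float]:
    """Normalise raw per-taxon counts so that they sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("profile has no counts")
    return {taxon: value / total for taxon, value in counts.items()}


def bray_curtis(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    0.0 means identical composition, 1.0 means no shared taxa.
    Taxa missing from one profile are treated as having zero abundance.
    """
    pa, pb = relative_abundance(a), relative_abundance(b)
    taxa = set(pa) | set(pb)
    shared = sum(min(pa.get(t, 0.0), pb.get(t, 0.0)) for t in taxa)
    # With both profiles summing to 1, BC = 1 - 2*shared / (1 + 1) = 1 - shared.
    return 1.0 - shared


def discordant_taxa(a: Mapping[str, float], b: Mapping[str, float]) -> list[str]:
    """Taxa with a positive count in exactly one of the two profiles."""
    present_a = {t for t, v in a.items() if v > 0}
    present_b = {t for t, v in b.items() if v > 0}
    return sorted(present_a ^ present_b)


if __name__ == "__main__":
    # Hypothetical counts for one sample, one profile per pipeline.
    original = {"Bacteroides": 50, "Prevotella": 30, "Escherichia": 20}
    galaxy_port = {"Bacteroides": 55, "Prevotella": 25, "Faecalibacterium": 20}
    print(f"Bray-Curtis: {bray_curtis(original, galaxy_port):.2f}")
    print("Only in one pipeline:", discordant_taxa(original, galaxy_port))
```

A value close to 0 would suggest that the Galaxy port reproduces the original taxonomic profile; functional annotation counts could be compared in the same way.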

Prerequisites

Further reading

MGnify

Galaxy

References

[1] Blaser, Martin J. "The microbiome revolution." The Journal of Clinical Investigation 124 (2014).
[2] Sharpton, Thomas J. "An introduction to the analysis of shotgun metagenomic data." Frontiers in Plant Science 5 (2014): 209.
[3] Oulas, Anastasis, et al. "Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies." Bioinformatics and Biology Insights 9 (2015): BBI-S12462.
[4] Escobar-Zepeda, Alejandra, Arturo Vera-Ponce de León, and Alejandro Sanchez-Flores. "The road to metagenomics: from microbiology to DNA sequencing technologies and bioinformatics." Frontiers in Genetics 6 (2015): 348.
[5] Mehta, Subina, et al. "ASaiM-MT: a validated and optimized ASaiM workflow for metatranscriptomics analysis within Galaxy framework." F1000Research 10 (2021).
[6] Mitchell, A. L., et al. "MGnify: the microbiome analysis resource in 2020." Nucleic Acids Research (2019). doi:10.1093/nar/gkz1035.
[7] Wooley, John C., Adam Godzik, and Iddo Friedberg. "A primer on metagenomics." PLoS Computational Biology 6.2 (2010): e1000667.
[8] Afgan, Enis, et al. "The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update." Nucleic Acids Research 46.W1 (2018): W537–W544. doi:10.1093/nar/gky379.