usegalaxy-eu / project-ideas

A collection of project ideas suitable for Master and Bachelor students
MIT License

Evaluation of a workflow to build genomes from metagenomic data #33

Open bebatut opened 1 year ago

bebatut commented 1 year ago

Supervisor: Bérénice Batut
For degree: Master
Status: Open
Keywords: Microbiome, Metagenomics, Galaxy, Assembly, Workflow, Benchmarking

Global Biological/Research context

The microbiome is the collection of all microbes, such as bacteria, fungi, and viruses, along with their genes, that live inside and outside our bodies and in all the environments surrounding us [1]. To investigate microbiomes, researchers use sequencing data and microbiome analyses [2]. Such analyses rely on sophisticated computational approaches: assembly, binning, taxonomic classification, functional profiling, etc. Analysing microbiome data makes it possible to answer the two main questions of most microbiome analyses.

Microbiome sequencing data also makes it possible to assemble genomes of organisms that cannot be cultivated individually (e.g. [3,4]). However, building genomes from metagenomic data (called Metagenome-Assembled Genomes, or MAGs) is complex because the data mixes sequences from many organisms; it requires many steps [5,6] and substantial computational resources.

Few workflows to build MAGs from such data are available (e.g. [7,8]), and most are not openly available, not transparent, or not easy for researchers to use.

Project context

The Freiburg Galaxy team, together with the microGalaxy community, uses Galaxy [9] to build a MAG-building workflow that will be open, transparent, reusable, and accessible.

This workflow has been developed with data from the cloud environment. We would now like to adapt this workflow to data from other microbiome environments, evaluate it using benchmarking data, compare it against other workflows, and document and share it.

Objectives of the project

Proposed agenda for the project

  1. Bibliography of metagenomic assembly, MAG building, and existing workflows
  2. Get familiar with the implemented MAGs building workflow
    1. Create the skeleton of a tutorial explaining each step and selected parameters
  3. Evaluate the results of the workflow on the cloud data
    1. Aggregate and analyze the different generated quality metrics into a Jupyter notebook
    2. Run extra steps to evaluate the quality of created MAGs
  4. Benchmark the workflow on the CAMI challenge benchmarking data [10]
    1. Run the workflow on the different datasets from the CAMI challenge
    2. Evaluate the results
  5. Share the workflow
    1. Annotate the workflow
    2. Update the dedicated tutorial
    3. Submit the workflow to IWC
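
As an illustration of agenda item 3 (aggregating quality metrics in a Jupyter notebook), here is a minimal Python sketch. It assumes that the workflow's quality-control step yields a tab-separated table with one row per bin; the column names `bin_id`, `completeness` and `contamination` are hypothetical, as are the example values. The tiers use simplified completeness/contamination cut-offs in the spirit of the MIMAG standard (which additionally requires rRNA and tRNA genes for high-quality drafts); the cut-offs actually used in the project may differ.

```python
import csv
import io
from collections import Counter


def classify_mag(completeness: float, contamination: float) -> str:
    """Assign a simplified quality tier from completeness and contamination (in %).

    Assumption: thresholds follow commonly used MIMAG-style cut-offs,
    ignoring the rRNA/tRNA criteria of the full standard.
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


def summarize_bins(table: str) -> Counter:
    """Count bins per quality tier from a tab-separated table given as text."""
    reader = csv.DictReader(io.StringIO(table), delimiter="\t")
    tiers = Counter()
    for row in reader:
        tier = classify_mag(float(row["completeness"]), float(row["contamination"]))
        tiers[tier] += 1
    return tiers


# Made-up example values for illustration only, not results from the workflow.
EXAMPLE_TABLE = (
    "bin_id\tcompleteness\tcontamination\n"
    "bin_1\t95.2\t1.3\n"
    "bin_2\t71.0\t4.8\n"
    "bin_3\t42.5\t0.9\n"
    "bin_4\t93.0\t12.4\n"
)

if __name__ == "__main__":
    for tier, count in sorted(summarize_bins(EXAMPLE_TABLE).items()):
        print(tier, count)
```

In a notebook, the same function could be applied to one table per sample or per CAMI dataset, so that tier counts can be compared across environments and against other workflows.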

Prerequisites

Further reading

Galaxy

References

[1] Martin J. Blaser. "The microbiome revolution." The Journal of Clinical Investigation (2014): 124.
[2] Sharpton, Thomas J. "An introduction to the analysis of shotgun metagenomic data." Frontiers in plant science 5 (2014): 209.
[3] Xie, Fei, et al. "An integrated gene catalog and over 10,000 metagenome-assembled genomes from the gastrointestinal microbiome of ruminants." Microbiome 9.1 (2021): 1-20.
[4] Nishimura, Yosuke, and Susumu Yoshizawa. "The OceanDNA MAG catalog contains over 50,000 prokaryotic genomes originated from various marine environments." Scientific Data 9.1 (2022): 1-11.
[5] Chen LX, Anantharaman K, Shaiber A, Eren AM, Banfield JF (2020) Accurate and complete genomes from metagenomes. Genome Res 30(3):315–333.
[6] Sieber CMK, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol. 2018;3(7):836–43.
[7] Kieser, Silas, et al. "ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data." BMC bioinformatics 21.1 (2020): 1-8.
[8] Raguideau, Sebastien, et al. "Novel microbial syntrophies identified by longitudinal metagenomics." bioRxiv (2021).
[9] Enis Afgan, et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W537–W544, doi:10.1093/nar/gky379.
[10] Meyer, Fernando, et al. "Critical Assessment of Metagenome Interpretation: the second round of challenges." Nature methods 19.4 (2022): 429-440.