Supervisor: Bérénice Batut
For degree: Master project
Status: Completed
Keywords: Metagenomics, Galaxy, benchmarking

Global context

Metagenomic analysis relies on sophisticated computational approaches: assembly, binning, taxonomic classification, etc. Downstream analyses (comparative analyses, etc.) are only meaningful if the output of these initial data processing methods is sound. Despite tremendous progress in recent years, none of these approaches can completely recover the complex information encoded in metagenomes. They all rely on simplifying assumptions that can lead to strong limitations.

When novel or improved methods are presented, their accuracy is often evaluated. But these evaluations are usually hardly comparable, because there is no general standard for the assessment of computational methods in metagenomics. This can leave users poorly informed and lead to misinterpretation of computational predictions.

To tackle this problem, the initiative for the "Critical Assessment of Metagenome Interpretation" (CAMI) was founded in 2014. It evaluates methods in metagenomics independently, comprehensively and without bias. The initiative supplies users with exhaustive quantitative data about the performance of methods in all relevant scenarios. It therefore guides users in the selection and application of methods and in their proper interpretation. Furthermore, it provides valuable information to developers, allowing them to identify promising directions for their future work.

Project context

The 2nd CAMI offers several challenges: an assembly, a genome binning, a taxonomic binning and a taxonomic profiling challenge, on several multi-sample datasets from different environments, including long and short read data. Participants registered to download the challenge datasets. They ran different tools, with different parameters, on the different datasets. For reproducibility, participants could submit either a Docker container containing the complete workflow, a Bioconda script or a software repository with detailed installation instructions, specifying all parameter settings and reference databases used. Altogether, 5,002 submissions of 76 programs were received for the four challenge datasets, from 30 external teams and CAMI developers. The CAMI developers then evaluated the results using standardized metrics and made sense of the different results (Meyer et al., 2020).
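To make the idea of a standardized metric concrete, here is a minimal sketch of how a predicted taxonomic profile could be scored against a gold standard. It is an illustration only and is not taken from the CAMI evaluation code: the function name `profile_metrics`, the taxon names and the abundances are all hypothetical, and the three metrics shown (precision, recall, L1 norm of the abundance differences) are just examples of the kind of measure such an evaluation might report.

```python
def profile_metrics(predicted, gold):
    """Compare two taxonomic profiles given as {taxon: relative abundance}.

    Hypothetical helper for illustration; not part of any CAMI tool.
    """
    # Taxa are considered "present" when their abundance is above zero.
    predicted_taxa = {taxon for taxon, abundance in predicted.items() if abundance > 0}
    gold_taxa = {taxon for taxon, abundance in gold.items() if abundance > 0}

    true_positives = len(predicted_taxa & gold_taxa)
    precision = true_positives / len(predicted_taxa) if predicted_taxa else 0.0
    recall = true_positives / len(gold_taxa) if gold_taxa else 0.0

    # L1 norm: sum of absolute abundance differences over all taxa seen.
    l1_norm = sum(
        abs(predicted.get(taxon, 0.0) - gold.get(taxon, 0.0))
        for taxon in predicted_taxa | gold_taxa
    )
    return {"precision": precision, "recall": recall, "l1_norm": l1_norm}


# Invented example profiles (relative abundances sum to 1).
gold = {"Escherichia": 0.5, "Bacillus": 0.25, "Vibrio": 0.25}
predicted = {"Escherichia": 0.5, "Bacillus": 0.5}

metrics = profile_metrics(predicted, gold)
print(metrics)
```

In this invented example the prediction contains no false positives but misses one taxon, so precision is perfect while recall and the L1 norm reveal the error. Applying the same function to every submission is what makes results comparable across tools.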

In this project, we would like to show that Galaxy could be used as a platform to support the next CAMI challenges:

For teams participating in the challenge
- Galaxy could be connected to the benchmarking datasets
- Galaxy could provide computational resources to teams that may not have access to their own
- Tools and databases could be available in Galaxy
- Galaxy workflows could be shared via IWC / Dockstore for reproducibility
- Data could be formatted and submitted directly to the CAMI benchmarking portal via Galaxy
For CAMI developers and tool developers
- Workflows for result evaluation could be available in Galaxy

Objectives of the project

Run the different challenges with different tools in Galaxy (some tools may need to be added in Galaxy)
Make the input data and results available in Galaxy
Share the workflows via IWC

Proposed agenda for the project

Read the CAMI 2 paper: Meyer et al., 2020
List the tools for the different challenges and check if they are available in Galaxy (and which version)
Get familiar with tool integration in Galaxy
Select one of the challenges (assembly, profiling, genome binning, taxon binning, clinical pathogen detection) and run it
1. Add the "winning" tools in Galaxy
2. Add the input data in Galaxy
3. Run the different tools on the data and try to identify the best set of parameters for each tool/version
4. Compare the results to the ones in CAMI
5. Share the workflows via IWC
Run the other challenges similarly
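
The parameter search in step 3 can be sketched as a simple grid search. This is a minimal sketch under stated assumptions: the parameter names (`kmer_size`, `min_contig_length`), their values and the scoring function are hypothetical placeholders. In practice `run_and_score` would launch the tool in Galaxy and score its output against the CAMI gold standard; here it is mocked so the example runs on its own.

```python
from itertools import product


def parameter_grid(options):
    """Yield every combination of parameter values as a dict."""
    names = sorted(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def best_parameters(options, run_and_score):
    """Return (score, params) for the highest-scoring combination."""
    scored = [(run_and_score(params), params) for params in parameter_grid(options)]
    return max(scored, key=lambda item: item[0])


# Hypothetical parameter space for an imaginary assembler.
options = {"kmer_size": [21, 31, 41], "min_contig_length": [500, 1000]}


def mock_run_and_score(params):
    # Stand-in for "run the tool, then score the result against the gold
    # standard". The formula is arbitrary and only exists to rank the
    # combinations; higher is better.
    return -abs(params["kmer_size"] - 31) - params["min_contig_length"] / 1000


best_score, best_params = best_parameters(options, mock_run_and_score)
print(best_score, best_params)
```

An exhaustive grid is the simplest choice and keeps every tested combination on record, which helps when comparing against the parameters reported by CAMI participants. For tools with many parameters or long runtimes, a smaller hand-picked set of combinations may be more realistic.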

Prerequisites

Metagenomics: concepts
Version control: concepts
GitHub workflow: concepts
Galaxy: concepts

usegalaxy-eu / project-ideas

Reproducing CAMI (metagenomics) challenges in Galaxy #24
