Global context

Metagenomics analysis relies on sophisticated computational approaches: assembly, binning, taxonomic classification, etc. Any downstream analysis (comparative or otherwise) is only meaningful if the output of these initial data processing steps makes sense. Despite tremendous progress in recent years, none of these approaches can completely recover the complex information encoded in metagenomes. They all rely on simplifying assumptions that can lead to strong limitations.
When novel or improved methods are presented, their accuracy is usually evaluated. However, these evaluations are hardly comparable, because there is no general standard for assessing computational methods in metagenomics. As a result, users may be poorly informed and misinterpret computational predictions.
To tackle this problem, the initiative for the "Critical Assessment of Metagenome Interpretation" (CAMI) was founded in 2014. It evaluates methods in metagenomics independently, comprehensively, and without bias. The initiative supplies users with exhaustive quantitative data about the performance of methods in all relevant scenarios, guiding them in the selection, application, and proper interpretation of methods. Furthermore, it provides valuable information to developers, allowing them to identify promising directions for future work.
Project context
The 2nd CAMI offered several challenges (assembly, genome binning, taxonomic binning, and taxonomic profiling) on several multi-sample datasets from different environments, including long- and short-read data. Participants registered to download the challenge datasets and ran different tools, with different parameters, on the different datasets. For reproducibility, participants could submit either a Docker container containing the complete workflow, a Bioconda script, or a software repository with detailed installation instructions, specifying all parameter settings and reference databases used. Altogether, 5,002 submissions of 76 programs were received for the four challenge datasets, from 30 external teams and the CAMI developers. The CAMI developers then evaluated the results using standardized metrics and compared the different results (Meyer et al., 2020).
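To give an idea of what such a standardized metric looks like: one common measure for the taxonomic profiling challenge is the L1 norm between a predicted abundance profile and the gold standard at a given taxonomic rank. The sketch below is a minimal illustration with made-up abundances, not the official CAMI evaluation code:

```python
def l1_error(gold: dict, predicted: dict) -> float:
    """L1 norm between two abundance profiles at one taxonomic rank.

    Profiles map taxon names to relative abundances (percentages);
    taxa missing from one profile count as 0 there.
    0 means a perfect prediction, 200 means no overlap at all.
    """
    taxa = set(gold) | set(predicted)
    return sum(abs(gold.get(t, 0.0) - predicted.get(t, 0.0)) for t in taxa)

# Hypothetical phylum-level profiles (percentages summing to 100)
gold = {"Proteobacteria": 60.0, "Firmicutes": 30.0, "Bacteroidetes": 10.0}
pred = {"Proteobacteria": 50.0, "Firmicutes": 35.0, "Actinobacteria": 15.0}

print(l1_error(gold, pred))  # → 40.0
```

In the real challenge such metrics are computed per rank and per sample by dedicated evaluation tools, so that all submissions are scored identically.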
In this project, we would like to show that Galaxy could be used as a platform to support the next CAMI challenges:
For teams participating in the challenge
Galaxy could be connected to the benchmarking datasets
Galaxy could provide computational resources to teams that may not otherwise have access to them
Tools and databases could be made available in Galaxy
Galaxy workflows could be shared via IWC / Dockstore for reproducibility
Data could be formatted and submitted directly to the CAMI benchmarking portal via Galaxy
For CAMI developers and tool developers
Workflows for result evaluation could be available in Galaxy
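Submitting and evaluating results both hinge on the standardized CAMI exchange formats. As an illustration of what formatting data for the benchmarking portal involves, here is a minimal sketch of parsing a CAMI profiling file; the column layout (header lines starting with `@`, a `@@TAXID	RANK	TAXPATH	TAXPATHSN	PERCENTAGE` column line, then tab-separated rows) is assumed from the CAMI profiling format specification:

```python
def parse_cami_profile(text: str):
    """Parse a CAMI profiling file into header metadata and data rows.

    Header lines start with '@' (e.g. @SampleID, @Ranks); the column-name
    line starts with '@@' and is skipped; data rows are tab-separated.
    """
    header, rows = {}, []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@@"):
            continue  # column-name line
        if line.startswith("@"):
            key, _, value = line[1:].partition(":")
            header[key] = value
        else:
            taxid, rank, taxpath, taxpathsn, pct = line.split("\t")
            rows.append({"taxid": taxid, "rank": rank,
                         "taxpath": taxpath, "name": taxpathsn,
                         "percentage": float(pct)})
    return header, rows

# Hypothetical two-row profile for one sample
example = (
    "@SampleID:sample_0\n"
    "@Ranks:superkingdom|phylum\n"
    "@@TAXID\tRANK\tTAXPATH\tTAXPATHSN\tPERCENTAGE\n"
    "2\tsuperkingdom\t2\tBacteria\t100.0\n"
    "1239\tphylum\t2|1239\tBacteria|Firmicutes\t30.0\n"
)
header, rows = parse_cami_profile(example)
print(header["SampleID"], len(rows))  # → sample_0 2
```

A Galaxy tool or workflow step producing output in this shape could hand results straight to the benchmarking portal without manual reformatting.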
Objectives of the project
Run the different challenges with different tools in Galaxy (some tools may need to be added to Galaxy)
Make the input data and results available in Galaxy
Proposed agenda for the project
Prerequisites
Further reading and useful links