usegalaxy-eu / project-ideas

A collection of project ideas suitable for Master's and Bachelor's students
MIT License

Integrating MAPIE into Galaxy ML tools to provide prediction intervals for classifiers and regressors #43

Open anuprulez opened 5 months ago



Supervisor: @anuprulez @bgruening
For degree: Bachelor/Project
Status: Open (assigned to ...)
Keywords: MAPIE - Scikit-learn - Classifier - Regressor - Prediction intervals

Global Biological/Research context

Integrating MAPIE (Model Agnostic Prediction Interval Estimator) into Galaxy-ML tools would add uncertainty estimates to the predictions these tools produce. MAPIE is built on conformal prediction. It wraps a scikit-learn-compatible model and returns prediction intervals for regressors and prediction sets for classifiers that contain the true value with a user-chosen probability, for example 90%, assuming the data are exchangeable. These intervals do not make the underlying model more accurate. Instead, they quantify how uncertain each individual prediction is. Because MAPIE is model-agnostic, the same procedure works across many machine learning algorithms, regardless of the specific model employed. Users can then see the range of plausible outcomes rather than a single point estimate, which makes predictions from Galaxy-ML tools easier to interpret and safer to act on in many predictive analyses in bioinformatics.
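To make "model-agnostic prediction intervals" concrete, below is a minimal sketch of split conformal regression, one of the simplest strategies in the conformal family that MAPIE implements. It is written in plain Python and does not use MAPIE's own API. The plug-in interface (a `fit` callable that returns a predict function), the helper names, and the toy data are all hypothetical, chosen only for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical plug-in interface (NOT MAPIE's API): any regressor can be used
# as long as "fit" returns a function that maps x to a prediction.
Predictor = Callable[[float], float]
FitFn = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Least-squares fit of y = a + b*x, a stand-in for any regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def conformal_quantile(scores: List[float], alpha: float) -> float:
    """The ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the interval is unbounded
    return sorted(scores)[k - 1]


def split_conformal_regressor(
    fit: FitFn,
    x_train: Sequence[float], y_train: Sequence[float],
    x_cal: Sequence[float], y_cal: Sequence[float],
    alpha: float = 0.1,
) -> Callable[[float], Tuple[float, float, float]]:
    """Return a function mapping x to (prediction, lower, upper)."""
    model = fit(x_train, y_train)  # 1) fit on the training split only
    # 2) nonconformity score: absolute residual on the held-out calibration split
    scores = [abs(y - model(x)) for x, y in zip(x_cal, y_cal)]
    q = conformal_quantile(scores, alpha)

    def predict_interval(x: float) -> Tuple[float, float, float]:
        y_hat = model(x)
        return y_hat, y_hat - q, y_hat + q  # 3) symmetric interval of half-width q

    return predict_interval


def demo_coverage(seed: int = 0, alpha: float = 0.1) -> float:
    """Empirical coverage on synthetic data y = 2x + 1 + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(600)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    # Disjoint splits: 200 train, 200 calibration, 200 test points.
    predict_interval = split_conformal_regressor(
        fit_line, xs[:200], ys[:200], xs[200:400], ys[200:400], alpha
    )
    hits = 0
    for x, y in zip(xs[400:], ys[400:]):
        _, lower, upper = predict_interval(x)
        if lower <= y <= upper:
            hits += 1
    return hits / 200
```

Swapping `fit_line` for any other fit function leaves the interval logic unchanged, which is what "model-agnostic" means in practice. With `alpha=0.1`, `demo_coverage()` should return a fraction near 0.9 on this toy data, though the exact value depends on the random draw.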

Project context

This project is a proof of concept to explore how MAPIE can be integrated into Galaxy, starting with a few ML tools.

Objectives of the project

General objectives of the project

Proposed agenda for the project

  1. Understand MAPIE (https://github.com/scikit-learn-contrib/MAPIE) by going through its documentation.
  2. Gain hands-on experience by applying MAPIE to a few classification and regression datasets from the Penn Machine Learning Benchmarks (https://github.com/EpistasisLab/pmlb).
  3. Integrate MAPIE into a few Galaxy ML tools (classifiers and regressors) as a new feature.
  4. Update the GTN tutorials (https://training.galaxyproject.org/training-material/topics/statistics/) with results obtained using this new feature.
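For the classification datasets in step 2, conformal methods return a set of candidate labels rather than an interval. The following is a minimal stdlib sketch of that idea. It assumes a hypothetical `predict_proba`-style function that returns one probability per class, and it uses the score "1 minus the probability of the true class", which is a common conformal score for classifiers. It is an illustration only and does not reproduce MAPIE's API. `TOY_PROBS` is made-up data.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical interface (NOT MAPIE's API): a classifier exposed as a function
# returning one probability per class, where the list index is the class label.
ProbaFn = Callable[[object], Sequence[float]]


def calibrate_threshold(
    predict_proba: ProbaFn,
    x_cal: Sequence[object],
    y_cal: Sequence[int],
    alpha: float = 0.1,
) -> float:
    """Conformal threshold computed from held-out calibration data."""
    # Nonconformity score: 1 - probability the model assigns to the true class.
    scores: List[float] = [1.0 - predict_proba(x)[y] for x, y in zip(x_cal, y_cal)]
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every class is included
    return sorted(scores)[k - 1]


def prediction_set(predict_proba: ProbaFn, x: object, threshold: float) -> Set[int]:
    """All class labels whose score stays within the calibrated threshold."""
    probs = predict_proba(x)
    return {label for label, p in enumerate(probs) if 1.0 - p <= threshold}


# Toy "classifier": a fixed probability table keyed by sample id (illustrative only).
TOY_PROBS: Dict[str, List[float]] = {
    "a": [0.25, 0.75],
    "b": [0.5, 0.5],
    "c": [0.875, 0.125],
    "d": [0.125, 0.875],
}


def toy_proba(x: object) -> Sequence[float]:
    return TOY_PROBS[str(x)]
```

A set containing more than one label signals an uncertain prediction. With this particular score a set can also be empty when no class is probable enough, which is a known property of the method.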

Prerequisites

Further reading and useful links