usegalaxy-eu / project-ideas

A collection of project ideas suitable for Master and Bachelor students
MIT License

Implement a Bayesian optimisation based hyperparameter search method for the Galaxy machine learning tools suite #16

Closed anuprulez closed 10 months ago

anuprulez commented 5 years ago


Supervisor: Anup Kumar (@anuprulez) and Björn Grüning (@bgruening)
For degree: Bachelor/Project/Master
Status: Open
Keywords: Hyperparameter search, Bayesian optimisation, machine learning, Galaxy tools, Scikit-learn and Keras

Global Biological/Research context

Machine learning and deep learning algorithms have many hyperparameters that need to be tuned for a given dataset to achieve the best performance. Setting these hyperparameters manually is cumbersome and may either lead to a sub-optimal configuration or take a huge amount of time. To avoid this, hyperparameter search techniques such as grid search, random search or Bayesian optimisation can be applied to find a good combination of hyperparameter values for a dataset in less time and with less effort than manual tuning, and often with better accuracy.
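To make the difference between the first two techniques concrete, here is a minimal sketch in plain Python. The `validation_score` function and the two hyperparameters (`learning_rate`, `depth`) are hypothetical stand-ins for a cross-validated model score; they are not taken from the Galaxy tools. In Scikit-learn, the corresponding classes are `GridSearchCV` and `RandomizedSearchCV`.

```python
import itertools
import random


def validation_score(learning_rate, depth):
    """Hypothetical stand-in for a cross-validated score; peaks at (0.1, 6)."""
    return 1.0 - 10 * (learning_rate - 0.1) ** 2 - 0.01 * (depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values and keep the best."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


def random_search(n_iter, seed=0):
    """Evaluate n_iter randomly sampled configurations and keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {
            "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform sample
            "depth": rng.randint(2, 10),
        }
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 6, 8, 10]}
grid_best = grid_search(grid)             # 4 * 5 = 20 evaluations
random_best = random_search(n_iter=20)    # same budget, sampled at random
```

Grid search only ever tries the listed values, so its cost grows multiplicatively with each added hyperparameter. Random search keeps the budget fixed and samples from ranges instead. Neither uses earlier results to decide what to try next, which is the gap Bayesian optimisation aims to fill.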

Project context

Galaxy is an online platform for processing biological data. Recently, machine learning and deep learning tool suites have been added to it for creating predictive models from datasets. The machine learning tool suite is built on Scikit-learn, and the deep learning tool suite is built on Keras and TensorFlow. Two hyperparameter search techniques (grid search and random search) are already implemented for the Scikit-learn tools. The Keras tools have none so far.

Objectives of the project

The objective of the project is to implement a hyperparameter search technique based on Bayesian optimisation for both the machine learning and deep learning tool suites.

Proposed agenda for the project

  1. Learn the basics of Galaxy and its tools, history, workflows and so on.
  2. Learn about the Scikit-learn and Keras tools in Galaxy.
  3. Find a package that implements hyperparameter search based on Bayesian optimisation and add it as a hyperparameter search feature for the Scikit-learn and Keras tools in Galaxy.
  4. Compare the performance of the different search techniques (grid search, random search and Bayesian optimisation) on a suite of standard public datasets.
  5. Write a report.
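
To illustrate the idea behind step 3, the following is a minimal, self-contained sketch of a Bayesian optimisation loop over a single hyperparameter. It is not the project's implementation, and a real integration would rely on an existing package, as the agenda suggests. Everything here is an assumption made for illustration: the toy `objective`, the zero-mean Gaussian process with an RBF kernel, the fixed `length_scale`, and the choice of expected improvement as the acquisition function.

```python
import math


def objective(x):
    """Hypothetical validation score for one hyperparameter; peaks at x = 0.3."""
    return -(x - 0.3) ** 2


def rbf(a, b, length_scale=0.2):
    """RBF kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def posterior(xs, ys, x_new, noise=1e-6):
    """Gaussian process posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    reduction = sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    variance = max(rbf(x_new, x_new) - reduction, 1e-12)  # clamp for stability
    return mean, math.sqrt(variance)


def expected_improvement(mean, std, best):
    """Expected improvement over the best score observed so far (maximisation)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayesian_optimise(n_iter=10):
    """Start from two evaluations, then pick each next point by maximising EI."""
    candidates = [i / 100 for i in range(101)]
    xs = [0.0, 1.0]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = max(ys)
        unseen = [c for c in candidates if c not in xs]
        x_next = max(
            unseen,
            key=lambda c: expected_improvement(*posterior(xs, ys, c), best),
        )
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


best_x, best_y = bayesian_optimise()
```

Unlike grid or random search, each new evaluation here is chosen from a model fitted to all previous evaluations. For the comparison in step 4, the natural measure is the score reached per number of model evaluations, since each evaluation corresponds to one (possibly expensive) training run.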

Prerequisites

Further reading and useful links

anuprulez commented 10 months ago

Closing this issue as it has been completed by a Master's student as their Master's project.