Implement a Bayesian optimisation based hyperparameter search method for the Galaxy machine learning tools suite
Supervisors: Anup Kumar (@anuprulez) and Björn Grüning (@bgruening)
For degree: Bachelor/Project/Master
Status: Open
Keywords: Hyperparameter search, Bayesian optimisation, machine learning, Galaxy tools, Scikit-learn and Keras
Global Biological/Research context
Machine learning and deep learning algorithms have many hyperparameters that must be tuned for a given dataset to achieve the best performance. Setting these hyperparameters manually is cumbersome: it may lead to a sub-optimal configuration or take a huge amount of time. To avoid this, hyperparameter search techniques such as grid search, random search or Bayesian optimisation can be applied to find a good combination of hyperparameter values for a dataset in less time, with better accuracy and less effort.
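To make the difference between grid and random search concrete, here is a minimal sketch using only the Python standard library. The function `validation_score`, the hyperparameter names `learning_rate` and `depth`, and the value ranges are hypothetical stand-ins for training a real model; they are not taken from the Galaxy tools.

```python
import itertools
import random


def validation_score(learning_rate, depth):
    """Hypothetical stand-in for training a model and scoring it on held-out data."""
    return 1.0 - 50 * (learning_rate - 0.1) ** 2 - 0.01 * (depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values and keep the best one."""
    best_score, best_params = float("-inf"), None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        current = validation_score(**params)
        if current > best_score:
            best_score, best_params = current, params
    return best_score, best_params


def random_search(samplers, n_iter, seed=0):
    """Draw n_iter random configurations and keep the best one."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_iter):
        params = {name: draw(rng) for name, draw in samplers.items()}
        current = validation_score(**params)
        if current > best_score:
            best_score, best_params = current, params
    return best_score, best_params


# Assumed search space: 4 x 4 = 16 grid points, and 16 random draws for a fair budget.
grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}
samplers = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform draw
    "depth": lambda rng: rng.randint(2, 8),
}

grid_best = grid_search(grid)
random_best = random_search(samplers, n_iter=16)
print("grid search:  ", grid_best)
print("random search:", random_best)
```

Grid search only ever tries the listed values, so its cost grows multiplicatively with each added hyperparameter. Random search spends the same budget on configurations drawn from continuous ranges. Neither uses earlier results to decide where to look next, which is the gap Bayesian optimisation aims to fill.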
Project context
Galaxy is an online platform for biological data processing. Recently, machine learning and deep learning tool suites have been added to it for creating predictive models from multiple datasets. The machine learning tool suite is built on Scikit-learn, and the deep learning tool suite is built on Keras and TensorFlow. Two hyperparameter search techniques (grid and random search) are already implemented for the Scikit-learn tools. The Keras tools have none so far.
Objectives of the project
The objective of the project is to implement a hyperparameter search technique based on Bayesian optimisation for these machine learning and deep learning tool suites.
Proposed agenda for the project
Learn the basics of Galaxy: its tools, histories, workflows and so on.
Learn about the Scikit-learn and Keras tools in Galaxy.
Find a package that implements hyperparameter search based on Bayesian optimisation and add it as a hyperparameter search feature for the Scikit-learn and Keras tools in Galaxy.
Compare the performance of the different search techniques (grid, random and Bayesian optimisation) on a suite of standard public datasets.
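For orientation before choosing a package, the following is a minimal sketch of the idea behind Bayesian optimisation, written with the Python standard library only. It is an illustration under stated assumptions, not the method the project must use: the objective `score` is a hypothetical one-dimensional validation curve, the surrogate is a Gaussian process with a squared-exponential kernel, and the acquisition rule is an upper confidence bound. A real implementation would rely on an existing, tested package and on actual model training.

```python
import math


def score(x):
    """Hypothetical validation score for one hyperparameter x in [0, 1]."""
    return math.exp(-((x - 0.7) ** 2) / 0.02) + 0.5 * math.exp(-((x - 0.2) ** 2) / 0.01)


def rbf(a, b, length=0.15):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(xs, ys, candidates, noise=1e-6):
    """Posterior mean and standard deviation of the GP surrogate at each candidate."""
    offset = sum(ys) / len(ys)  # centre the targets so a zero-mean prior is reasonable
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, [y - offset for y in ys])
    result = []
    for x_new in candidates:
        k = [rbf(a, x_new) for a in xs]
        mean = offset + sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve(K, k)
        var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)  # clamp rounding noise
        result.append((mean, math.sqrt(var)))
    return result


def bayesian_search(objective, n_iter=10, kappa=2.0):
    """Maximise `objective` on [0, 1] with an upper-confidence-bound acquisition."""
    grid = [i / 100 for i in range(101)]
    xs = [0.0, 0.5, 1.0]  # small initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        candidates = [x for x in grid if x not in xs]
        posterior = gp_posterior(xs, ys, candidates)
        # Pick the candidate whose optimistic estimate (mean + kappa * std) is highest.
        ucb = [mean + kappa * std for mean, std in posterior]
        x_next = candidates[ucb.index(max(ucb))]
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs


best_x, best_score, evaluated = bayesian_search(score)
print(f"best x = {best_x:.2f}, score = {best_score:.3f}, evaluations = {len(evaluated)}")
```

Each iteration fits the surrogate to all scores observed so far and then evaluates the point that looks most promising, trading off a high predicted score against high uncertainty through `kappa`. For the comparison in the last agenda item, the same evaluation budget (here 13 calls to `score`) could be given to grid and random search.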
Prerequisites
Further reading and useful links