usegalaxy-eu / project-ideas

A collection of project ideas suitable for Master and Bachelor students
MIT License

Chemical property prediction with machine learning on the Galaxy platform #18

Closed (simonbray closed this issue 2 years ago)

simonbray commented 4 years ago

Prediction of chemical properties using machine learning on the Galaxy platform

Supervisors: Simon Bray (@simonbray), Alireza Khanteymoori (@khanteymoori)
For degree: Bachelor/Project/Master
Status: Open
Keywords: cheminformatics, machine learning, Galaxy tools

Global research context

Cheminformatics is the use of computational methods to analyze and solve chemical problems. A large number of open-source tools exist for representing and manipulating chemical structures. In recent years, machine learning has become increasingly popular in cheminformatics, thanks to the availability of large datasets and access to high-performance and cloud computing resources. Potential applications range from selecting drug candidates to predicting the outcome of chemical reactions.

Project context

A number of cheminformatics tools have already been integrated into the Galaxy platform, an open-source, web-based environment for the analysis of scientific data. Galaxy also provides numerous tools for constructing machine learning models. However, it still lacks tools for calculating the properties of chemical structures.

Objectives of the project

This project focuses on developing a tool that predicts chemical properties from the structure of an input molecule. The prediction model will be built using machine learning techniques, including deep learning. Properties that could be predicted include solubility, toxicity, melting and boiling point, drug-likeness, and many others.
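To make the idea of predicting a property from structure concrete, here is a deliberately minimal Python sketch. It is not the proposed tool: the regex-based `atom_counts` featurizer, the `fit_line` helper, and the choice of n-alkane boiling points (approximate literature values) are all assumptions made for illustration. A real implementation would more likely use a cheminformatics library for descriptors and an established machine learning framework for the model.

```python
import re
from collections import Counter

# Hypothetical, very limited SMILES atom matcher: two-letter halogens first,
# then single-letter organic-subset atoms (upper case aliphatic, lower case
# aromatic). Bracket atoms such as [Na+] are NOT handled correctly.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_counts(smiles):
    """Crude featurizer: count atom symbols in a simple SMILES string."""
    return Counter(sym.capitalize() for sym in ATOM_PATTERN.findall(smiles))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative training set: n-alkanes (methane to hexane) with approximate
# boiling points in degrees Celsius. Treat the values as rough.
TRAINING = {
    "C": -161.5,
    "CC": -88.6,
    "CCC": -42.1,
    "CCCC": -0.5,
    "CCCCC": 36.1,
    "CCCCCC": 68.7,
}

# Single feature: number of carbon atoms in the molecule.
features = [atom_counts(smiles)["C"] for smiles in TRAINING]
targets = list(TRAINING.values())
slope, intercept = fit_line(features, targets)


def predict_boiling_point(smiles):
    """Predict a boiling point from the carbon count alone."""
    return slope * atom_counts(smiles)["C"] + intercept


if __name__ == "__main__":
    # n-heptane is not in the training set.
    print(predict_boiling_point("CCCCCCC"))
```

A single-feature linear model fits this series poorly, because boiling point does not grow linearly with chain length. That gap is the motivation for richer descriptors and more expressive models.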

Proposed agenda for the project

  1. Collect and curate a dataset from publicly available databases, such as PubChem and ChEMBL.
  2. Create prediction systems for chemical properties using the cheminformatics and machine learning tools already available on Galaxy.
  3. Integrate new tools into Galaxy, as required for the project.
  4. Optional: integrate the DeepChem package for deep learning cheminformatics analyses into the Galaxy platform.
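As one possible starting point for step 1, the sketch below builds a request URL for PubChem's PUG REST interface to retrieve a few computed properties for a named compound. The helper name `pubchem_property_url` and the default property list are assumptions chosen for illustration. The URL layout should be checked against the current PubChem documentation before relying on it. No network request is made here.

```python
from urllib.parse import quote

# Base address of the PubChem PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(
    name,
    properties=("MolecularFormula", "MolecularWeight", "XLogP"),
    output="CSV",
):
    """Build a PUG REST URL requesting `properties` for a compound name.

    Hypothetical helper: it only assembles the URL and does not fetch it.
    """
    # Percent-encode the name so spaces and slashes are safe in the path.
    encoded_name = quote(name, safe="")
    property_list = ",".join(properties)
    return (
        f"{PUG_REST_BASE}/compound/name/{encoded_name}"
        f"/property/{property_list}/{output}"
    )


if __name__ == "__main__":
    print(pubchem_property_url("aspirin"))
    print(pubchem_property_url("acetic acid", properties=("XLogP",)))
```

The resulting URL could then be fetched with `urllib.request.urlopen` or a similar client. For larger datasets, bulk downloads from PubChem or ChEMBL are likely a better fit than per-compound requests.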

Prerequisites

Further reading and useful links

simonbray commented 4 years ago

Assigned to @lorrainealisha75.

simonbray commented 2 years ago

Closing. We had two different students working on similar projects, and I don't want to offer it to anyone else.