usegalaxy-eu / project-ideas

A collection of project ideas suitable for Master and Bachelor students

Intelligent search for Galaxy using machine learning and search-engines #11

Closed by anuprulez 8 months ago

anuprulez commented 5 years ago

Intelligent text-based search for Galaxy using machine learning

Supervisor: Björn Grüning (@bgruening) / Anup Kumar (@anuprulez)
For degree: Master (Project)
Status: Open
Keywords: Galaxy, Text-based search, Natural language processing, Machine learning, Search-engines

Global Biological/Research context

Galaxy is an open-source, web-based platform for biological data processing. It offers thousands of tools, which can be combined into numerous data-processing pipelines (workflows). Galaxy stores a huge collection of tools, workflows and datasets (both raw and processed). Finding relevant items quickly in such a large collection requires an efficient, intelligent search. To build such a search feature, a few ideas from natural language processing and machine learning can be explored, implemented and compared.

Objectives of the project

  1. Understand how Galaxy works: what tools, workflows and datasets are.
  2. Explore relevant literature on search frameworks, natural language processing and machine learning.
     2.1. Apache Solr
     2.2. Elasticsearch
     2.3. ....
  3. Build a program which shows relevant search results (tools/workflows/datasets) based on a user query.
  4. Analyze the results.
  5. Write a report.
  6. Integrate the program into Galaxy (if time permits).
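
To make objective 3 more concrete, here is a minimal sketch of a text-based search that ranks items by TF-IDF cosine similarity, using only the Python standard library. It is one possible baseline, not the method prescribed by this project. The tool names and descriptions in `DOCUMENTS`, and the function names `build_index` and `search`, are assumptions made for illustration; a real implementation would read tool, workflow and dataset metadata from Galaxy and would likely be compared against Apache Solr or Elasticsearch.

```python
import math
import re
from collections import Counter

# Hypothetical item descriptions (assumption); a real system would load
# tool/workflow/dataset metadata from Galaxy instead.
DOCUMENTS = {
    "bwa_mem": "Map sequencing reads against a reference genome",
    "fastqc": "Quality control reports for raw sequencing reads",
    "deseq2": "Differential gene expression analysis from count tables",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Return (idf, vectors): smoothed IDF weights and TF-IDF vectors per item."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(documents)
    # Document frequency: in how many items each term occurs.
    doc_freq = Counter(term for c in counts.values() for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors, top_k=3):
    """Return up to top_k (name, score) pairs with non-zero similarity."""
    # Query terms that never occur in the collection are ignored.
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items()}
    scored = [(name, cosine(q_vec, vec)) for name, vec in vectors.items()]
    scored = [(name, score) for name, score in scored if score > 0.0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    idf, vectors = build_index(DOCUMENTS)
    for name, score in search("gene expression", idf, vectors):
        print(f"{name}\t{score:.3f}")
```

A purely lexical baseline like this cannot match synonyms or paraphrased queries, which is where the natural language processing and machine learning approaches named in the objectives could be evaluated against it.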

Prerequisites

Basic knowledge of:

Further reading and useful links