usegalaxy-eu / project-ideas

A collection of project ideas suitable for Master and Bachelor students
MIT License

Development of a Python tool to compare public Galaxy histories and implementation of an automated scoring system for an online & interactive game #28

Open bebatut opened 2 years ago

bebatut commented 2 years ago

Supervisor: Bérénice Batut
For degree: Master
Status: In progress
Keywords: Python

Global context

In recent years, public media have reported more and more on scientific advances. Over the last year especially, words like sequencing, DNA, RNA, mutations, and variants have been used a lot in public media, but often with poor explanations, which leaves the audience in the dark and no better informed. More and more people have become aware of these topics and would like to learn and understand more about scientific results and technical advances. As scientists, it is our duty to report on our work and to integrate society into the scientific process. This will help to get citizens excited about science and to overcome the fears and scepticism that some people have towards science.

At the same time, emerging and powerful technologies like DNA sequencing are getting cheaper and therefore more accessible for many applications, e.g. in personalized medicine. This produces more data for scientists to analyze. Platforms like Galaxy (Afgan et al., Nucl Acids Res, 2018) and the Galaxy training material (Batut et al., Cell Syst, 2018) help scientists to analyze their own (complex) data in a user-friendly way. However, there are several ways to perform each analysis. Experience and knowledge help to achieve good results, but sometimes one has to test several combinations of different algorithms and parameters. This can be exhausting and time-consuming. Involving non-scientists in this process would help scientists explore the possibilities of their data analyses. As Galaxy can empower any researcher to analyze their own data, it can also be used to let citizens do their own data analysis and help researchers in their data analysis exploration.

The Street Science Community is a group of researchers from the University of Freiburg trying to bring DNA, sequencing, metagenomics, and the scientific process in general closer to citizens. We successfully developed the BeerDEcoded project: materials and a series of hands-on workshops for pupils and citizens with the general aim of scientific outreach. During these workshops, we guide participants through a scientific project: extracting and identifying the different yeasts contained in a beer sample. The identification is performed by sequencing the extracted yeast DNA, using our self-developed protocols, and analyzing the sequenced DNA via an easy and straightforward user interface. Analyzing beer via the BeerDEcoded project is a great way for non-specialists to visualize DNA and learn about genomics and sequencing. This makes science tangible and accessible.

Project context

Because of the pandemic, face-to-face workshops are not feasible. For a more scalable outreach to the public and the long-term sustainability of the BeerDEcoded project, we aim to implement an encouraging and easy-to-understand online game (DNAnalyzer) in which participants play a scientist who helps an alien learn and perform metagenomic data analyses.

The game will consist of several levels:

With this project, we would like to develop the prototype of the 2nd part of the game.

Objectives of the project

Proposed agenda for the project

  1. Get familiar with Galaxy and histories in Galaxy
  2. Write a Python tool to compare public Galaxy histories and store the source code on GitHub
  3. Document the tool directly in the code
  4. Package the tool (PyPI and conda)
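
As a rough illustration of step 2, here is a minimal sketch of how two histories could be compared and scored. It is not the project's actual design. It assumes each history has already been fetched (for example through the Galaxy API) as a list of dataset records with `name`, `extension`, and `deleted` fields. The function names `dataset_signature` and `compare_histories`, as well as the scoring rule (the fraction of reference datasets found in the participant's history), are hypothetical choices made for this example.

```python
from collections import Counter


def dataset_signature(dataset):
    """Reduce a dataset record to the fields used for comparison.

    Assumption: records carry "name" and "extension" keys.
    """
    return (dataset["name"].strip().lower(), dataset["extension"])


def compare_histories(reference, candidate):
    """Compare a participant history against a reference history.

    Both arguments are lists of dataset records (dicts). Deleted
    datasets are ignored. The score is the fraction of reference
    datasets that also appear in the candidate history.
    """
    ref = Counter(
        dataset_signature(d) for d in reference if not d.get("deleted", False)
    )
    cand = Counter(
        dataset_signature(d) for d in candidate if not d.get("deleted", False)
    )

    matched = ref & cand   # datasets present in both histories
    missing = ref - cand   # expected datasets the participant lacks
    extra = cand - ref     # datasets the reference does not contain

    total = sum(ref.values())
    score = sum(matched.values()) / total if total else 0.0
    return {
        "score": score,
        "missing": sorted(missing.elements()),
        "extra": sorted(extra.elements()),
    }


if __name__ == "__main__":
    # Made-up records, only to show the shape of the input and output.
    reference = [
        {"name": "Reads", "extension": "fastqsanger"},
        {"name": "Kraken report", "extension": "tabular"},
    ]
    candidate = [
        {"name": "reads", "extension": "fastqsanger"},
        {"name": "Notes", "extension": "txt"},
    ]
    print(compare_histories(reference, candidate))
```

Matching on name and datatype alone is deliberately simple. A real scoring system would probably also need to look at the tools and parameters used, or at the dataset contents.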

Prerequisites

Further reading and useful links

hexylena commented 2 years ago

I've wanted the "compare history" for a while as well! We'd love to use it to automate tapas certificates.

I considered a very "loose" check based on workflow tests, but checking all files in the history to see whether they match one of the test cases mentioned in the relevant workflow test. Then we can re-use the existing library of test cases (and use it further to automatically generate example histories).
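
A minimal sketch of what such a "loose" check might look like, under strong simplifying assumptions: each test case is reduced to a list of expected output checksums, and the files in the history are available as raw bytes. The names `checksum` and `matching_test_cases` and the use of SHA-256 are hypothetical and are not taken from Galaxy's workflow test format, which supports richer assertions than exact content matches.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Return a hex digest identifying a file's content."""
    return hashlib.sha256(content).hexdigest()


def matching_test_cases(history_files, test_cases):
    """Return the names of test cases fully satisfied by a history.

    history_files: iterable of file contents (bytes) from the history.
    test_cases: dict mapping a test case name to the list of checksums
        of its expected outputs (a simplification of a workflow test).
    A test case matches when every expected output is found somewhere
    in the history, regardless of dataset names or order.
    """
    available = {checksum(content) for content in history_files}
    return [
        name
        for name, expected in test_cases.items()
        if expected and set(expected) <= available
    ]


if __name__ == "__main__":
    # Made-up contents, only to show how the check behaves.
    files = [b"species\tcount\n", b"some log output\n"]
    cases = {
        "beer_sample": [checksum(b"species\tcount\n")],
        "other_sample": [checksum(b"different table\n")],
    }
    print(matching_test_cases(files, cases))
```

Exact checksums are the strictest possible comparison. A looser variant could instead reuse the kind of content assertions that workflow tests already express.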