Systematic tool testing / validation

Supervisor: Tunc Kayikcioglu
For degree: Bachelor/Project/Master
Status: Open
Keywords: tools, testing, simulation, validation, QA, QC

Global Biological/Research context

Galaxy gives a broad audience graphical access to tools that would otherwise be usable only from the command line. For Galaxy to serve an external tool, it needs an XML file (a "wrapper") that describes which input fields and options the tool form should offer, which help text should be displayed, and which command should be executed when the tool is invoked.
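To make the wrapper concept concrete, here is a hypothetical minimal wrapper together with a short Python sketch that extracts the three pieces just described: the input parameters that become form fields, the help text, and the command template. The tool ID `example_classifier`, its options and the data table name `example_databases` are made up for illustration; only the general element layout (`<tool>`, `<command>`, `<inputs>`/`<param>`, `<outputs>`, `<help>`) follows Galaxy's tool XML format.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal wrapper. Real wrappers usually also declare
# <requirements>, <tests> and <citations>.
WRAPPER = """\
<tool id="example_classifier" name="Example classifier" version="1.2.0+galaxy0">
    <command><![CDATA[
        example_classifier --db '${db.fields.path}' --in '$reads' --out '$report'
    ]]></command>
    <inputs>
        <param name="reads" type="data" format="fastqsanger" label="Input reads"/>
        <param name="db" type="select" label="Reference database">
            <options from_data_table="example_databases"/>
        </param>
    </inputs>
    <outputs>
        <data name="report" format="tabular"/>
    </outputs>
    <help>Classifies reads against a reference database.</help>
</tool>
"""


def summarize_wrapper(xml_text):
    """Extract the parts of a wrapper that define the tool form and the command."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "version": root.get("version"),
        # Each <param> is rendered as one input widget in the Galaxy tool form.
        "params": [(p.get("name"), p.get("type")) for p in root.iter("param")],
        "help": (root.findtext("help") or "").strip(),
        # Cheetah template that Galaxy fills in and runs as a shell command.
        "command": (root.findtext("command") or "").strip(),
    }


summary = summarize_wrapper(WRAPPER)
print(summary["params"])  # [('reads', 'data'), ('db', 'select')]
```

The `db` parameter is a select list filled from a data table, which is how a wrapper can point at locally cached reference data.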

In addition to the datasets provided by the user, some of the incorporated tools need access to external data sources, such as a reference DB to look up. Such reference data can be fetched at runtime, or we can explicitly decide to cache a local copy on the HPC, which is especially beneficial for large databases. The XML file should then contain instructions on how to locate these local data resources. Some tools have several versions of such reference DBs, and not every DB version is necessarily compatible with every release of the tool.
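In Galaxy, locally cached reference data is commonly registered in tab-separated `.loc` files whose columns are declared in a data table configuration, and a wrapper's select parameter can be populated from such a table. The sketch below assumes a hypothetical `.loc` layout with a `db_version` column and a hypothetical compatibility table (`COMPATIBLE_TOOL_MAJORS`), and shows how DB entries could be filtered by tool version. The real compatibility information is exactly what this project is meant to establish.

```python
import csv
import io

# Hypothetical .loc content: tab-separated, comment lines start with '#'.
LOC_TEXT = (
    "#value\tname\tdb_version\tpath\n"
    "db_2019\tExample DB (2019)\t1\t/data/db/example/2019\n"
    "db_2023\tExample DB (2023)\t2\t/data/db/example/2023\n"
)
# Hypothetical column names; in Galaxy they come from the data table config.
COLUMNS = ["value", "name", "db_version", "path"]

# Hypothetical: DB format version -> tool major versions known to read it correctly.
COMPATIBLE_TOOL_MAJORS = {"1": {"1"}, "2": {"2", "3"}}


def read_loc(text):
    """Parse .loc-style text into one dict per DB entry."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def compatible_entries(rows, tool_version):
    """Keep only DB entries whose format version is vetted for this tool release."""
    major = tool_version.split(".")[0]
    return [r for r in rows
            if major in COMPATIBLE_TOOL_MAJORS.get(r["db_version"], set())]


entries = read_loc(LOC_TEXT)
print([r["value"] for r in compatible_entries(entries, "2.1.3")])  # ['db_2023']
```

An unknown DB version maps to an empty set, so unvetted combinations are excluded by default; whether to block or merely warn is one of the decisions the project should inform.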

Objectives of the project

We have already implemented the functions to execute some tools and to manage their DBs, but we suspect that we may not be fully aware of which tools can be used with which DBs. In the best case, an incompatible combination makes the tool fail with a fatal error; in the worst case, the tool exits successfully but silently introduces numerical errors into its results. Your task will be to identify such failure cases.
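The two failure modes can only be told apart by comparing the output against a known answer, since both may look the same from the exit code alone in the silent case. A minimal sketch, assuming (hypothetically) that the tool reports per-taxon relative abundances and that a maximum absolute error of 0.01 is acceptable:

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    exit_code: int
    # Hypothetical output format: relative abundance per taxon.
    estimate: dict = field(default_factory=dict)


def classify_run(result, truth, tolerance=0.01):
    """Return 'fatal', 'silent_error' or 'ok' for one tool/DB run.

    A non-zero exit code is the benign case because the failure is visible.
    A zero exit code with results far from the ground truth is the dangerous one.
    """
    if result.exit_code != 0:
        return "fatal"
    taxa = set(truth) | set(result.estimate)
    max_abs_error = max(
        (abs(result.estimate.get(t, 0.0) - truth.get(t, 0.0)) for t in taxa),
        default=0.0,
    )
    return "ok" if max_abs_error <= tolerance else "silent_error"


truth = {"taxonA": 0.5, "taxonB": 0.5}
print(classify_run(RunResult(0, {"taxonA": 0.7, "taxonB": 0.3}), truth))  # silent_error
```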

Proposed agenda for the project

Learn how to execute the tool(s) of interest via the Galaxy GUI and, if interested in a more automated approach, also via BioBlend.
Test all versions of the tool with all DBs to identify fatal failures.
Generate simulated input datasets with known ground truth.
Analyse the test datasets with different tool & DB versions to quantify numeric accuracy.
Propose hard constraints to be implemented in Galaxy to disable or discourage the use of problematic tool and DB combinations.
Check relevant workflows (WFs) or Galaxy tutorials to see whether they still work after these changes.
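The testing steps of this agenda can be prototyped as a small harness before any real tool is involved. The sketch below is a minimal illustration under stated assumptions: the tool reports per-taxon relative abundances, the version labels `"1.0"`, `"2.0"`, `"2019"` and `"2023"` are hypothetical, and `fake_run_tool` is a placeholder for an actual Galaxy run (for instance through BioBlend's `ToolClient.run_tool`). It simulates reads from a known composition, runs every tool/DB pair on the same data, and records the exit code together with the total variation distance to the ground truth.

```python
import itertools
import random
from collections import Counter


def simulate_reads(composition, n_reads, seed=0):
    """Draw one taxon label per simulated read; the labels are the ground truth."""
    rng = random.Random(seed)
    taxa = list(composition)
    return rng.choices(taxa, weights=[composition[t] for t in taxa], k=n_reads)


def abundance(labels):
    """Relative abundance of each taxon among the given read labels."""
    counts = Counter(labels)
    return {taxon: n / len(labels) for taxon, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two abundance profiles (0 = identical)."""
    taxa = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)


def fake_run_tool(tool_version, db_version, reads):
    """Hypothetical stand-in for a real Galaxy invocation.

    Returns (exit_code, abundance estimate). One combination crashes and one
    silently swaps two taxa, purely to exercise the harness.
    """
    if (tool_version, db_version) == ("1.0", "2023"):
        return 1, {}
    estimate = abundance(reads)
    if (tool_version, db_version) == ("2.0", "2019"):
        estimate = {"taxonA": estimate.get("taxonB", 0.0),
                    "taxonB": estimate.get("taxonA", 0.0)}
    return 0, estimate


def run_matrix(tool_versions, db_versions, composition, n_reads=1000):
    """Run every tool/DB combination on the same simulated dataset."""
    reads = simulate_reads(composition, n_reads)
    truth = abundance(reads)
    report = {}
    for tool_v, db_v in itertools.product(tool_versions, db_versions):
        exit_code, estimate = fake_run_tool(tool_v, db_v, reads)
        # A distance is only meaningful for runs that claimed success.
        distance = total_variation(estimate, truth) if exit_code == 0 else None
        report[(tool_v, db_v)] = (exit_code, distance)
    return report


report = run_matrix(["1.0", "2.0"], ["2019", "2023"], {"taxonA": 0.7, "taxonB": 0.3})
for combo, (exit_code, distance) in sorted(report.items()):
    print(combo, exit_code, distance)
```

Using one fixed seed makes the simulated dataset, and therefore every distance, reproducible. Repeating the run with several seeds would give a distribution of errors per combination, which is where a basic significance test becomes useful for deciding whether a tool/DB pair is genuinely worse.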

Prerequisites

Literacy in at least one programming language, ideally Python
Ability to run tools in a Linux environment
Basic understanding of statistical significance
Genuine interest in testing, troubleshooting and resilience against failures

usegalaxy-eu / project-ideas

Systematic tool testing / validation #40
