jfizzy / ResistanceDB

ResistDB Project
MIT License

ResistanceDB

ResistanceDB Project

Usage Instructions

This repository contains three tools, called Mia, Leo, and Pablo, along with a set of sample files. Please note that Mia cannot perform actual raw conversion unless the Thermo Scientific Xcalibur software is installed, although you can still start her without it.

Mia

Please note that Mia was developed for a Windows environment. This setup guide assumes a Windows environment and Python 3.6. Although the GUI will run on Linux-based systems, this is not supported.

To run Mia, enter the Mia directory and create a virtual environment:

virtualenv .miavenv

Activate the virtual environment

.\.miavenv\Scripts\activate

Install required packages

pip3 install -r requirements.txt

Run mia

python3.6 mia.pyw

The database is included with Mia and is called files.db. You can edit it with sqlite3. To see the schema of the files table, run sqlite3 files.db and then enter .schema files at the sqlite3 prompt.

IF THE FILE DATABASE DOES NOT EXIST:

It can be created using sqlite3. Ensure sqlite3 has been downloaded and added to the PATH environment variable. The command:

sqlite3 name_of_db.db

will create the database and open the sqlite3 console to the database. After this, use the command:

CREATE TABLE files(filename varchar(1028), date_created date, date_moved date, new_location varchar(1028));

This creates the table that maintains the file information. Mia is hardcoded to use a table called "files", so please do not change the name of the table. The database itself can be named whatever you like. Once the table exists, make sure Mia is pointing to this database (whatever you have named it) in the database field of the GUI.
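The same setup can also be scripted with Python's built-in sqlite3 module instead of the sqlite3 console. The following is a minimal sketch, not part of Mia itself: the function name ensure_files_table is hypothetical, and the schema is copied from the CREATE TABLE statement above.

```python
import sqlite3

# Schema copied from the CREATE TABLE statement in this guide.
# The table must be called "files" because Mia is hardcoded to use that name.
FILES_SCHEMA = """
CREATE TABLE IF NOT EXISTS files(
    filename varchar(1028),
    date_created date,
    date_moved date,
    new_location varchar(1028)
);
"""


def ensure_files_table(db_path):
    """Open (or create) the database at db_path and add the files table if missing.

    Hypothetical helper for illustration. Returns the column names of the
    files table so the result can be checked.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(FILES_SCHEMA)
        conn.commit()
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        return [row[1] for row in conn.execute("PRAGMA table_info(files)")]
    finally:
        conn.close()
```

Calling ensure_files_table("name_of_db.db") leaves an existing files table untouched, so it should be safe to run against a database that Mia is already using. The database still has to be selected in the database field of the GUI afterwards.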

Leo

Leo was also developed in a Windows environment, but you should still be able to run him on other platforms, although some functionality may not work.

To run Leo, enter the leo directory and create a virtual environment:

virtualenv .leovenv

Activate the virtual environment

.\.leovenv\Scripts\activate

Install required packages

pip3 install -r requirements.txt

Run leo

python3.6 leo.pyw

A sample peaks file that Leo can filter is located at ResistanceDB\files\peaks\peaks.csv

Pablo

Currently, Pablo is run through the Leo interface. Simply parse the sample peaks.csv and then click "Visualize Last Parse". This will open Pablo's visualizations in the browser of your choosing. No further requirements need to be installed for Pablo, as they are included in the Leo requirements.txt (installed above).

If you wish to use Pablo alone, change directory to pablo/plotly, optionally create a virtual environment, and install the required dependencies:

pip install numpy

pip install plotly

Then run:

python3.6 layout.py [path to filtered peaks csv] [optional: path to condensed peaks csv]

Pablo expects very specific datasets, so please use files generated by Leo.
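If you want to launch the standalone Pablo command from another script, it could be wrapped roughly as follows. This is only a sketch under stated assumptions: build_pablo_command and run_pablo are hypothetical helpers that are not part of the repository, and they assume that python3.6 is on your path and that layout.py lives in pablo/plotly as described above.

```python
import subprocess


def build_pablo_command(filtered_csv, condensed_csv=None, python="python3.6"):
    """Build the argument list for the standalone layout.py command.

    filtered_csv is required; condensed_csv is the optional second argument.
    Hypothetical helper for illustration only.
    """
    cmd = [python, "layout.py", filtered_csv]
    if condensed_csv is not None:
        cmd.append(condensed_csv)
    return cmd


def run_pablo(filtered_csv, condensed_csv=None, workdir="pablo/plotly"):
    """Run layout.py from the pablo/plotly directory (assumed location).

    check=True raises CalledProcessError if layout.py exits with an error.
    """
    cmd = build_pablo_command(filtered_csv, condensed_csv)
    return subprocess.run(cmd, cwd=workdir, check=True)
```

Because the command runs with pablo/plotly as its working directory, relative CSV paths are resolved from there, so absolute paths to the files generated by Leo are the safer choice.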

:two_men_holding_hands: Authors

Tyrone Lagore and James MacIsaac

:bookmark: Context

The Lewis Research Group is investigating the connection between metabolic adaptation and virulence of human pathogens. Using mass spectrometry, they aim to help reduce the time it takes to identify high-risk patients by interpreting the results of tests on infection strains. The process they have developed has proven to be more efficient than currently implemented methods, and speeding up the analysis of the data will allow for even faster results. Currently, extensive analysis is performed by lab technicians on data produced by MAVEN (Metabolomic Analysis and Visualization Engine). Our research project aims to automate the repetitive parts of this data analysis.

:soccer: Goals

The initial objective is to build a tool that takes as input a configuration file of known bio-markers as well as an input XML file of ‘good’ peaks to be compared against these markers. The output will be a resulting file containing information on how each peak compares to its most similar benchmark in the configuration. Our belief is that this will be the optimal approach to this problem, and will lead to a solution that is reliable and ready to be scaled up.
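The comparison described above can be pictured with a small sketch. Everything in it is an assumption made for illustration: the Marker and Peak types, their fields, and the use of absolute m/z difference as the similarity measure are hypothetical, and the project does not state how similarity is actually defined or how the input files are structured.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    """A known bio-marker from the configuration (hypothetical fields)."""
    name: str
    mz: float


@dataclass
class Peak:
    """A 'good' peak from the input file (hypothetical fields)."""
    peak_id: str
    mz: float


def closest_marker(peak, markers):
    """Return the marker nearest to the peak and the absolute m/z difference.

    Assumption: similarity is measured by absolute m/z difference only.
    """
    best = min(markers, key=lambda marker: abs(marker.mz - peak.mz))
    return best, abs(best.mz - peak.mz)


def compare_peaks(peaks, markers):
    """Build one output row per peak, describing its most similar marker."""
    rows = []
    for peak in peaks:
        marker, diff = closest_marker(peak, markers)
        rows.append({"peak": peak.peak_id, "marker": marker.name, "mz_diff": diff})
    return rows
```

A real implementation would likely compare more than one property per peak and apply a tolerance, so that peaks with no sufficiently similar marker are reported as unmatched.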

The extension to this first objective will be to adapt the tool to use a database management system in order to keep a running history of both the known marker files and the results produced by comparing these markers against any given data file and its configuration parameters. This will be a valuable asset for the future of the greater project.