njherrmann / pair-guides

A tool to find valid dual-gRNA guide pairs for CRISPR/Cas9 gene blocks

4/6/2017 Nathaniel Herrmann naherrmann@gmail.com https://github.com/njherrmann/pair-guides

CRISPR/Cas9 dual-gRNA Guide Pairer

The pair-guides script finds viable gRNA pairs for dual-guide gene blocks and saves these candidates to a simple CSV table.

Requirements

The pair-guides script runs with Python 2.7 or later. At present, the setup script runs on Mac and Linux only. The pair-guides script uses the following Python modules, which may not be installed by default: requests, bs4, lxml.
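Before running the setup script, it can help to confirm that the three modules listed above are importable. The following is a minimal sketch, not part of the project: the helper name missing_modules is hypothetical, and the check only tests whether an import succeeds, not which version is installed.

```python
import importlib

# Module names taken from the Requirements section above.
REQUIRED_MODULES = ("requests", "bs4", "lxml")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any module reported as missing would need to be installed before pair-guides can run.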

There is no app version of the script yet. Find instructions below to run the script from the command line.

Installation/Setup

The project directory contains a script named setup.sh. Using the Terminal, navigate to the project directory and run the setup script by executing the following shell command (do not omit the period before the script invocation):

. setup.sh

Usage

The pair-guides script requires two files to run: a settings file and a ChopChop results file. The settings file is normally named gene_block_settings.inp. It must contain the CCDS ID number of the gene and the path to the ChopChop results text file. A handful of optional settings can also be specified. A sample settings file in the project directory contains notes on these optional settings and their default values.

The ChopChop results file can be obtained by navigating to the ChopChop page for the desired gene and selecting "Results table" from the drop-down menu labeled "Download results." Save the text file on the new results page.

The easiest way to run the script is to save the ChopChop results file and the settings file to the same directory. From this directory, simply run:

pair-guides

This creates one output file. By default, the output file name is the base name of the results file followed by "_pairs.csv". When the script is run this way, the settings file must be named either "gene_block_settings.inp" or "settings.inp".

Alternatively, the script can accept the path to the settings file as a command-line argument. This makes it possible to use a settings file located in a different directory. The command takes the following form:

pair-guides path/to/settings/file

This method has the advantage of allowing a settings file with any name.
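The two ways of locating the settings file described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the function name resolve_settings_path is hypothetical, and the order in which the two default names are tried is an assumption, since the README does not say which takes precedence.

```python
import os

# Default names accepted when no path is given on the command line.
# The lookup order here is an assumption.
DEFAULT_SETTINGS_NAMES = ("gene_block_settings.inp", "settings.inp")


def resolve_settings_path(argv, cwd="."):
    """Return the settings file path, or None if none can be found.

    argv is a list shaped like sys.argv: the program name followed by
    an optional path to the settings file.
    """
    if len(argv) > 1:
        # An explicit path may use any file name.
        return argv[1]
    for name in DEFAULT_SETTINGS_NAMES:
        candidate = os.path.join(cwd, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With this sketch, resolve_settings_path(["pair-guides", "path/to/settings/file"]) returns the given path unchanged, while resolve_settings_path(["pair-guides"]) searches the current directory for one of the two default names.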

The constant elements of the gene block are stored as variables in the gene_block_constants.const file of the project directory. These sequences are imported to build out the full gene blocks in the output.

Output

The pair-guides script produces one output: a CSV file containing all eligible gRNA pairs, sorted by descending exon base pair deletion count. By default, the output file path is generated by replacing the extension of the input ChopChop results file path with "_pairs.csv", which places the output file in the same directory as the results file. The settings file accepts an optional output file path for cases where the output CSV file needs a different name or location.
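The default naming rule and the sort order described above can be sketched in a few lines. This is an illustration only: the function names and the "exon_bp_deleted" field name are hypothetical, and the actual column names in the output CSV may differ.

```python
import os


def default_output_path(results_path):
    """Replace the extension of the results file path with '_pairs.csv'."""
    root, _ext = os.path.splitext(results_path)
    return root + "_pairs.csv"


def sort_pairs(pairs, key="exon_bp_deleted"):
    """Sort pair records by descending deletion count.

    `pairs` is a list of dicts; the key name is a hypothetical stand-in
    for the exon base pair deletion count.
    """
    return sorted(pairs, key=lambda pair: pair[key], reverse=True)
```

For example, a results file saved as results/chopchop_results.txt (a hypothetical path) would map to results/chopchop_results_pairs.csv under this rule.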

NB: The script does not warn before writing to an output file. If there is already an output file with the same name and location as the new output, the old file will be overwritten! It is strongly recommended that the results files and corresponding settings files are kept in separate directories. Stay tuned for an update soon that will address this issue.
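Until the script itself warns about overwriting, a small check run beforehand can show whether the default output file already exists. The sketch below is a user-side workaround, not part of pair-guides; the function name would_overwrite is hypothetical, and the default path is derived using the "_pairs.csv" rule from the Output section.

```python
import os


def would_overwrite(results_path, output_path=None):
    """Return True if running pair-guides would replace an existing file.

    If no explicit output path is given, the default is derived by
    replacing the extension of the results file path with '_pairs.csv'.
    """
    if output_path is None:
        output_path = os.path.splitext(results_path)[0] + "_pairs.csv"
    return os.path.exists(output_path)
```

If the function returns True, move or rename the existing output file before running the script again.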

Project Directory

To keep updates to the script easy to pull as they are published, avoid modifying any of the files in the project directory. The setup script modifies the user's PATH variable so that pair-guides can be executed from any directory. The git repository ignores Word files with the extension .doc or .docx, so these may be kept in the project directory.

The project directory contains the following files:

src/ - source directory containing the Python files

demo/ - a directory containing sample inputs for the script

gene_block_constants.const - defines constant sequences of gene block

setup.sh - install script that configures bindings and path variables

update.sh - script that pulls updates from github

log/ - repo containing a logging utility (built by setup.sh)

pair-guides - bash script that runs pair-guides.py (built by setup.sh)

Updating from GitHub

Update the project directory to the latest version by pulling from the GitHub repository. The update command is contained in the script file update.sh. Run it by executing the following line from the project directory (do not omit the period before the script invocation):

. update.sh

NB: The update will not succeed if files in the base project directory, or in the demo and src directories, have been modified. The one exception is Word documents: git ignores files with the .doc or .docx extension.
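To see which tracked files have local modifications before updating, git can be queried directly. The following is a rough sketch that assumes git is installed and that the project directory is a git working copy; the function names are hypothetical, and the parser handles only the simple "XY path" lines of the `git status --porcelain` format (renamed files are not treated specially).

```python
import subprocess


def parse_porcelain(output):
    """Extract paths from `git status --porcelain` output.

    Each line holds two status characters, a space, then the path.
    """
    paths = []
    for line in output.splitlines():
        if len(line) > 3:
            paths.append(line[3:])
    return paths


def locally_modified_files(project_dir="."):
    """List tracked files with local changes in project_dir."""
    output = subprocess.check_output(
        ["git", "status", "--porcelain", "--untracked-files=no"],
        cwd=project_dir,
    )
    return parse_porcelain(output.decode("utf-8"))
```

An empty list from locally_modified_files() suggests that no tracked files have been changed locally.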