4/6/2017 Nathaniel Herrmann naherrmann@gmail.com https://github.com/njherrmann/pair-guides
The pair-guides script finds viable gRNA pairs for dual-guide gene blocks and saves these candidates to a simple CSV table.
The pair-guides script runs with Python 2.7 or greater. At present, the setup script runs on Mac and Linux only. The project relies on the following Python modules, which are not part of the standard library: requests, bs4, lxml.
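Before running the script, it can help to confirm that the modules listed above are importable. The following is a minimal sketch and is not part of the project; the function name find_missing_modules is hypothetical.

```python
import importlib

# Third-party modules named in this README.
REQUIRED_MODULES = ["requests", "bs4", "lxml"]


def find_missing_modules(names):
    """Return the names from the given list that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any module reported as missing needs to be installed before pair-guides will run.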
There is no app version of the script yet. Find instructions below to run the script from the command line.
The project directory contains a script named setup.sh. Using the Terminal, navigate to the project directory and run the setup script by executing the following shell command (do not omit the period before the script invocation):
. setup.sh
The pair-guides script requires two files to run: a settings file and a ChopChop results file. The settings file is named gene_block_settings.inp by default. It must contain the CCDS ID number of the gene and the path to the ChopChop results text file. The script also supports a handful of optional settings. A sample settings file in the project directory contains notes on these optional settings and their default values.
The ChopChop results file can be obtained by navigating to the ChopChop page for the desired gene and selecting "Results table" from the drop-down menu labeled "Download results." Save the text file on the new results page.
The easiest way to run the script is to save the ChopChop results file and the settings file to the same directory. From this directory, simply run:
pair-guides
This should create one output file. By default, the output file is named after the results file, with its extension replaced by "_pairs.csv". When running the script this way, the settings file must be named either "gene_block_settings.inp" or simply "settings.inp".
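The lookup of a default settings file can be pictured with the sketch below. It is an illustration only, not the script's actual code: the function name find_default_settings is hypothetical, and the order in which the two names are tried is an assumption.

```python
import os

# File names the README says are recognized when no path is given.
# The order of precedence here is an assumption.
DEFAULT_SETTINGS_NAMES = ("gene_block_settings.inp", "settings.inp")


def find_default_settings(directory="."):
    """Return the path of the first default settings file found, or None."""
    for name in DEFAULT_SETTINGS_NAMES:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Calling find_default_settings() from the directory that holds the ChopChop results shows whether a settings file with a recognized name is present.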
Alternatively, the script accepts the path to the settings file as a command-line argument, which allows a settings file in a different directory to be used:
pair-guides path/to/settings/file
This method has the advantage of allowing a settings file with any name.
The constant elements of the gene block are stored as variables in the gene_block_constants.const file of the project directory. These sequences are imported to build out the full gene blocks in the output.
The pair-guides script produces one output: a CSV file containing all eligible gRNA pairs, sorted in descending order of exon base pair deletion count. By default, the output file path is generated by replacing the extension of the input ChopChop results file path with "_pairs.csv", so the output file is placed in the same directory as the results file. The settings file includes an optional output file path setting for users who need a different name or location for the output CSV file.
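The default naming rule described above can be sketched as follows. This is an illustration under the stated rule, not the script's own code; the function name default_output_path and the example path are hypothetical.

```python
import os


def default_output_path(results_path):
    """Replace the extension of results_path with the '_pairs.csv' suffix."""
    base, _ext = os.path.splitext(results_path)
    return base + "_pairs.csv"


if __name__ == "__main__":
    # Hypothetical results file path, used only to show the naming rule.
    print(default_output_path("path/to/results.txt"))
```

For the hypothetical input path/to/results.txt, this yields path/to/results_pairs.csv, in the same directory as the results file.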
NB: The script does not warn before writing to an output file. If an output file with the same name and location as the new output already exists, the old file will be overwritten! It is strongly recommended that the results files and corresponding settings files be kept in separate directories. Stay tuned for an update soon that will address this issue.
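Until then, one possible workaround is to copy an existing output file aside before each run. The sketch below is a suggestion only and is not part of the project; the function name back_up_existing_output and the timestamped naming scheme are assumptions.

```python
import os
import shutil
import time


def back_up_existing_output(output_path):
    """Copy output_path to a timestamped sibling file if it exists.

    Returns the backup path, or None when there is nothing to back up.
    """
    if not os.path.isfile(output_path):
        return None
    base, ext = os.path.splitext(output_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = "{0}.{1}{2}".format(base, stamp, ext)
    shutil.copy2(output_path, backup_path)
    return backup_path
```

Run it on the expected output path before invoking pair-guides; the original file is left in place and a dated copy is written next to it.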
To keep updates easy to pull as they are published, avoid modifying any of the files in the project directory. The setup script modifies the user's PATH variable so that pair-guides can be executed from any directory. The git repository ignores Word files with the extensions .doc or .docx, so these are allowed in the project directory.
The project directory contains the following files:
src/ - source directory which contains python files
demo/ - a directory containing sample inputs for the script
gene_block_constants.const - defines constant sequences of gene block
setup.sh - install script that configures bindings and path variables
update.sh - script that pulls updates from GitHub
log/ - repo containing a logging utility (built by setup.sh)
pair-guides - bash script that runs pair-guides.py (built by setup.sh)
Update the project directory to the latest version by pulling from the GitHub repository. The update command is wrapped in the script file update.sh. Run it by executing the following line from the project directory (do not omit the period before the script invocation):
. update.sh
NB: The update will fail if files in the base project directory, or in the demo and src directories, have been modified. The one exception is Word documents: Git ignores files with the .doc or .docx extension.