THIS VERSION IS DEPRECATED. GO TO https://github.com/SchwartzLabURI/SISRS/
SISRS: Site Identification from Short Read Sequences
Version 1.6.2
Copyright (c) 2013-2016 Rachel Schwartz Rachel.Schwartz@asu.edu
https://github.com/rachelss/SISRS
More information: Schwartz, R.S., K.M. Harkins, A.C. Stone, and R.A. Cartwright. 2015. A composite genome approach to identify phylogenetically informative data from next-generation sequencing. BMC Bioinformatics. 16:193.
(http://www.biomedcentral.com/1471-2105/16/193/)
Talk from Evolution 2014 describing SISRS and its application:
https://www.youtube.com/watch?v=0OMPuWc-J2E&list=UUq2cZF2DnfvIUVg4tyRH5Ng
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.
Input: next-generation sequence data, such as Illumina HiSeq reads. Data must be sorted into folders by taxon (e.g. species or genus). Paired reads in fastq format must be identified by _R1 and _R2 in otherwise identical filenames. Both paired and unpaired read files must have a fastq file extension.
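The naming rules above can be checked before starting a run. The following is a minimal sketch, not part of SISRS: the function names are hypothetical, and it assumes that "fastq file extension" means files ending in .fastq and that each taxon folder sits directly under the data folder.

```python
from pathlib import Path


def unmatched_pairs(filenames):
    """Return paired-read file names whose _R1/_R2 mate is missing."""
    names = set(filenames)
    missing = []
    for name in sorted(names):
        if "_R1" in name:
            mate = name.replace("_R1", "_R2", 1)
        elif "_R2" in name:
            mate = name.replace("_R2", "_R1", 1)
        else:
            continue  # no _R1/_R2 tag: treated as an unpaired read file
        if mate not in names:
            missing.append(name)
    return missing


def check_taxon_folders(base_dir):
    """Map each taxon folder under base_dir to its unmatched read files.

    Assumption: read files end in .fastq and live directly in the taxon folder.
    """
    report = {}
    for taxon_dir in sorted(Path(base_dir).iterdir()):
        if taxon_dir.is_dir():
            fastqs = [p.name for p in taxon_dir.glob("*.fastq")]
            report[taxon_dir.name] = unmatched_pairs(fastqs)
    return report
```

Calling check_taxon_folders("/usr/test_data") would return an empty list for every taxon whose paired files are complete.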
sisrs command options
sites: produce an alignment of sites from raw reads
loci: produce a set of aligned loci based on the most variable regions of the composite genome
subSample: run sisrs subsampling scheme, subsampling reads from all taxa to ~10X coverage across species, relative to user-specified genome size
buildContigs: given subsampled reads, run sisrs composite genome assembly with user-specified assembler
alignContigs: align reads to composite genome as single-ended, uniquely mapped
mapContigs: align composite genome reads to a reference genome (optional)
identifyFixedSites: find sites with no within-taxa variation
outputAlignment: output alignment file of sisrs sites
changeMissing: given alignment of sites (alignment.nex), output a file with only sites missing fewer than a specified number of samples per site
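The two filtering ideas in the list above, identifyFixedSites and changeMissing, can be illustrated with a short sketch. This is not the SISRS implementation: the function names and data layout are hypothetical, "?" is assumed to mark missing data, and the strict "fewer than" cutoff is taken from the wording of the changeMissing description.

```python
def fixed_base(read_bases):
    """Return the single base seen at a site within one taxon, or None.

    None means the taxon has no data at the site or shows within-taxon variation.
    """
    observed = set(read_bases)
    return observed.pop() if len(observed) == 1 else None


def call_sites(pileups):
    """Call one base per site per taxon.

    pileups maps taxon -> list of strings, one string of read bases per site.
    Sites with no data or with within-taxon variation are written as '?'.
    """
    return {
        taxon: "".join(fixed_base(site) or "?" for site in sites)
        for taxon, sites in pileups.items()
    }


def filter_missing(alignment, missing_limit, missing_char="?"):
    """Keep only columns where fewer than missing_limit taxa are missing.

    Assumption: the cutoff is strict ('fewer than'), following the text above.
    """
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    keep = [
        i for i in range(length)
        if sum(alignment[t][i] == missing_char for t in taxa) < missing_limit
    ]
    return {t: "".join(alignment[t][i] for i in keep) for t in taxa}
```

A taxon whose reads disagree at a site is treated the same as a taxon with no reads there, so both cases count toward the missing-data limit.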
Output: a Nexus file containing the variable sites in a single alignment. It can be used as a concatenated alignment in most major phylogenetics software, with the setting for variable-sites-only data enabled.
The folder test_data (https://github.com/rachelss/SISRS_test_data) contains simulated data for 10 species on the tree found in simtree.tre. Using 40 processors, this run took 9 minutes. Analysis of the alignment output by sisrs using RAxML produced the correct tree.
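To show what a variable-sites alignment in Nexus format looks like, here is a minimal sketch that finds the variable columns of a small alignment and writes a generic NEXUS DATA block. It is an illustration only: the function names are hypothetical, and the exact blocks and formatting that sisrs writes may differ.

```python
def variable_columns(alignment, missing_char="?"):
    """Return indices of columns with more than one base among non-missing taxa."""
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    keep = []
    for i in range(length):
        bases = {alignment[t][i] for t in taxa} - {missing_char}
        if len(bases) > 1:
            keep.append(i)
    return keep


def to_nexus(alignment):
    """Format a {taxon: sequence} alignment as a generic NEXUS DATA block."""
    taxa = list(alignment)
    nchar = len(alignment[taxa[0]])
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(taxa)} NCHAR={nchar};",
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    lines += [f"  {taxon} {alignment[taxon]}" for taxon in taxa]
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"
```

Because every retained column is variable, downstream software should be told that invariant sites were excluded.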
sisrs sites -g 1745690
sisrs sites -g 1745690 -p 40 -m 4 -f /usr/test_data -z /usr/output_data -t .99 -a minia
sisrs subSample -g 1745690 -f /usr/test_data -c 0
sisrs loci -g 1745690 -p 40 -l 2 -f /usr/test_data # Will run sites first, then loci
sisrs loci -g 1745690 -p 40 -l 2 -f /usr/SISRS_sites_ouput # Will run loci from previous sites data
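The genome size passed in the commands above drives the subsampling target of roughly 10X coverage. The arithmetic can be sketched as follows, assuming that -g is the genome size in bases and that the 10X target is split evenly among taxa. The function name and the read length of 100 are hypothetical choices for illustration.

```python
def subsample_targets(genome_size, n_taxa, read_length, coverage=10):
    """Return (total bases, bases per taxon, reads per taxon) for a coverage target.

    Assumption: the coverage target is shared equally by all taxa.
    """
    total_bases = coverage * genome_size
    bases_per_taxon = total_bases // n_taxa
    reads_per_taxon = bases_per_taxon // read_length
    return total_bases, bases_per_taxon, reads_per_taxon


# Test data values: genome size 1745690 and 10 species; read length is assumed.
print(subsample_targets(1745690, 10, 100))  # (17456900, 1745690, 17456)
```

Under these assumptions each of the 10 test species contributes about 1X of the genome, so the pooled reads reach about 10X.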
Get loci from your fastq files given known loci.
First, name the file containing your reference loci ref_genes.fa and put it in your main folder.
sisrs loci -p 40 -f /usr/test_data