GBS SNP Calling Reference Optional Pipeline
GNU General Public License v2.0

GBS-SNP-CROP

Latest release v.4.1 (October 6, 2019)

Introduction

The GBS SNP Calling Reference Optional Pipeline (GBS-SNP-CROP) is executed via a sequence of seven Perl scripts that integrate custom parsing and filtering procedures with well-known, vetted bioinformatic tools, giving the user full access to all intermediate files. By employing a novel variant-calling strategy (for SNPs and indels) based on the correspondence of within-individual to across-population patterns of polymorphism, the pipeline is able to distinguish high-confidence variants from both sequencing and PCR errors, whether or not a reference genome is available. When no reference genome is available, the pipeline adopts a clustering strategy to build a population-tailored "Mock Reference" from the same GBS data for downstream calling and genotyping. Designed for libraries of either paired-end (PE) or single-end (SE) reads of arbitrary lengths, GBS-SNP-CROP maximizes data usage by avoiding the unnecessary data culling that results from imposed length uniformity requirements. GBS-SNP-CROP is a complete bioinformatics pipeline developed primarily to support curation, research, and breeding programs wishing to use GBS for the cost-effective, genome-wide characterization of plant genetic resources.

Pipeline workflow

Stage 1. Process the raw GBS data

Stage 2. Build the Mock Reference

Stage 3. Map the processed reads and generate standardized alignment files

Stage 4. Call Variants and Genotypes

PLEASE NOTE: GBS-SNP-CROP is an intentionally modular and flexible pipeline. If your data are already demultiplexed and filtered, simply skip Stage 1 and enter the pipeline at Stage 2. If you have a reference genome and no need for a Mock Reference, simply skip Stage 2 and go directly to Stage 3. Refer to the User Manual for input file naming conventions for each Step.
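The entry-point rules in the note above can be summarized in a small planning helper. This is only an illustrative sketch in Python; it is not part of the pipeline, and the function name `stages_to_run` and its two parameters are hypothetical names chosen for this example.

```python
from typing import List


def stages_to_run(reads_already_processed: bool, has_reference_genome: bool) -> List[str]:
    """Return the pipeline stages that still need to be run.

    Hypothetical helper: it only mirrors the skip rules stated in the
    README note and does not invoke any GBS-SNP-CROP script.
    """
    stages = []
    # Stage 1 is skipped when the data are already demultiplexed and filtered.
    if not reads_already_processed:
        stages.append("Stage 1. Process the raw GBS data")
    # Stage 2 is skipped when a reference genome makes a Mock Reference unnecessary.
    if not has_reference_genome:
        stages.append("Stage 2. Build the Mock Reference")
    # Stages 3 and 4 are always run.
    stages.append("Stage 3. Map the processed reads and generate standardized alignment files")
    stages.append("Stage 4. Call Variants and Genotypes")
    return stages


if __name__ == "__main__":
    # Example: processed reads, no reference genome, so the plan starts at Stage 2.
    for stage in stages_to_run(reads_already_processed=True, has_reference_genome=False):
        print(stage)
```

Whichever stage you start from, the input files must still follow the naming conventions described in the User Manual.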

Below is a schematic of the workflow, with inputs and outputs (boxes) indicated for each Step (arrows).

Released versions

v.4.1: Released on 10/6/2019
v.4.0: Released on 10/22/2018
v.3.0: Released on 2/8/2018
v.2.0: Released on 2/22/2017
v.1.1: Released on 3/11/2016
v.1.0: Released on 1/12/2016

Getting Help

Begin by carefully going through the GBS-SNP-CROP User Manual. Before posting a question or starting a discussion, please first refer to the FAQ page. Also, please check your barcode ID file for empty characters or stray blank spaces, and verify that it was saved as a tab-delimited file. If you are still facing an issue or have suggestions for improving this tool, kindly submit your question or comment to our Google Groups page.
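The barcode ID file check suggested above can be automated. The following is a minimal sketch in Python, not a tool shipped with GBS-SNP-CROP. The function name `check_barcode_file` is hypothetical, and the script makes no assumption about how many columns the file should have; it only reports blank lines, lines without a tab, empty fields, fields containing whitespace, and lines whose column count differs from the first line. See the User Manual for the actual expected layout.

```python
from typing import List


def check_barcode_file(path: str) -> List[str]:
    """Report whitespace and delimiter problems in a tab-delimited file.

    Hypothetical helper: it checks general formatting only and does not
    validate the column layout required by the pipeline.
    """
    problems = []
    expected_columns = None
    with open(path, encoding="utf-8", newline="") as handle:
        for line_number, raw_line in enumerate(handle, start=1):
            line = raw_line.rstrip("\r\n")
            if line.strip() == "":
                problems.append(f"line {line_number}: blank line")
                continue
            if "\t" not in line:
                problems.append(f"line {line_number}: no tab found; file may not be tab-delimited")
            fields = line.split("\t")
            # Use the first non-blank line as the reference column count.
            if expected_columns is None:
                expected_columns = len(fields)
            elif len(fields) != expected_columns:
                problems.append(
                    f"line {line_number}: {len(fields)} column(s), expected {expected_columns}"
                )
            for column_number, field in enumerate(fields, start=1):
                if field == "":
                    problems.append(f"line {line_number}, column {column_number}: empty field")
                elif any(character.isspace() for character in field):
                    problems.append(
                        f"line {line_number}, column {column_number}: whitespace in {field!r}"
                    )
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_barcodes.py <barcode_id_file>
    for problem in check_barcode_file(sys.argv[1]):
        print(problem)
```

An empty result means none of these formatting problems were found; it does not confirm that the file contents are otherwise correct.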

Requirements

Citing GBS-SNP-CROP

Melo et al. GBS-SNP-CROP: A reference-optional pipeline for SNP discovery and plant germplasm characterization using genotyping-by-sequencing data. BMC Bioinformatics. 2016. 17:29. DOI 10.1186/s12859-016-0879-y.