simoncchu / GAPPadder

GAPPadder is a tool for closing gaps on draft genomes with short sequencing data

GAPPadder

@2017 by Chong Chu, Xin Li and Yufeng Wu. This software is provided "as is" without warranty of any kind. In no event shall the author be held responsible for any damage resulting from the use of this software. The program package, including source code, executables, and this documentation, is distributed free of charge. If you use this program in a publication, please cite the following reference:
Chong Chu, Xin Li, and Yufeng Wu. "GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads." bioRxiv (2017): 125534.

Functionalities and Usage of GAPPadder

GAPPadder is designed for closing gaps on draft genomes with paired-end or mate-pair reads. Refer to the paper for the main advantages of GAPPadder and for more detailed information.

Dependencies

The current release of GAPPadder runs on Linux. GAPPadder also requires several external tools to be installed on the machine you are working on.

Download and Install

First, download the whole folder from https://github.com/Reedwarbler/GAPPadder, including the subfolders TERefiner and ContigsCompactor-v0.2.0.

By default, users can run the tool directly; no installation is needed if all the dependencies are installed. Before the first run, make the two pre-compiled tools executable with the following command:

$ chmod +x ./TERefiner_1  &&  chmod +x ./ContigsMerger 

However, on some machines the pre-compiled tools TERefiner_1 and ContigsMerger may fail to run. In that case, users need to compile them themselves with the following commands (note that TERefiner needs bamtools to compile, and users need to set the bamtools path in the makefile):

$ cd TERefiner  &&  make  &&  cd .. 
$ cd ./ContigsCompactor-v0.2.0/ContigsMerger/  &&  make  &&  cd ../.. 
$ cp ./TERefiner/TERefiner_1 ./  &&  cp ./ContigsCompactor-v0.2.0/ContigsMerger/ContigsMerger ./
$ chmod +x ./TERefiner_1  &&  chmod +x ./ContigsMerger 
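Before moving on, it can help to confirm that both helper programs are present and executable. The following is a minimal sketch, not part of GAPPadder itself: the function name `missing_tools` and its `folder` argument are hypothetical, and only the two file names come from the commands above.

```python
import os

# File names taken from the commands above.
REQUIRED_TOOLS = ("TERefiner_1", "ContigsMerger")


def missing_tools(folder="."):
    """Return the required tools that are absent or not executable in folder.

    Hypothetical helper: GAPPadder does not ship this function.
    """
    problems = []
    for name in REQUIRED_TOOLS:
        path = os.path.join(folder, name)
        # A tool counts as usable only if it is a regular file with
        # execute permission for the current user.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append(name)
    return problems
```

Calling `missing_tools(".")` from the GAPPadder folder returns an empty list when both tools are ready to run. A non-empty list names the files that still need `chmod +x` or recompilation.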

Preparing inputs

GAPPadder needs a configuration file in JSON format that specifies the basic settings. A sample configuration file is provided in the same folder of this GitHub repository. Once the configuration file is finished, users can use this website (http://jsonlint.com/) to check whether it contains errors. The parameters are explained below.

draft_genome

The path of the draft genome

raw_reads

The groups of paired-end reads, with each pair forming one group.

alignments

The paths of the alignment files (must be sorted BAM/CRAM files). The number of alignment files should match the number of paired-end read groups.

software_path

parameters

kmer_length

The k-mer lengths used for assembly. For each k there are several sub-k values, which are the k-mer lengths that will be used by velvet.
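As an offline alternative to the website mentioned above, the configuration file can be checked for JSON syntax errors with Python's standard `json` module. This is only a sketch written for Python 3: the function name `check_config` is hypothetical, and it checks JSON syntax only, not whether the values make sense to GAPPadder.

```python
import json


def check_config(path):
    """Return (ok, message) describing whether path holds well-formed JSON.

    Hypothetical helper: it does not validate GAPPadder's settings.
    """
    try:
        with open(path) as handle:
            config = json.load(handle)
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing failed.
        return False, "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    if not isinstance(config, dict):
        return False, "top level of the file is not a JSON object"
    # List the top-level keys so they can be compared with the sample file.
    return True, "valid JSON with top-level keys: " + ", ".join(sorted(config))
```

For example, `check_config("configuration-file-name")` returns `False` with the line and column of the first syntax error, such as a trailing comma. The listed top-level keys can then be compared by eye with the sample configuration file.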

Basic usage

Preprocess the draft genome to get the gap positions and flanking regions:

$ python ./main.py -c Preprocess -g configuration-file-name 

Collect reads for each gap:

$ python ./main.py -c Collect -g configuration-file-name 

Construct the gap sequence and pick the best one:

$ python ./main.py -c Assembly -g configuration-file-name

Clean the old data:

$ python ./main.py -c Clean -g configuration-file-name
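The Preprocess, Collect and Assembly commands above can be chained from a small driver script so that a failure in one step stops the run. The sketch below only wraps the commands shown above. The function names are hypothetical, and it assumes that `main.py` exits with a non-zero status when a step fails, which this README does not state. The Clean step is left out because the point at which it should be run is not specified here.

```python
import subprocess

# Step names as given in the commands above, in the order they are listed.
STEPS = ("Preprocess", "Collect", "Assembly")


def build_commands(config_path, python="python", script="./main.py"):
    """Return one argument list per step, mirroring the commands above."""
    return [[python, script, "-c", step, "-g", config_path] for step in STEPS]


def run_pipeline(config_path, python="python", script="./main.py"):
    """Run the steps in order (hypothetical wrapper, not part of GAPPadder)."""
    for command in build_commands(config_path, python, script):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later steps are skipped after a failure.
        subprocess.run(command, check=True)
```

Running `run_pipeline("configuration-file-name")` from the GAPPadder folder is then equivalent to typing the three commands one after another.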

Output

picked_seqs.fa contains the selected fully closed and extended gap sequences.
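To get a quick overview of the result, the sequences in picked_seqs.fa can be counted and measured with a few lines of Python. This sketch assumes the file is in standard FASTA format (header lines starting with `>` followed by sequence lines), as the `.fa` extension suggests. The function name `read_fasta_lengths` is hypothetical, and the layout of the header lines is not described here, so each header is kept whole.

```python
def read_fasta_lengths(path):
    """Return {header text: sequence length} for a FASTA file.

    Hypothetical helper; if two records share a header, the later
    record replaces the earlier one.
    """
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; keep the whole header as its name.
                name = line[1:]
                lengths[name] = 0
            elif name is not None:
                # Sequence lines may be wrapped, so add up their lengths.
                lengths[name] += len(line)
    return lengths
```

For instance, `len(read_fasta_lengths("picked_seqs.fa"))` gives the number of sequences in the file, and `sum(...values())` gives their total length in bases.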