@2017 by Chong Chu, Xin Li and Yufeng Wu. This software is provided "as is" without warranty of any
kind. In no event shall the author be held responsible for any damage resulting from the
use of this software. The program package, including source codes, executables, and this
documentation, is distributed free of charge.
If you use this program in a publication, please cite the following reference:
Chong Chu, Xin Li, and Yufeng Wu. "GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads." bioRxiv (2017): 125534.
GAPPadder is designed for closing gaps in draft genomes with paired-end or mate-pair reads. Refer to the paper for a detailed description of its main advantages.
The current release of GAPPadder runs on Linux. GAPPadder needs the following tools to be installed on the machine you are working on.
$ make clean
$ make 'MAXKMERLENGTH=60'
First, download the whole folder from https://github.com/Reedwarbler/GAPPadder, including the subfolders TERefiner and ContigsCompactor-v0.2.0.
By default, if all the dependencies are installed, users can run the tool directly without any installation step. Before the first run, make the pre-compiled tools executable:
$ chmod +x ./TERefiner_1 && chmod +x ./ContigsMerger
However, on some machines the pre-compiled tools TERefiner_1 and ContigsMerger may fail to run. In that case, users need to compile them themselves with the following commands (note that TERefiner needs bamtools to compile, and the bamtools path must be set in the makefile):
$ cd TERefiner && make && cd ..
$ cd ./ContigsCompactor-v0.2.0/ContigsMerger/ && make && cd ../..
$ cp ./TERefiner/TERefiner_1 ./ && cp ./ContigsCompactor-v0.2.0/ContigsMerger/ContigsMerger ./
$ chmod +x ./TERefiner_1 && chmod +x ./ContigsMerger
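Before moving on, it can help to confirm that both helper binaries are present and executable. The following is a minimal Python 3 sketch, not part of GAPPadder; the file names come from the chmod command above, while the function name check_executables and the folder argument are hypothetical.

```python
import os

# File names taken from the chmod command above.
TOOLS = ("TERefiner_1", "ContigsMerger")


def check_executables(folder=".", names=TOOLS):
    """Return a list of problems; an empty list means every tool looks runnable."""
    problems = []
    for name in names:
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            problems.append(name + ": not found")
        elif not os.access(path, os.X_OK):
            problems.append(name + ": not executable (run chmod +x)")
    return problems


if __name__ == "__main__":
    # Assumption: the script is started from the GAPPadder folder.
    for line in check_executables() or ["all tools look runnable"]:
        print(line)
```

An empty result only shows that the files exist with the execute bit set. A binary built for a different system can still fail to start, in which case the compile commands above apply.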
GAPPadder needs a configuration file in JSON format that tells GAPPadder the basic settings. Users can find a sample in the same folder on the GitHub site. Once the configuration file is finished, users can check it for errors with this website (http://jsonlint.com/). The parameters are explained below.
draft_genome
The path of the draft genome
raw_reads
The groups of paired-end reads, with each pair forming one group.
alignments
The paths of the alignment files (must be sorted bam/cram files). The number of alignment files must match the number of paired-end read groups.
software_path
parameters
kmer_length
The kmer lengths for assembly. For each k there are several sub-k values, which are the kmer lengths that velvet will use.
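The configuration file can also be checked locally instead of on a website. The following is a minimal Python 3 sketch, not part of GAPPadder: it parses the file with the standard json module and reports which of the parameter names listed above do not appear anywhere in it. The function names are hypothetical. Where each key sits in the JSON tree is not assumed, and the group-count comparison only runs if raw_reads and alignments are top-level lists, which is itself an assumption; the sample configuration file remains the reference.

```python
import json

# Parameter names documented above; their nesting in the file is not assumed.
DOCUMENTED_KEYS = ("draft_genome", "raw_reads", "alignments",
                   "software_path", "parameters", "kmer_length")


def collect_keys(node, found=None):
    """Recursively gather every object key that appears in parsed JSON."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            collect_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_keys(item, found)
    return found


def check_config(text):
    """Parse JSON text and return warnings; raises ValueError on a syntax error."""
    config = json.loads(text)
    present = collect_keys(config)
    warnings = ["missing key: " + key
                for key in DOCUMENTED_KEYS if key not in present]
    # Assumption: raw_reads and alignments are top-level lists, one entry per group.
    reads = config.get("raw_reads") if isinstance(config, dict) else None
    alns = config.get("alignments") if isinstance(config, dict) else None
    if isinstance(reads, list) and isinstance(alns, list) and len(reads) != len(alns):
        warnings.append("raw_reads has %d groups but alignments has %d"
                        % (len(reads), len(alns)))
    return warnings
```

A call such as check_config(open("configuration-file-name").read()) returns an empty list when the file parses and every documented name is present.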
Preprocess the draft genome to get the gap positions and flank regions:
$ python ./main.py -c Preprocess -g configuration-file-name
Collect reads for each gap:
$ python ./main.py -c Collect -g configuration-file-name
Construct the gap sequence and pick the best one:
$ python ./main.py -c Assembly -g configuration-file-name
Clean the old data:
$ python ./main.py -c Clean -g configuration-file-name
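The steps above can be chained from a small wrapper script. This is a minimal Python 3 sketch, not part of GAPPadder: the -c and -g options and the step names are taken from the commands above, while the function names, the assumption that Preprocess, Collect and Assembly run in that order, and the choice to leave Clean as a manual step are illustrative.

```python
import subprocess

# Step names come from the commands above; running them in this order is an assumption.
STEPS = ("Preprocess", "Collect", "Assembly")


def build_commands(config_path, steps=STEPS, python="python", script="./main.py"):
    """Return one argument list per step, mirroring the documented command line."""
    return [[python, script, "-c", step, "-g", config_path] for step in steps]


def run_pipeline(config_path):
    """Run each step in turn and stop at the first one that fails."""
    for command in build_commands(config_path):
        print("running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Calling run_pipeline("configuration-file-name") from the GAPPadder folder would then execute the three commands one after another.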
picked_seqs.fa contains the selected fully closed and extended gap sequences.
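To get a quick overview of the result, the sequence lengths in picked_seqs.fa can be listed with a few lines of Python 3. This is a minimal sketch that assumes only the standard FASTA layout (header lines starting with ">", followed by sequence lines); the function name is hypothetical, and how GAPPadder names the records is not assumed.

```python
def read_fasta_lengths(path):
    """Return {record name: sequence length} for a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the record name.
                name = line[1:].split()[0] if len(line) > 1 else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    import sys
    # Assumption: the path to picked_seqs.fa is given as the first argument.
    if len(sys.argv) > 1:
        for record, length in read_fasta_lengths(sys.argv[1]).items():
            print(record, length)
```

Records that share the same first header word overwrite each other in this sketch, so the count of keys can be lower than the number of records in that case.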