jim-bo / silp2

Scaffolding using Integer Linear Programming
MIT License

silp2

Scaffolding using Integer Linear Programming

Overview

SILP2 is the second incarnation of the scaffolding tool developed at the University of Connecticut by James Lindsay and Dr. Ion Mandoiu, and at Georgia State University by Hamed Salooti and Dr. Alex Zelikovsky. It is built to be fast and flexible. The speed comes from the use of non-serial dynamic programming, which decomposes the scaffolding problem into many smaller sub-problems. The flexibility comes from its use of a generic ILP solver (CPLEX), in which the constraints, objective and weights are easily tweaked.
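
To make the idea of "weighted objective plus decomposition" concrete, here is a toy sketch in Python. It is not SILP2's formulation: brute-force enumeration stands in for the ILP solver, splitting the graph into connected components stands in for the real decomposition, and the edge format `(u, v, same_orientation, weight)` and all function names are assumptions made for illustration.

```python
from itertools import product


def components(nodes, edges):
    """Split the contig graph into connected components (independent sub-problems)."""
    adj = {n: set() for n in nodes}
    for u, v, _same, _w in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], []
        seen.add(n)
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(sorted(comp))
    return comps


def orient_component(comp, edges):
    """Try every 0/1 orientation and keep the one with the most satisfied weight."""
    members = set(comp)
    local = [e for e in edges if e[0] in members and e[1] in members]
    best_score, best = -1, None
    for bits in product((0, 1), repeat=len(comp)):
        assign = dict(zip(comp, bits))
        # An edge is satisfied when the two contigs agree (or disagree) as it demands.
        score = sum(w for u, v, same, w in local
                    if (assign[u] == assign[v]) == same)
        if score > best_score:
            best_score, best = score, assign
    return best, best_score


def orient(nodes, edges):
    """Solve each component on its own and merge the results."""
    result, total = {}, 0
    for comp in components(nodes, edges):
        assign, score = orient_component(comp, edges)
        result.update(assign)
        total += score
    return result, total


# Hypothetical data: two independent groups of contigs.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", True, 5), ("b", "c", False, 3),
         ("a", "c", True, 1), ("d", "e", False, 4)]
orientation, satisfied_weight = orient(nodes, edges)
```

In this example the three edges among a, b and c cannot all be satisfied, so the lightest one is sacrificed. Changing a weight changes which edge loses, which is the kind of tweak the paragraph above refers to.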

Usage

The tool is divided into several sub-programs which need to be run in order. There is a convenience function to run them all. Use the "-h" argument after each of the following sub-commands for a description of its usage.

python silp.py [sub-command] -h
  1. prep: prepares the scaffolding input files
  2. align: aligns the paired reads (requires bowtie2 to be installed)
  3. nodes: creates the nodes of the scaffolding graph
  4. edges: adds edges to the scaffolding graph
  5. bundles: compacts edges into bundles and computes weights
  6. decompose: runs the decomposition procedure
  7. orient: orients the contigs
  8. order: orders the contigs
  9. gap: computes gap sizes
  10. write_agp: outputs the results in a common format
  11. write: writes verbose results [debug]
  12. fasta: writes the scaffolds in FASTA format, with N's in gaps
  13. all: runs all the above
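
The help invocation above can also be generated for every sub-command from Python. This is a minimal sketch: the sub-command names are taken from the list above, while the helper names and the default `silp.py` path are assumptions.

```python
# Sub-commands in the order listed above ("all" runs the whole chain).
SUBCOMMANDS = ["prep", "align", "nodes", "edges", "bundles", "decompose",
               "orient", "order", "gap", "write_agp", "write", "fasta"]


def help_command(subcommand, script="silp.py", python="python"):
    """Build the argument list for `python silp.py <sub-command> -h`."""
    if subcommand not in SUBCOMMANDS + ["all"]:
        raise ValueError("unknown sub-command: %s" % subcommand)
    return [python, script, subcommand, "-h"]


def all_help_commands(script="silp.py", python="python"):
    """One help command per sub-command, in pipeline order."""
    return [help_command(s, script, python) for s in SUBCOMMANDS]


for cmd in all_help_commands():
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually print the help text, provided `silp.py` is reachable from the current directory.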

Installation

This is primarily a Python program that relies on several Python packages.

The decomposition is written in C/C++ and relies on the OGDF library, which must be installed and compiled. We have included the source code of this library in the package and added a compilation script that first builds OGDF and then our decomposition program. Compilation is done by calling

./build.sh

in the root folder.

Bowtie2 is also required for proper alignment of reads to contigs. It must be installed and available in the $PATH variable.
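
A quick way to confirm that the external aligner is reachable before starting a run is to look it up on $PATH from Python. This is a small sketch using the standard library; the function name `missing_tools` is an assumption and is not part of SILP2.

```python
import shutil


def missing_tools(tools=("bowtie2",)):
    """Return the names in `tools` that cannot be found on $PATH."""
    return [t for t in tools if shutil.which(t) is None]


missing = missing_tools()
if missing:
    print("not found on $PATH: " + ", ".join(missing))
```

An empty list means every requested executable was found.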

Example script

A demonstration script called run.sh is provided in the root directory as an example of how to run the tool. A small test case is available to test the tool.

#!/bin/bash
# set pointer to program.
program="silp2/silp.py"
work_dir="./"
ref_dir="${work_dir}/ref"
asm_dir="${work_dir}/asm"
aln_dir="${work_dir}/aln"
scf_dir="${work_dir}/scf"

# align
python $program align \
        -w $scf_dir \
        -a $aln_dir \
        -p 5 \
        -c ${asm_dir}/asm.fasta \
        -q1 ${ref_dir}/read1.fastq \
        -q2 ${ref_dir}/read2.fastq

# pair
python $program pair \
        -w $scf_dir \
        -a $aln_dir \
        -c ${asm_dir}/asm.fasta \
        -l ${asm_dir}/asm.length \
        -s1 ${aln_dir}/tmp1.sam \
        -s2 ${aln_dir}/tmp2.sam \
        -k 2

# preprocess
python $program nodes -w ${scf_dir} -c ${asm_dir}/asm.fasta
python $program edges -w ${scf_dir} -i 3500 -s 350 -rf -s1 ${aln_dir}/read1.sam -s2 ${aln_dir}/read2.sam
python $program bundles -w ${scf_dir} -b 1 -p 90 -bup 1 -r ${aln_dir}/ant -i 3500 -s 350
python $program decompose -w ${scf_dir} -m 2500

# start time
start=$(date +%s)

# scaffold
python $program orient -w $scf_dir -z 0
python $program order -w $scf_dir
python $program gap -w $scf_dir
python $program write -w $scf_dir -a ${scf_dir}/scf.agp
python $program fasta -w $scf_dir -a ${scf_dir}/scf.agp -c ${asm_dir}/asm.fasta -f ${scf_dir}/scf.fasta

# stop time
stop=$(date +%s)
echo RUNTIME: $(expr $stop - $start) >> ${scf_dir}/scf.agp
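
The last scaffolding step in the script writes the scaffolds as FASTA with N's in the gaps. The sketch below shows that idea in isolation. It is not the tool's code: the input layout (a dict of contig sequences, a list of `(name, strand)` placements and a list of gap sizes) and the rule that every gap gets at least one N are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string, leaving N's as N's."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs, placements, gaps):
    """Join oriented contigs, padding each gap with N's.

    contigs    -- dict mapping contig name to sequence
    placements -- ordered list of (name, strand), strand is '+' or '-'
    gaps       -- gap size between consecutive contigs (one fewer than placements)
    """
    if not placements or len(gaps) != len(placements) - 1:
        raise ValueError("need exactly one gap between each pair of contigs")
    parts = []
    for i, (name, strand) in enumerate(placements):
        seq = contigs[name]
        parts.append(seq if strand == "+" else reverse_complement(seq))
        if i < len(gaps):
            # Assumption: zero or negative estimates still get a single N.
            parts.append("N" * max(gaps[i], 1))
    return "".join(parts)


# Hypothetical two-contig scaffold with a 3 bp gap.
contigs = {"c1": "ACGT", "c2": "AAGG"}
scaffold = build_scaffold(contigs, [("c1", "+"), ("c2", "-")], [3])
print(scaffold)  # ACGTNNNCCTT
```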

Disclaimer

This is a research tool written in a research environment. No support is offered and bugs may be present. Only one library size is supported at this time. CPLEX is required; a free license is available to qualified academic institutions.