=========================================================================
ClipCrop is a tool for detecting structural variations using soft-clipping information from SAM files.
ClipCrop uses SHRiMP2 or bwa internally.
First, install SHRiMP2 or bwa (>= v0.5) and add its binary to your PATH. If you use SHRiMP2, also add "export SHRIMP_FOLDER=/path/to/SHRiMP2" to your environment.
ClipCrop is implemented in Node.js.
If you are already familiar with Node.js, just run:
$ npm install clipcrop
Node.js is not yet a major scripting language in bioinformatics, so if you do not have it installed, follow the steps below.
Install Node.js with its version manager, nvm.
Do not install Node.js from apt-get or other OS package managers!
$ git clone git://github.com/creationix/nvm.git ~/.nvm
$ source ~/.nvm/nvm.sh
$ nvm install v0.6.1
$ nvm use v0.6.1
$ npm install clipcrop
Installing Node.js may take a long time, so please be patient.
To make Node.js available in later sessions, add the following lines to your .bashrc (or the equivalent file for your shell).
source ~/.nvm/nvm.sh
nvm use v0.6.1
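
Once Node.js is available, a quick way to confirm that the prerequisites described above are visible is a small preflight script. The following is only a sketch: `findOnPath` and `checkPrerequisites` are hypothetical helper names, not part of ClipCrop. It checks only that a file named `bwa` exists in a PATH directory and that `SHRIMP_FOLDER` points to an existing path. It does not verify the bwa version.

```javascript
// Preflight sketch: report which ClipCrop prerequisites are visible.
// These helpers are illustrative and are NOT part of ClipCrop itself.
const fs = require('fs');
const path = require('path');

// Return the first "<dir>/<name>" found in the PATH-style string, or null.
function findOnPath(name, pathVar) {
  const dirs = (pathVar || '').split(path.delimiter).filter(Boolean);
  for (const dir of dirs) {
    const candidate = path.join(dir, name);
    if (fs.existsSync(candidate)) return candidate;
  }
  return null;
}

// env is an object shaped like process.env.
function checkPrerequisites(env) {
  const bwa = findOnPath('bwa', env.PATH);
  const folder = env.SHRIMP_FOLDER || null;
  const shrimpOk = folder !== null && fs.existsSync(folder);
  return {
    bwa: bwa,                               // path to bwa, or null
    shrimpFolder: shrimpOk ? folder : null, // SHRIMP_FOLDER if it exists
    ok: bwa !== null || shrimpOk            // at least one mapper is visible
  };
}

console.log(checkPrerequisites(process.env));
```

If `ok` is false, neither mapper was found, and the PATH or `SHRIMP_FOLDER` settings are worth rechecking before running clipcrop.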
$ clipcrop <sam file> <reference fasta file> [<fasta information json file>]
argument | description |
---|---|
sam file | SAM file with soft-clipping information. The recommended mapping tool is [bwa](http://bio-bwa.sourceforge.net/). |
reference fasta file | reference genome used for mapping |
fasta information json file (optional) | JSON file for [FASTAReader](https://github.com/shinout/FASTAReader). This file is optional and is used for faster reading of reference genomes. See the [README of FASTAReader](https://github.com/shinout/FASTAReader/blob/master/README.md) for more detail. |
option | description |
---|---|
dir | directory to put result files. default: basename(path) |
bp_filter_parallel | the number of processes to use to filter breakpoints. default: 8 |
max_diff | max difference within breakpoint cluster values. default: 2 |
min_cluster_size | minimum cluster size to be a valid breakpoint. default: 10 |
min_quality | minimum base quality score to allow. default: 5 |
bases_around_break | number of extended bases around breakpoint to be mapped by clipped sequences. default: 1000 |
sv_max_diff | max difference within breakpoint cluster values. default: 10 |
sv_min_cluster_size | minimum cluster size to be a valid SV. default: 10 |
bwa_threads | the number of threads bwa uses. default: 8 |
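
To give a feel for how `max_diff` and `min_cluster_size` interact, here is a minimal sketch of one plausible reading of these options. It is an assumption for illustration, not ClipCrop's actual implementation: positions are sorted, a cluster is closed as soon as the next position lies more than `maxDiff` away from the cluster's first value, and clusters with fewer than `minClusterSize` members are dropped. The function name `clusterBreakpoints` is hypothetical.

```javascript
// Illustrative only: one possible interpretation of max_diff and
// min_cluster_size. This is NOT taken from ClipCrop's source code.
function clusterBreakpoints(positions, maxDiff, minClusterSize) {
  const sorted = positions.slice().sort(function (a, b) { return a - b; });
  const clusters = [];
  let current = [];
  for (const pos of sorted) {
    // Close the cluster if pos is more than maxDiff from its first value.
    if (current.length > 0 && pos - current[0] > maxDiff) {
      clusters.push(current);
      current = [];
    }
    current.push(pos);
  }
  if (current.length > 0) clusters.push(current);
  // Keep only clusters that have enough supporting positions.
  return clusters.filter(function (c) { return c.length >= minClusterSize; });
}

// Toy clip positions with small thresholds (not the defaults).
const toyPositions = [100, 101, 102, 101, 100, 250, 251, 500];
console.log(clusterBreakpoints(toyPositions, 2, 3));
// -> [ [ 100, 100, 101, 101, 102 ] ]
```

Under this reading, a larger `max_diff` merges nearby clip positions more aggressively, while a larger `min_cluster_size` demands more supporting reads per breakpoint.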
Results are written in BED format.
#rname start end type subtype len score rname2 start2 caller other
chr1 224199455 224199456 INS * * 38 = * clipcrop num:158 LR:49/109
column | description |
---|---|
rname | the name of the chromosome |
start | start position of the SV event |
end | end position of the SV event |
type | SV type (one of DEL, INS, INV, CTX, DUP). CTX denotes a translocation. |
subtype | subtype of the SV type |
len | length of the event |
score | reliability score of the event. A score of 0 means the event is not reliable. |
rname2 | (for translocation) the chromosome of the second breakpoint |
start2 | (for translocation) the start position of the second breakpoint |
caller | always "clipcrop" |
other.num | the number of sequences supporting the breakpoint |
other.LR | the numbers of L and R clips, written as L/R |
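
For downstream scripts, a result line can be turned into an object keyed by the column names above. The sketch below rests on two assumptions: columns are separated by whitespace, and `*` marks a missing value. `parseClipcropLine` is a hypothetical helper, not a function provided by ClipCrop.

```javascript
// Sketch of a parser for one ClipCrop result line.
// Assumptions (not confirmed by the README): whitespace-separated columns,
// "*" meaning "no value". parseClipcropLine is a hypothetical helper.
const FIELDS = ['rname', 'start', 'end', 'type', 'subtype', 'len', 'score',
                'rname2', 'start2', 'caller'];
const NUMERIC = ['start', 'end', 'len', 'score', 'start2'];

function parseClipcropLine(line) {
  const tokens = line.trim().split(/\s+/);
  if (tokens.length < FIELDS.length) {
    throw new Error('expected at least ' + FIELDS.length +
                    ' columns, got ' + tokens.length);
  }
  const record = {};
  FIELDS.forEach(function (name, i) {
    const raw = tokens[i];
    if (NUMERIC.indexOf(name) !== -1) {
      record[name] = raw === '*' ? null : Number(raw);
    } else {
      record[name] = raw;
    }
  });
  // Remaining tokens are key:value pairs of the "other" column.
  record.other = {};
  tokens.slice(FIELDS.length).forEach(function (pair) {
    const idx = pair.indexOf(':');
    if (idx > 0) record.other[pair.slice(0, idx)] = pair.slice(idx + 1);
  });
  return record;
}

const example =
  'chr1 224199455 224199456 INS * * 38 = * clipcrop num:158 LR:49/109';
const rec = parseClipcropLine(example);
console.log(rec.type, rec.score, rec.other.num, rec.other.LR);
// prints: INS 38 158 49/109
```

Header lines starting with `#` should be skipped before calling the parser. Records with `score === 0` can then be filtered out, since such events are not reliable.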