jiantao / Tangram

Fast Structural Variation Detection Toolbox
MIT License

=========================================================================
Tangram 0.3.1 Release Distribution Documentation
2014-02-09
Authors: Jiantao Wu (jiantaowu.xining@gmail.com)
         Wan-Ping Lee (wanping.lee@bc.edu)
         Marth Lab [1], Boston College Biology Department
=========================================================================

Introduction

Tangram is a C/C++ command-line toolbox for structural variation (SV) detection. It combines read-pair and split-read algorithms and is fast and memory-efficient. Built on the BamTools API [3], Tangram can call SV events on multiple BAM files (a population) simultaneously, which increases sensitivity on low-coverage datasets. It currently reports mobile element insertions (MEIs); support for other SV event types is planned. For SNP and short INDEL calling, see FreeBayes [4], another toolbox from our lab.

Obtaining and Compiling

    git clone git://github.com/jiantao/Tangram.git
    cd Tangram/src
    make

Compiling Tangram requires:

  1. g++ 4.2.0 and above
  2. zlib
  3. pthread library

Detection pipeline

Currently, Tangram contains seven sub-programs:

  1. tangram_bam : If the input BAM files were not generated by MOSAIK [2], tangram_bam adds the ZA tags required by the subsequent steps.

  2. tangram_scan : Scans the BAM file and calculates the fragment length distribution for each library in it. It outputs the fragment length distribution files for each input BAM file.

  3. tangram_merge : If more than one BAM file needs to be scanned, this program combines all the fragment length distribution files into a single merged file, which enables detection on multiple BAM files simultaneously. This step is optional if only one BAM file (e.g., a pooled BAM file) is used.

  4. tangram_index : Indexes the normal reference and the special (MEI sequence) reference files. It outputs the indexed reference file. This step is required by the split-read algorithm.

  5. tangram_detect : Detects and genotypes SV events from MOSAIK-aligned BAM files. It outputs unfiltered VCF files.

  6. tangram_filter : Filters the raw VCF file generated by tangram_detect. NOTE: this program requires windowBed (from bedtools) [5], Unix sort, and grep to be in your PATH.

  7. tangram_view_scan_file : Views or modifies the contents of the lib_table.dat and hist.dat files (binary format) generated by tangram_scan. It can be used to sanity-check the input BAM files, e.g., for missing MEI reference names or abnormal read groups.
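The following Python sketch illustrates what tangram_scan and tangram_merge compute conceptually: one fragment-length histogram per library, then a merge across BAM files. It is not Tangram's implementation and does not read or write lib_table.dat or hist.dat. The input format (library name, TLEN pairs), the function names, and the choice to sum counts for libraries that share a name across files are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical in-memory representation: library name -> Counter(length -> count).
# This is NOT the lib_table.dat / hist.dat format written by tangram_scan.
Histogram = Dict[str, Counter]


def scan_fragment_lengths(records: Iterable[Tuple[str, int]]) -> Histogram:
    """Build one fragment-length histogram per library (cf. tangram_scan).

    Assumes one (library, TLEN) record per read pair. TLEN is signed in SAM,
    so its absolute value is used, and 0 (length unavailable) is skipped.
    """
    hists: Dict[str, Counter] = defaultdict(Counter)
    for library, tlen in records:
        length = abs(tlen)
        if length > 0:
            hists[library][length] += 1
    return dict(hists)


def merge_histograms(*per_file: Histogram) -> Histogram:
    """Combine per-file histograms (cf. tangram_merge).

    Assumption: libraries with the same name in different files are the same
    library, so their counts are summed. The input histograms are not modified.
    """
    merged: Dict[str, Counter] = defaultdict(Counter)
    for hists in per_file:
        for library, counts in hists.items():
            merged[library].update(counts)
    return dict(merged)


def quantile(counts: Counter, q: float) -> int:
    """Smallest fragment length whose cumulative fraction reaches q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty histogram")
    seen = 0
    for length in sorted(counts):
        seen += counts[length]
        if seen >= q * total:
            break
    return length
```

For example, histograms built from two BAM files can be passed to merge_histograms, and quantile(merged["libA"], 0.5) then gives that library's median fragment length.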

The overall Tangram detection pipeline is as follows:

    tangram_bam
    (BAM Input)
         \
          \
       tangram_scan   \
       (BAM Input)     \
                        -----> tangram_detect --> tangram_filter --> VCF file(s)
                       /       (BAM input)
       tangram_index  /
       (Ref Fasta)

For detailed usage of each program, run "$PROGRAM -help".
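As a rough aid for planning a run, this hypothetical helper lists which sub-programs apply, based only on the sub-program descriptions: tangram_bam only for non-MOSAIK input and tangram_merge only for more than one BAM file. It does not invoke the programs or pass any options; use "$PROGRAM -help" for the actual arguments. Because tangram_index works on the reference alone, its position relative to tangram_scan is an arbitrary choice here, and the diagnostic tangram_view_scan_file is left out.

```python
from typing import List


def tangram_steps(num_bams: int, mosaik_aligned: bool) -> List[str]:
    """Hypothetical planner: Tangram sub-programs to run, in pipeline order."""
    if num_bams < 1:
        raise ValueError("at least one BAM file is required")
    steps: List[str] = []
    if not mosaik_aligned:
        steps.append("tangram_bam")    # adds the ZA tags needed downstream
    steps.append("tangram_scan")       # per-library fragment length distributions
    if num_bams > 1:
        steps.append("tangram_merge")  # optional for a single (pooled) BAM file
    steps.append("tangram_index")      # required by the split-read algorithm
    steps += ["tangram_detect", "tangram_filter"]
    return steps
```

For a single MOSAIK-aligned BAM file this yields scan, index, detect, and filter; three non-MOSAIK BAM files add tangram_bam and tangram_merge.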

ZA Tag Information

The ZA tag (an optional tag in BAM files) is required for MEI detection with Tangram. Its basic structure is:

<@/&:MQ1:MQ2:SP_REF:NUM_MAP:CIGAR:MD>

There are 7 fields in this tag:

  1. @ or &: @ means the fields describe the current read, and & means they describe its mate (paired-end sequencing).

  2. MQ1: The best mapping quality

  3. MQ2: The second best mapping quality

  4. SP_REF: If the read can be aligned to a user-provided special reference, this field records the first two characters of that reference's name, e.g. AL (ALU). Otherwise, this field is empty.

  5. NUM_MAP: Number of locations in the genome to which this read maps.

  6. CIGAR: CIGAR string of this mapping

  7. MD: MD string of this mapping
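The field layout above can be decoded with a small parser. This Python sketch assumes the colon-separated format shown, that a single tag value may hold several <...> blocks (for example one for the read and one for its mate), and that numeric fields are never empty. The ZAEntry and parse_za names are hypothetical, and the sep parameter exists only in case your files use a different delimiter.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ZAEntry:
    is_mate: bool      # '&' -> True (mate), '@' -> False (current read)
    best_mq: int       # MQ1
    second_mq: int     # MQ2
    special_ref: str   # SP_REF, '' when not aligned to a special reference
    num_mappings: int  # NUM_MAP
    cigar: str         # CIGAR
    md: str            # MD

# Assumption: each entry is enclosed in <...>; several may be concatenated.
_BLOCK = re.compile(r"<([^<>]*)>")


def parse_za(value: str, sep: str = ":") -> List[ZAEntry]:
    """Hypothetical parser for ZA tag values in the documented 7-field layout."""
    entries: List[ZAEntry] = []
    for body in _BLOCK.findall(value):
        fields = body.split(sep)
        if len(fields) != 7 or fields[0] not in ("@", "&"):
            raise ValueError(f"malformed ZA block: <{body}>")
        marker, mq1, mq2, sp_ref, num_map, cigar, md = fields
        entries.append(ZAEntry(marker == "&", int(mq1), int(mq2),
                               sp_ref, int(num_map), cigar, md))
    if not entries:
        raise ValueError("no ZA blocks found")
    return entries
```

A record whose mate aligns to no special reference would simply produce an entry with special_ref == "".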

Bug Report

Please report bugs using the built-in issue tracker on GitHub or by emailing the authors.

References

[1] http://bioinformatics.bc.edu/marthlab/Main_Page
[2] https://github.com/wanpinglee/MOSAIK
[3] https://github.com/pezmaster31/bamtools
[4] https://github.com/ekg/freebayes
[5] http://code.google.com/p/bedtools