
SiNVICT: Ultra-Sensitive Detection of Single Nucleotide Variants and Indels in Circulating Tumour DNA
http://sfu-compbio.github.io/sinvict

SiNVICT

SiNVICT is a tool for the detection of SNVs and indels from cfDNA/ctDNA samples obtained by ultra-deep sequencing.

Prerequisites


Getting SiNVICT

To install SiNVICT, fetch it from our git repository or download one of the corresponding compressed zip/tar.gz packages. After downloading, change into the source directory sinvict and run make in the terminal. This creates the sinvict binary, which is ready to use.

git clone https://github.com/sfu-compbio/sinvict.git

To get all submodules included with SiNVICT as well, the following command can be used:

git clone --recursive https://github.com/sfu-compbio/sinvict.git
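
The fetch and build steps described above can be collected into one short sequence. This is a minimal sketch, assuming git and make are on the PATH and that the clone lands in a directory named sinvict. The function name build_sinvict is purely illustrative and is not shipped with SiNVICT.

```shell
# Illustrative helper (not part of SiNVICT): clone with submodules, then build.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_sinvict() (
    set -e                      # stop at the first failing step
    git clone --recursive https://github.com/sfu-compbio/sinvict.git
    cd sinvict                  # the source directory created by the clone
    make                        # builds the sinvict binary
)
```

After a successful build_sinvict call, the binary should be in the sinvict source directory, where ./sinvict -h can be used to confirm that it runs.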

Preprocessing Data for SiNVICT

SiNVICT requires a readcount file (see here for a detailed description of the format) to detect SNVs/indels. Depending on the input file you have, you can follow the steps described below to obtain the readcount file. SiNVICT is pre-packaged with the tools we have tested for the different steps of obtaining the readcount file, but you may use any other software you see fit.

Obtaining Readcount Files from Different Input Files


Running SiNVICT

The simplest way to run SiNVICT is to use the following command:

./sinvict -t input-readcount-dir -o .

You may also run SiNVICT with modified parameters. For example, to require calls to have a minimum read depth of 70, use the following command:

./sinvict --minDepth 70 -t input-readcount-dir -o .

Many other options can be added to the SiNVICT command line. The SiNVICT man page section below explains how to list these parameters and their descriptions.


Output

SiNVICT generates a number of output files that are ordered according to their level of confidence/filtering as named below (each filter eliminates some calls from the previous layer):

  1. Poisson model: calls_level1.sinvict
  2. Minimum Read Depth filter: calls_level2.sinvict
  3. Strand-bias filter: calls_level3.sinvict
  4. Average position of called location among all reads supporting the call: calls_level4.sinvict
  5. Signal-to-Noise ratio filter: calls_level5.sinvict
  6. Homopolymer Regions filter: calls_level6.sinvict
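
Since each filter removes calls from the previous level, the number of lines in the six files should not increase from one level to the next. The sketch below is one way to check this. It assumes the files sit in the directory that was passed to -o and that each line is one call. The function name count_calls_per_level and the variable outdir are hypothetical and not part of SiNVICT.

```shell
# Count the calls remaining after each filter level.
# Assumption: $1 is the directory given to -o (defaults to the current directory).
count_calls_per_level() {
    outdir=${1:-.}
    for level in 1 2 3 4 5 6; do
        f="$outdir/calls_level${level}.sinvict"
        if [ -f "$f" ]; then
            # "wc -l < file" prints only the count; tr strips any padding spaces
            printf 'level %s\t%s\n' "$level" "$(wc -l < "$f" | tr -d ' ')"
        else
            printf 'level %s\tmissing\n' "$level"
        fi
    done
}
```

Running count_calls_per_level . after the simplest invocation (-o .) prints one line per level, which makes it easy to see which filter removed the most calls.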

Each tab delimited line in an output file corresponds to a call made by SiNVICT with the following fields:

  1. Chromosome Name
  2. Position
  3. Sample Name (readcount file name)
  4. Reference Base
  5. Read Depth
  6. Mutated Base(s)
  7. Number of reads supporting the mutation
  8. Variant Allele Percentage
  9. Number of reads mapped to the + strand (in format "+:value")
  10. Number of reads mapped to the - strand (in format "-:value")
  11. Average position of the base on reads as a fraction (0.00 - 1.00)
  12. "Somatic" or "Germline", predicted from the variant allele frequency (a prediction only; do not treat it as ground truth)

The following is a sample output from SiNVICT:

chrX    66943552        22RV1_49C_10_to_1_10ng  A       3709    G       602     16.2308 +:430   -:172   0.53    Somatic
chrX    66943552        22RV1_49C_10_to_1_1ng   A       1979    G       111     5.60889 +:75    -:36    0.54    Somatic
chrX    66943552        22RV1_49C_10_to_1_2.5ng A       6221    G       803     12.9079 +:567   -:236   0.53    Somatic
chrX    66943552        22RV1_49C_10_to_1_5ng   A       2508    G       265     10.5662 +:183   -:82    0.52    Somatic
chrX    66943552        22RV1_49C_20_to_1_10ng  A       5247    G       304     5.79379 +:167   -:137   0.53    Somatic
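
The twelve fields can be read with standard text tools. The sketch below is not part of SiNVICT. It uses awk to recompute the variant allele percentage as 100 x supporting reads / read depth and to report the fraction of supporting reads on the + strand. Both relationships are inferred from the sample rows above (for example, 602 / 3709 gives 16.2308%, and 430 + 172 = 602) rather than stated by the authors. The function name summarize_calls is hypothetical, and tab-delimited input is assumed.

```shell
# Summarize SiNVICT calls: locus, substitution, recomputed VAF, + strand fraction.
# Reads the files given as arguments, or standard input when none are given.
summarize_calls() {
    awk -F'\t' 'NF >= 12 && $5 > 0 && $7 > 0 {
        depth = $5                 # field 5: read depth
        alt   = $7                 # field 7: reads supporting the mutation
        plus  = substr($9, 3)      # field 9 looks like "+:430"; drop the "+:" prefix
        printf "%s:%s\t%s>%s\tVAF=%.2f\tplus_frac=%.2f\n", \
               $1, $2, $4, $6, 100 * alt / depth, plus / alt
    }' "$@"
}
```

For the first sample row, summarize_calls prints chrX:66943552, A>G, VAF=16.23 and plus_frac=0.71, separated by tabs. Lines with fewer than twelve fields, or with a zero depth or supporting-read count, are skipped.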

SiNVICT man page

To view the full list of SiNVICT options and their descriptions, please run the following:

./sinvict -h

Contact

For any additional questions/comments/suggestions, please send an email to the following address:

ckockan@sfu.ca