karel-brinda / ococo

Ococo: the first online variant and consensus caller. Call genomic consensus directly from an unsorted SAM/BAM stream.
https://arxiv.org/abs/1712.01146
MIT License
47 stars 3 forks source link
bioinformatics consensus ngs online-algorithms variant-calling

OCOCO - the first online variant and consensus caller

Build Status Arxiv install with bioconda DOI

Abstract

Motivation: Identifying genomic variants is an essential step for connecting genotype and phenotype. The usual approach consists of statistical inference of variants from alignments of sequencing reads. State-of-the-art variant callers can resolve a wide range of different variant types with high accuracy. However, they require that all read alignments be available from the beginning of variant calling and be sorted by coordinates. Sorting is computationally expensive, both memory- and speed-wise, and the resulting pipelines suffer from storing and retrieving large alignments files from external memory. Therefore, there is interest in developing methods for resource-efficient variant calling.

Results: We present Ococo, the first program capable of inferring variants in a real-time, as read alignments are fed in. Ococo inputs unsorted alignments from a stream and infers single-nucleotide variants, together with a genomic consensus, using statistics stored in compact several-bit counters. Ococo provides a fast and memory-efficient alternative to the usual variant calling. It is particularly advantageous when reads are sequenced or mapped progressively, or when available computational resources are at a premium.

Several-bit Ococo counters

Citation

Brinda K, Boeva V, Kucherov G. Ococo: an online variant and consensus caller. arXiv:1712.01146 [q-bio.GN], 2018. https://arxiv.org/abs/1712.01146

Results from the paper were generated using code in the following repository: https://github.com/karel-brinda/ococo-paper-analysis. The entire computation is also available as a CodeOcean capsule.

Quick example

git clone --recursive https://github.com/karel-brinda/ococo
cd ococo && make -j
./ococo -i test.bam -f test.fa --vcf-cons -

Installation

From Bioconda

conda install -c bioconda ococo

Building from source

Prerequisities

Compilation: make

Installation: make install

How to use

SYNOPSIS
       ococo -i <SAM/BAM file> [other options]

DESCRIPTION
       Ococo  is  a  program to call variants and a genomic consensus directly
       from an unsorted SAM/BAM stream.

   Input options:
       -i, --input FILE
          Input SAM/BAM file (- for standard input).

       -f, --fasta-ref FILE
          Initial FASTA reference (otherwise a seq of N's is used).

       -s, --stats-in FILE
          Input statistics.

   Output options:
       -F, --fasta-cons FILE Print consensus in FASTA.

       -S, --stats-out FILE
          Export statistics to a file.

       -V, --vcf-cons FILE
          Print inferred variants in VCF (- for standard output).

       -P, --pileup FILE
          Print SAMtools pileup (- for standard output).

       --verbose
          Use the verbose mode (report every update of a counter).

   Parameters for consensus calling:
       -x, --counters STR
          Counter configuration [ococo32].

          configuration   bits/counter   bits/position
          ococo16         3          16
          ococo32         7          32
          ococo64         15         64

       -m, --mode STR
          Mode [batch].

          mode    description
          real-time   updates reported immediately
          batch   updates reported after end of algn stream

       -q, --min-MQ INT
          Skip alignments with mapping quality smaller than INT [1].

       -Q, --min-BQ INT
          Skip bases with base quality smaller than INT [13].

       -w, --ref-weight INT
          Initial counter value for nucleotides from ref [0].

       -c, --min-cov INT
          Minimum coverage required for update [2].

       -M, --maj-thres FLOAT
          Majority threshold [0.51].

Issues

Please use Github issues.

Changelog

See Releases.

Licence

MIT

Author

Karel Brinda \kbrinda@hsph.harvard.edu\