mehrdadbakhtiari / adVNTR

A tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data
http://advntr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

adVNTR - A tool for genotyping VNTRs

adVNTR is a tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data. It works with both NGS short reads (Illumina HiSeq) and SMRT reads (PacBio). It estimates diploid repeat counts for VNTRs and identifies possible mutations in the VNTR sequences.

code-adVNTR, a tool specialized in detecting small indel variants within motifs using short reads, is now available. It employs multiple motif Hidden Markov Models (HMMs) to identify small variants within motifs and to estimate diploid repeat counts for VNTRs, specifically in coding regions. For more details, please refer to this readme.

Installation

If you are using the conda package manager (e.g. miniconda or anaconda), you can install adVNTR from the bioconda channel:

    conda config --add channels bioconda
    conda install -c conda-forge -c bioconda advntr

After installation, adVNTR can be invoked from the command line with the advntr command.
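As a quick sanity check after installing, the sketch below tests whether the advntr executable is visible on the PATH. The helper name have_cmd is hypothetical and only for illustration; the hint about activating the conda environment is an assumption about a common cause, not something adVNTR reports itself.

```shell
# Hypothetical helper: succeed if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd advntr; then
    echo "advntr found at: $(command -v advntr)"
else
    # Assumption: a missing command usually means the conda environment
    # that holds the package is not active in this shell.
    echo "advntr not found; check that the right conda environment is active" >&2
fi
```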

Alternatively, you can install the dependencies and then install adVNTR from source.

Data Requirements

In order to genotype VNTRs, you need to either train models for loci of interest or use pre-trained models (recommended):

Alternatively, you can add a model for a custom VNTR. See Add Custom VNTR for more information about training models for custom VNTRs.

[Optional] For faster genotyping with adVNTR-NN, pretrained neural network models can be downloaded from here.

Execution:

Use the following command to see the help for running the tool.

    advntr --help

The program outputs the repeat unit (RU) count genotypes of the trained VNTRs. To genotype a single VNTR, specify its ID with the --vntr_id <id> option. A list of some known VNTRs and their IDs is available on the Disease-linked-VNTRs page in the wiki.
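To illustrate how --vntr_id combines with the genotype command, here is a minimal dry-run sketch. It only prints the command instead of running it. The function name genotype_cmd and the ID 12345 are placeholders, not real adVNTR names or VNTR IDs; take actual IDs from the Disease-linked-VNTRs page. The other options are the ones used in the demo commands in this README.

```shell
# Hypothetical helper: print (not run) the genotype command for one VNTR ID.
# $1 = aligned BAM file, $2 = VNTR ID
genotype_cmd() {
    bam=$1
    vntr_id=$2
    printf 'advntr genotype --alignment_file %s --working_directory ./log_dir/ --vntr_id %s\n' \
        "$bam" "$vntr_id"
}

# 12345 is a placeholder ID used only for illustration.
genotype_cmd aligned_illumina_reads.bam 12345
```

Once the printed command looks right, it can be copied into the shell and run directly.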

See the demo below or the Quickstart page for an example data set with step-by-step genotyping commands.

Demo input in BAM format

    # Illumina short reads
    advntr genotype --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/
    # PacBio SMRT reads
    advntr genotype --alignment_file aligned_pacbio_reads.bam --working_directory ./log_dir/ --pacbio
    # Illumina short reads, with the frameshift option enabled
    advntr genotype --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/ --frameshift
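
When several samples need genotyping, the same command can be generated once per BAM file. The sketch below is a dry run that prints one command per input file. The function name print_batch_commands and the sample file names are hypothetical. Using a separate working directory per sample is an assumption made here to keep logs apart, not a documented requirement of adVNTR.

```shell
# Hypothetical helper: print one genotype command per BAM file given as argument.
print_batch_commands() {
    for bam in "$@"; do
        # Derive a sample name from the file name, e.g. data/sample_a.bam -> sample_a
        sample=$(basename "$bam" .bam)
        # Assumption: one working directory per sample under ./log_dir/
        printf 'advntr genotype --alignment_file %s --working_directory ./log_dir/%s/\n' \
            "$bam" "$sample"
    done
}

# Placeholder file names used only for illustration.
print_batch_commands data/sample_a.bam data/sample_b.bam
```

For PacBio data, --pacbio would be appended to each printed command, as in the demo above.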

Documentation:

Documentation is available at advntr.readthedocs.io.

See the Quickstart page for an example data set with step-by-step genotyping commands.

Citation: