varfish-org / varfish-annotator

(Legacy) Annotate variants for import into VarFish server.
MIT License


VarFish Annotator

Annotate VCF files for import into VarFish (through the Web UI).

Supported Databases

Supported Variant Callers

VarFish Annotator uses HTSJDK to read variant call format (VCF) files. HTSJDK supports VCF v4.3, so the output of any tool that produces well-formed VCF can be read. However, VCF itself specifies only a few required fields, and callers may use the remaining fields in slightly different ways. We therefore document below which fields VarFish Annotator interprets when preparing files for VarFish.

Small Variants

The following fields are considered:

Structural Variants / Copy Number Variants

Supported Callers and Caller Annotation

The following variant callers are explicitly supported.

In all other cases, VarFish Annotator falls back to a "generic" import in which only the per-sample fields GT, FT, and GQ are interpreted. Your caller should also write out INFO/END, INFO/SVTYPE, and INFO/SVLEN as defined by VCF 4.2.
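
As a quick pre-flight check for the generic import, the following sketch lists which of the fields named above are not declared in the header of an uncompressed VCF file. The function name `missing_sv_header_fields` is hypothetical and not part of VarFish Annotator; the check only looks at header definitions and assumes they follow the usual `##INFO=<ID=...>` and `##FORMAT=<ID=...>` layout.

```shell
# Hypothetical helper (not part of VarFish Annotator): print the INFO and
# FORMAT fields read by the "generic" SV import that are not declared in the
# header of an uncompressed VCF file.
missing_sv_header_fields() {
    vcf="$1"
    for spec in INFO:END INFO:SVTYPE INFO:SVLEN FORMAT:GT FORMAT:FT FORMAT:GQ; do
        kind="${spec%%:*}"   # INFO or FORMAT
        id="${spec#*:}"      # field ID, e.g. SVTYPE
        # Header definitions look like: ##INFO=<ID=SVTYPE,Number=1,...>
        if ! grep -q "^##${kind}=<ID=${id}[,>]" "$vcf"; then
            echo "${kind}/${id}"
        fi
    done
}

# Usage (path is a placeholder):
#   missing_sv_header_fields path/to/calls.vcf
```

An empty result means all six fields are declared. A missing declaration does not by itself prove that the import will fail, so treat the output as a hint about what to inspect.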

VarFish Annotator reads the field INFO/SVMETHOD to label each call with the caller it originated from. If this field is empty, set --default-sv-method so that the output is labeled appropriately. If you run into any problem with your data, please tell us by opening a GitHub issue.
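
To decide whether --default-sv-method is needed, you can count the records that carry no INFO/SVMETHOD value. The sketch below does this with awk for an uncompressed VCF file; the function name `count_records_without_svmethod` is hypothetical, and the check assumes the standard tab-separated layout with INFO in column 8.

```shell
# Hypothetical helper (not part of VarFish Annotator): count the data records
# of an uncompressed VCF file whose INFO column has no non-empty SVMETHOD entry.
count_records_without_svmethod() {
    awk -F '\t' '
        /^#/ { next }                               # skip header lines
        $8 !~ /(^|;)SVMETHOD=[^;]/ { missing++ }    # INFO is column 8
        END { print missing + 0 }                   # print 0 if none are missing
    ' "$1"
}

# Usage (path is a placeholder):
#   count_records_without_svmethod path/to/calls.vcf
```

A count greater than zero suggests passing --default-sv-method with a label of your choice when annotating that file.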

Interpretation of top-level and INFO VCF fields

The following fields are considered:

Interpretation of FORMAT and per sample fields

Example

The following command creates varfish-annotator-db-1907.h2.db and populates it.

# DOWNLOAD=path/to/varfish-db-downloader
# ANNOTATOR_VERSION=0.9
# ANNOTATOR_DATA_RELEASE=1907
# java -jar varfish-annotator-cli-$ANNOTATOR_VERSION-SNAPSHOT.jar \
      init-db \
      --db-release-info "varfish-annotator:v$ANNOTATOR_VERSION" \
      --db-release-info "varfish-annotator-db:r$ANNOTATOR_DATA_RELEASE" \
      \
      --ref-path /fast/projects/cubit/18.12/static_data/reference/GRCh37/hs37d5/hs37d5.fa \
      \
      --db-release-info "clinvar:2019-02-20" \
      --clinvar-path $DOWNLOAD/GRCh37/clinvar/latest/clinvar_tsv_main/output/clinvar_allele_trait_pairs.single.b37.tsv \
      --clinvar-path $DOWNLOAD/GRCh37/clinvar/latest/clinvar_tsv_main/output/clinvar_allele_trait_pairs.multi.b37.tsv \
      \
      --db-path ./varfish-annotator-db-$ANNOTATOR_DATA_RELEASE \
      \
      --db-release-info "exac:r1.0" \
      --exac-path $DOWNLOAD/GRCh37/ExAC/r1/download/ExAC.r1.sites.vep.vcf.gz \
      \
      --db-release-info "gnomad_exomes:r2.1" \
      $(for path in $DOWNLOAD/GRCh37/gnomAD_exomes/r2.1/download/gnomad.exomes.r2.1.sites.chr*.normalized.vcf.bgz; do \
          echo --gnomad-exomes-path $path; \
      done) \
      \
      --db-release-info "gnomad_genomes:r2.1" \
      $(for path in $DOWNLOAD/GRCh37/gnomAD_genomes/r2.1/download/gnomad.genomes.r2.1.sites.chr*.normalized.vcf.bgz; do \
          echo --gnomad-genomes-path $path; \
      done) \
      \
      --db-release-info "thousand_genomes:v3.20101123" \
      $(for path in $DOWNLOAD/GRCh37/thousand_genomes/phase3/ALL.chr*.phase3_shapeit2_mvncall_integrated_v5a.20130502.sites.vcf.gz; do \
          echo --thousand-genomes-path $path; \
      done) \
      \
      --db-release-info "hgmd_public:ensembl_r75" \
      --hgmd-public $DOWNLOAD/GRCh37/hgmd_public/ensembl_r75/HgmdPublicLocus.tsv
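
The `$(for path in ...; do echo ...; done)` loops in the command above repeat one option per input file. If a glob matches nothing, the shell passes the unexpanded pattern on as a path. The following sketch factors the loop into a helper that skips non-existent paths; the name `repeat_flag` is hypothetical and not part of VarFish Annotator.

```shell
# Hypothetical helper: print "<flag> <path>" for every existing path given,
# silently skipping patterns that did not match any file.
repeat_flag() {
    flag="$1"; shift
    for path in "$@"; do
        [ -e "$path" ] && printf '%s %s\n' "$flag" "$path"
    done
    return 0
}

# Usage inside the init-db command, replacing one of the loops:
#   $(repeat_flag --gnomad-exomes-path \
#       $DOWNLOAD/GRCh37/gnomAD_exomes/r2.1/download/gnomad.exomes.r2.1.sites.chr*.normalized.vcf.bgz) \
```

As with the original loops, the unquoted command substitution relies on word splitting, so this only works for paths without whitespace.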

Formatting Source Code

# mvn com.coveo:fmt-maven-plugin:format -Dverbose=true

Tests

The folder /tests contains some data sets that are appropriate for system (aka "end-to-end") tests of the software.

You can build the data sets with the build.sh script that is available in each folder. This script also documents the provenance of the test data. The Jannovar software must be available as jannovar on your PATH (e.g., through bioconda), and you will also need samtools.

Using JDK >=18

The tests use junit5-system-exit to detect System.exit() calls. With JDK 18 or later, you must pass the -Djava.security.manager=allow flag to the JVM that runs the tests. This is tracked in tginsberg/junit5-system-exit#10.
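
The flag can be added conditionally so that the same script works across JDK versions. The sketch below derives the Java major version from a version string and emits the flag only for JDK 18 or later. The function names are hypothetical. Passing the result through the Maven Surefire `argLine` user property is an assumption about how the tests are launched and only takes effect if the project's Surefire configuration does not override `argLine`.

```shell
# Hypothetical helpers: derive the Java major version from a version string
# such as "1.8.0_292", "17.0.2" or "18", and print the extra JVM argument
# needed by the tests on JDK 18 or later.
java_major_version() {
    version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;   # legacy scheme: 1.8.0_292 -> 8.0_292
    esac
    echo "${version%%[!0-9]*}"            # keep the leading digits only
}

extra_test_jvm_args() {
    if [ "$(java_major_version "$1")" -ge 18 ]; then
        echo "-Djava.security.manager=allow"
    fi
}

# Usage (assumes Surefire honors the argLine user property):
#   version="$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')"
#   mvn test -DargLine="$(extra_test_jvm_args "$version")"
```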

Developing on Windows

There is an issue with removing temporary directories on Windows. Apparently, HTSJDK does not properly close files. Set -Djunit.jupiter.tempdir.cleanup.mode.default=NEVER to work around this issue.