nf-cmgg / germline

A nextflow pipeline for calling and annotating small germline variants from short DNA reads for WES and WGS data
https://nf-cmgg.github.io/germline/
MIT License

Add normalization and decomposing to VCF2DB #143

Closed nvnieuwk closed 10 months ago

nvnieuwk commented 1 year ago

Description of feature

Add normalization and decomposing to VCF2DB

matthdsm commented 1 year ago

commands

gunzip -c sample-joint-gatk-haplotype-joint.vcf.gz \
| bcftools view -f 'PASS,.' \
| vcfallelicprimitives -t DECOMPOSED --keep-geno \
| sed 's/ID=AD,Number=./ID=AD,Number=R/' \
| vt decompose -s -  \
| vt normalize -n -r genome.fa -  \
| awk '{ gsub("./-65", "./."); print $0 }' | sed -e 's/Number=A/Number=1/g' \
| bgzip -c > sample-joint-gatk-haplotype-joint-decompose.vcf.gz

matthdsm commented 1 year ago

Decomposition and normalisation can also be done with bcftools, if that is more convenient. I'm not sure the other commands (the sed and awk header and genotype fixes) are still relevant.
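For reference, a minimal sketch of what the bcftools route could look like. It is not necessarily what the pipeline ended up using. It assumes `bcftools` is on the `PATH` and that the reference FASTA is indexed; the function name `decompose_and_normalize` and the file names in the usage line are hypothetical. `bcftools norm -m -any` splits multiallelic records into biallelic ones, and `-f` left-aligns and normalises indels against the reference. It does not reproduce the `vcfallelicprimitives` step or the sed/awk fixes from the command above.

```shell
# Sketch only: keep PASS/unfiltered records, then split multiallelic sites
# and left-align indels with bcftools.
# Arguments (all hypothetical names): input VCF, reference FASTA, output VCF.
decompose_and_normalize() {
  local in_vcf="$1"
  local ref_fa="$2"
  local out_vcf="$3"

  # view -f 'PASS,.' : keep records whose FILTER is PASS or missing
  # norm -m -any     : split multiallelic records of any variant type
  # norm -f          : reference FASTA used for left-alignment/normalisation
  # -O z             : write bgzip-compressed VCF
  bcftools view -f 'PASS,.' "$in_vcf" \
    | bcftools norm -m -any -f "$ref_fa" -O z -o "$out_vcf"
}
```

Usage would be something like `decompose_and_normalize sample.vcf.gz genome.fa sample-decompose.vcf.gz`.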

nvnieuwk commented 1 year ago

Alright, thanks! I'll have a look when I start working on this.

nvnieuwk commented 10 months ago

Added in #150