luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Getting number of reference and variant reads in cancer model #61

Closed: programmingprincess closed 5 years ago

programmingprincess commented 5 years ago

Hi, I am running somatic variant calling with tumor-only samples. I am wondering how to get the number of reference vs. variant reads for each variant. I notice that some variants have a MAP_VAF value, while many others do not. How can we get reference vs. variant values for variants that do not have MAP_VAF, and what causes some of the variants to be missing this field?

Thanks in advance.

Command used to run Octopus:

octopus \
    -R ref.fasta \
    -I sample1.bam sample2.bam sample3.bam \
    -t chr1.list \
    -C cancer \
    --very-fast \
    --forest germline.v0.6.3-beta.forest \
    --somatic-forest somatic.v0.6.3-beta.forest \
    --threads 8 \
    -o vcf


dancooke commented 5 years ago

The variant allele frequency statistics are reported only for variants called SOMATIC, which is why some variants (those called in the germline) do not have the MAP_VAF or VAF_CR annotations. In principle, some of the called germline variants could occur in regions of copy-number change and so not have the expected allele frequency, but Octopus does not explicitly call copy-number changes.

You can, however, get empirical variant allele depths and VAFs for all variant types using the --annotations option. Any measure that is used for filtering (all of them, in the case of random forest filtering) can be requested. You might want to read my post here about these statistics, though. In your case you probably want the AD, ADP, and AF measures (AF is AD / ADP). So your command becomes:

octopus \
    -R ref.fasta \
    -I sample1.bam sample2.bam sample3.bam  \
    -t chr1.list \
    -C cancer \
    --very-fast \
    --forest germline.v0.6.3-beta.forest \
    --somatic-forest somatic.v0.6.3-beta.forest \
    --threads 8 \
    --annotations AD ADP AF \
    -o vcf
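
Once the VCF carries these annotations, the reference and variant read counts can be pulled out of each record with a few lines of Python. The following is only a minimal sketch, and it rests on assumptions you should check against the header of your own VCF: that AD and ADP appear as per-sample FORMAT fields, that AD holds comma-separated per-allele counts with the reference allele first (the usual VCF convention), and that somatic calls carry a SOMATIC flag in the INFO column. The function names `allele_depths` and `read_vcf` are made up for illustration and are not part of Octopus.

```python
def _to_int(token):
    """Convert a VCF numeric token to int, mapping the missing value '.' to None."""
    return None if token in (".", "") else int(token)


def allele_depths(vcf_line):
    """Extract reference/variant read counts from one tab-separated VCF data line.

    Assumes AD is 'ref,alt1,alt2,...' and ADP is a single integer (see lead-in).
    """
    cols = vcf_line.rstrip("\n").split("\t")
    # INFO keys, with any '=value' part stripped, so flags like SOMATIC are easy to test.
    info_keys = {item.split("=")[0] for item in cols[7].split(";")}
    format_keys = cols[8].split(":")

    samples = []
    for sample_column in cols[9:]:
        values = dict(zip(format_keys, sample_column.split(":")))
        ad = [_to_int(x) for x in values.get("AD", ".").split(",")]
        samples.append({
            "ref_reads": ad[0],                             # first AD entry (assumed reference)
            "alt_reads": ad[1:],                            # one entry per ALT allele
            "assigned_depth": _to_int(values.get("ADP", ".")),
            "map_vaf": values.get("MAP_VAF"),               # None when the field is absent
        })

    return {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "ref": cols[3],
        "alt": cols[4].split(","),
        "somatic": "SOMATIC" in info_keys,
        "samples": samples,
    }


def read_vcf(path):
    """Yield one parsed record per data line of an uncompressed VCF file."""
    with open(path) as handle:
        for line in handle:
            if not line.startswith("#"):
                yield allele_depths(line)
```

Looping over `read_vcf("your.vcf")` then gives, for every record and sample, the reference count, the per-ALT counts, and whether the call was flagged somatic, so records without MAP_VAF can still be summarised from AD and ADP.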