luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
302 stars 38 forks source link

Bam output in trio mode #104

Closed ikarus97 closed 4 years ago

ikarus97 commented 4 years ago

Hello,

I ran octopus in trio mode for 3 linked-read whole genomes, and found confusing result in bam output.

A variant in child was seen in octopus by single-sample run. And it was not seen in parents from their respective single-sample runs. But, in trio mode, the variant was not even included in the VCF file.

Also, the bam output mostly consisted of reads from father bam, very few reads from child.

I installed octopus in my environment using bioconda. Octopus version is 0.6.3-beta (Boost: 1_70). The command I used for trio mode: octopus --threads 40 -R reference.fasta -I child.bam mother.bam father.bam --caller trio --organism-ploidy 2 -M mother -F father -p father:X=1 --sequence-error-model 10X --allow-marked-duplicates --allow-secondary-alignments --allow-supplementary-alignments --ignore-unmapped-contigs -o out.vcf -T 12:49430000-49450000 --bamout out.bam.dir --full-bamout

For single-sample run on the same target region, I used: octopus --threads 40 -R reference.fasta -I child.bam --organism-ploidy 2 --sequence-error-model 10X --allow-marked-duplicates --allow-secondary-alignments --allow-supplementary-alignments --ignore-unmapped-contigs -o out.vcf -T 12:49430000-49450000 --bamout out.bam --full-bamout

My question is: 1) How the bam output in trio mode generated from the three input bams? Why, in my case, almost every reads in the region is from father? 2) Is it possible that not every variant in child be reported in trio-mode VCF? But in my case, the variant is likely to be de novo.

Thank you.

dancooke commented 4 years ago

Any chance of you being able to provide sub-BAMs for the trio around the variant that you mention?

ikarus97 commented 4 years ago

Hi @dancooke , I'll be happy to provide the bams. It's about the size of ~2G. How can I provide it?

dancooke commented 4 years ago

@ikarus97 If you give me your email address then I can invite you to my Dropbox folder. Send an email to my address (dcooke@well.ox.ac.uk) if you don't want to post your address on here.

ikarus97 commented 4 years ago

Hi @dancooke, I've put the files & a note to your Dropbox folder. Thank you & Happy new year!

dancooke commented 4 years ago

I'm struggling to see any problems here. It's to be expected that some variants will be called in the individual mode (merged), or even in the population mode, that are Mendelian errors but are not called as DENOVO in trio mode. These are likely to be false calls that the trio model is better able to distinguish since it is more specific overall. I'd expect that most of these variants will low quality (i.e. not PASS) though, which seems to be the case in your example. Can you point to a specific example that you believe to be a real de novo that the trio calling model fails to call?

As for the BAM output, I don't see a problem either (or I don't understand your concern). Here's an IGV screenshot from the realigned BAMs you sent. What do you mean that "almost every reads in the region is from father"?

trio

ikarus97 commented 4 years ago

@dancooke

dancooke commented 4 years ago

Have you experimentally validated this mutation? It looks a little suspicious to me: the variant allele frequency is pretty low and there's minor strand bias. It looks more likely to be a somatic mutation than an inherited de novo mutation. Are these samples derived from cell lines?

The variant is not called as there's insufficient support to overcome the prior. Probably the best way to get this called is by tweaking the algorithm parameters to increase sensitivity. For example, adding --denovo-snv-prior 1e-7 --min-denovo-posterior 0.1 --min-variant-posterior 0.1 to your command sees the variant called (albeit with very low quality scores). Note that by doing this you're potentially exposing yourself to more false positive calls.

dancooke commented 4 years ago

Closing due to inactivity. Please re-open if needed.