Closed ikarus97 closed 4 years ago
Any chance of you being able to provide sub-BAMs for the trio around the variant that you mention?
Hi @dancooke , I'll be happy to provide the bams. It's about the size of ~2G. How can I provide it?
@ikarus97 If you give me your email address then I can invite you to my Dropbox folder. Send an email to my address (dcooke@well.ox.ac.uk) if you don't want to post your address on here.
Hi @dancooke, I've put the files & a note to your Dropbox folder. Thank you & Happy new year!
I'm struggling to see any problems here. It's to be expected that some variants will be called in the individual
mode (merged), or even in the population
mode, that are Mendelian errors but are not called as DENOVO
in trio
mode. These are likely to be false calls that the trio
model is better able to distinguish since it is more specific overall. I'd expect that most of these variants will low quality (i.e. not PASS
) though, which seems to be the case in your example. Can you point to a specific example that you believe to be a real de novo that the trio
calling model fails to call?
As for the BAM output, I don't see a problem either (or I don't understand your concern). Here's an IGV screenshot from the realigned BAMs you sent. What do you mean that "almost every reads in the region is from father"?
@dancooke
The variant we were interested in was 12:49440550 C>A variant (child-father-mother, from top)
One of my issue with BAM output is solved: Initially my bam files for trio were in different folders but under the same filename (bamout.bam).
octopus --threads 40 -R ref.fasta -I child/bamout.bam mother/bamout.bam father/bamout.bam --caller trio --organism-ploidy 2 -M mother -F father -p child:X=1 --sequence-error-model 10X --allow-marked-duplicates --allow-secondary-alignments --allow-supplementary-alignments --ignore-unmapped-contigs -o trio_mode -T 12:49430000-49450000 --bamout trio_mode.bam --full-bamout
After the trio run, there was only one BAM file (named “bamout.bam”), which contained only reads from father’s bam (last bam listed in command-line). When I re-run the same command with renamed BAMs (child.bam, mother.bam, father.bam; cut from original bam files),
octopus --threads 40 -R ref.fasta -I child.bam mother.bam father.bam --caller trio --organism-ploidy 2 -M mother -F father -p child:X=1 --sequence-error-model 10X --allow-marked-duplicates --allow-secondary-alignments --allow-supplementary-alignments --ignore-unmapped-contigs -o trio_mode -T 12:49430000-49450000 --bamout trio_mode.bam --full-bamout
it produced three bam files, each containing reads from the input bam files. I guess the bamout option produces bam files for each input bam file without changing the name. In my initial run, the bamout was overwritten for each input bam, since my bam files were all in the same name.
Have you experimentally validated this mutation? It looks a little suspicious to me: the variant allele frequency is pretty low and there's minor strand bias. It looks more likely to be a somatic mutation than an inherited de novo mutation. Are these samples derived from cell lines?
The variant is not called as there's insufficient support to overcome the prior. Probably the best way to get this called is by tweaking the algorithm parameters to increase sensitivity. For example, adding --denovo-snv-prior 1e-7 --min-denovo-posterior 0.1 --min-variant-posterior 0.1
to your command sees the variant called (albeit with very low quality scores). Note that by doing this you're potentially exposing yourself to more false positive calls.
Closing due to inactivity. Please re-open if needed.
Hello,
I ran octopus in trio mode for 3 linked-read whole genomes, and found confusing result in bam output.
A variant in child was seen in octopus by single-sample run. And it was not seen in parents from their respective single-sample runs. But, in trio mode, the variant was not even included in the VCF file.
Also, the bam output mostly consisted of reads from father bam, very few reads from child.
I installed octopus in my environment using bioconda. Octopus version is 0.6.3-beta (Boost: 1_70). The command I used for trio mode:
octopus --threads 40 -R reference.fasta -I child.bam mother.bam father.bam --caller trio --organism-ploidy 2 -M mother -F father -p father:X=1 --sequence-error-model 10X --allow-marked-duplicates --allow-secondary-alignments --allow-supplementary-alignments --ignore-unmapped-contigs -o out.vcf -T 12:49430000-49450000 --bamout out.bam.dir --full-bamout
For single-sample run on the same target region, I used:
octopus --threads 40 -R reference.fasta -I child.bam --organism-ploidy 2 --sequence-error-model 10X --allow-marked-duplicates --allow-secondary-alignments --allow-supplementary-alignments --ignore-unmapped-contigs -o out.vcf -T 12:49430000-49450000 --bamout out.bam --full-bamout
My question is: 1) How the bam output in trio mode generated from the three input bams? Why, in my case, almost every reads in the region is from father? 2) Is it possible that not every variant in child be reported in trio-mode VCF? But in my case, the variant is likely to be de novo.
Thank you.