luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

question: trio mode for larger families #209

Closed by brentp 2 years ago

brentp commented 2 years ago

Hi Daniel, given a quad, would octopus perform trio calling for each trio with a command like this?

$ octopus -R $ref -I bro.bam sis.bam dad.bam mom.bam -M mom -F dad

Or should I submit that as two separate octopus runs, one for each trio? Thanks for the great software!

dancooke commented 2 years ago

Hi Brent, unfortunately quads are not supported by the trio model. Your options are:

brentp commented 2 years ago

Thanks for the reply. I'll run trio mode twice for now, as you suggest, and will switch to family mode when/if it makes it into a release.

Will the trio mode change the genotypes or genotype qualities relative to population mode called on a trio? Or is it "just" assigning a posterior for the de novo?
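A minimal sketch of the two-run approach, with one trio-mode call per child. It only prints the commands (a dry run). The sample and BAM names follow the command in the original question; the `ref.fa` default, the `-o` output file names, and the `trio_cmd` helper are illustrative assumptions, and `-M`/`-F` are assumed to match the sample names recorded in the BAM read groups.

```shell
#!/bin/sh
# Dry run: print one trio-mode octopus command per child of the quad.
# mom/dad and the BAM names come from the question above; the output
# names and the REF default are assumptions made for illustration.
REF="${REF:-ref.fa}"

trio_cmd() {
    # $1 = child sample name; its reads are assumed to be in "$1.bam"
    printf 'octopus -R %s -I %s.bam dad.bam mom.bam -M mom -F dad -o %s.trio.vcf\n' \
        "$REF" "$1" "$1"
}

for child in bro sis; do
    trio_cmd "$child"   # pipe the output to `sh` once the commands look right
done
```

Printing the commands first makes it easy to check the sample-to-file mapping before committing compute time to either run.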

brentp commented 2 years ago

And one more question. I'm planning to call trios where possible and families otherwise, then combine across families with the n+1 example you describe here. How far will that scale? Could that work for 100 samples? 1000? 10,000? Thanks again.

dancooke commented 2 years ago

> Will the trio mode change the genotypes or genotype qualities relative to population mode called on a trio?

Trio mode uses an inheritance-based model that differs from the one used in population mode, so you'll get different genotype posteriors. Both models are presented in the methods section of the paper.

> I'm planning to call trios where possible and families otherwise, then combine across families with the n+1 example you describe here. How far will that scale? Could that work for 100 samples? 1000? 10,000?

It's not something I've tested, but I'd expect thousands of samples to work with default settings. If you find this isn't the case, feel free to open a new ticket and I'll look into it.

brentp commented 2 years ago

Thank you. I will give it a go. And that brings another question:

Is there utility in using --bamout ... --bamout-type FULL in the first octopus run for each sample (whether it's via trio mode or not) and then using that realigned BAM as input to the later octopus runs (with groups of ~20, as prescribed in the docs)?

dancooke commented 2 years ago

> Is there utility in using --bamout ... --bamout-type FULL in the first octopus run for each sample (whether it's via trio mode or not) and then using that realigned BAM as input to the later octopus runs

No, since reads are always realigned internally, and de novo variant discovery is disabled in the later runs with the --disable-denovo-variant-discovery option. (If that weren't the case, there might be some benefit from fewer spurious candidates caused by misaligned reads.)
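For context, a later-stage joint run of the kind discussed here might be assembled as in the dry-run sketch below. It assumes that first-stage VCFs are passed as candidates with `-c` (`--source-candidates`), as in the n+1 workflow in the octopus docs; verify the flag with `octopus --help`. The `joint_cmd` helper, the batch label, and all file names are hypothetical.

```shell
#!/bin/sh
# Dry run: print a second-stage joint-calling command for one batch.
# Assumption: -c (--source-candidates) supplies first-stage VCFs as the
# only candidates, since de novo variant discovery is switched off.
REF="${REF:-ref.fa}"

joint_cmd() {
    # $1 = batch label
    # $2 = space-separated sample names (reads assumed in NAME.bam)
    # $3 = space-separated first-stage VCFs to use as candidates
    bams=""
    for s in $2; do
        bams="$bams $s.bam"
    done
    printf 'octopus -R %s -I%s -c %s --disable-denovo-variant-discovery -o %s.joint.vcf\n' \
        "$REF" "$bams" "$3" "$1"
}

joint_cmd batch1 "bro sis dad mom" "bro.trio.vcf sis.trio.vcf"
```

The inputs here are the original BAMs rather than `--bamout` output, in line with the answer above.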

brentp commented 2 years ago

Understood. Cheers.