luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

de novo mutation detection using ref from closely related species #159

Closed JohnsonStev closed 3 years ago

JohnsonStev commented 3 years ago

Hi,

Thank you for creating this awesome tool!

I have a question that is not directly about octopus, but it would be good to hear some feedback from experts.

I have trio data from one species, but the reference genome I have is from another species that diverged ~2 million years ago.

I am wondering how sites where both parents have a 1/1 genotype affect the trio-based model.

Would you recommend first creating a consensus genome based on variants called from the parents, or even performing a reference-guided assembly?

Thank you.

dancooke commented 3 years ago

Hi, this shouldn't be a problem in principle - Octopus can call multi-allelic sites and reversions. However, you may run into performance issues in regions that are highly diverged because: a) read mapping will likely be poor, and b) there will be many candidate variant sites to consider.
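To make the 1/1-in-both-parents case concrete, here is a minimal Python sketch of a plain Mendelian consistency check on diploid genotypes. This is only an illustration and not Octopus's actual trio model, which is Bayesian and haplotype-based; the function names `is_mendelian_consistent` and `denovo_alleles` are hypothetical, and genotypes are assumed to be unphased pairs of VCF-style allele indices (0 = reference allele).

```python
from itertools import product


def is_mendelian_consistent(mother, father, child):
    """Return True if the child's diploid genotype can be formed by taking
    one allele from the mother and one from the father (order ignored)."""
    return any(
        sorted((m, f)) == sorted(child)
        for m, f in product(mother, father)
    )


def denovo_alleles(mother, father, child):
    """Return the child's alleles that are absent from both parents."""
    parental = set(mother) | set(father)
    return sorted(set(child) - parental)


# Both parents and the child are 1/1: a fixed difference from the
# (other-species) reference, fully consistent with inheritance.
print(is_mendelian_consistent((1, 1), (1, 1), (1, 1)))

# Parents 1/1, child 0/1: the reference allele in the child is a reversion,
# so allele 0 is the de novo candidate.
print(denovo_alleles((1, 1), (1, 1), (0, 1)))

# Parents 1/1, child 1/2: a new alternate allele at a multi-allelic site.
print(denovo_alleles((1, 1), (1, 1), (1, 2)))
```

Under this simplified view, sites that are 1/1 in the whole trio are just fixed differences from the reference, while a de novo event at such a site shows up either as a reversion to the reference allele or as a second alternate allele.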

It's difficult to say whether you should go down the route of generating a new reference, as it depends on a number of factors. For starters, you probably won't gain much if you're just using the same data for assembly. You would also need to think about how to interpret and report results against the new custom reference, especially if the existing reference is the norm and has already been annotated, etc. I'd see how you get on with the current reference before embarking on creating a new one.