luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
299 stars 37 forks

BQSR recommendation #191

Closed sivico26 closed 3 years ago

sivico26 commented 3 years ago

Hi, thanks for developing Octopus,

The last time I did some variant calling, I used GATK4 and followed their recommended best practices as closely as I could. I especially remember the emphasis on Base Quality Score Recalibration (BQSR) during preprocessing, which was a bit cumbersome because I had a non-model sample and so had to run the process iteratively.

Anyway, I want to give Octopus a try and I wonder what you think about BQSR. I read the Octopus documentation and searched the issues, and apparently there is no mention of the topic (that might itself be a hint at the answer, but I wanted to ask anyway). Do you think it is either necessary or recommended? Note that, in this case, no reliable truth set of variants is available. I should add that I have two sequencing runs from the same organism, and the biases this might introduce are one of the things that BQSR aims to address. Both runs come from the same sequencing instrument, though.

Finally, a smaller additional question, if you do not mind: my data come from a PCR-free library sequenced on an Illumina ISeq instrument, which is not listed among Octopus's error models. Which configuration should I use, or what do you recommend in that regard?

Thank you in advance. Regards

dancooke commented 3 years ago

Hi, thanks for your interest in Octopus. We do not specifically recommend BQSR - I've not seen convincing evidence showing that it systematically improves accuracy. If you already have alignments processed with BQSR, calling them with Octopus is probably not going to hurt.

Regarding the error model: indeed, there's no ISeq model right now. Without having seen any data, I think it's unlikely that the ISeq sequencer has a substantially different error profile from other Illumina machines, at least compared with the effect of the sample's PCR status, so you're probably not going to lose a great deal by just using the default PCR-free model.
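
For readers who want to see what that suggestion might look like in practice, here is a minimal sketch (not part of the original reply) that assembles an Octopus command line selecting the PCR-free error model explicitly. The file names and the `octopus_command` helper are hypothetical, and the `PCR-free` value for `--sequence-error-model` is an assumption based on the built-in model naming in the Octopus documentation; check `octopus --help` for the names your version accepts.

```python
import shlex


def octopus_command(reference, reads, output, error_model="PCR-free"):
    """Assemble an octopus argument list. Nothing is executed here.

    The error_model default is an assumption (see the note above);
    the helper itself is hypothetical and only for illustration.
    """
    return [
        "octopus",
        "--reference", reference,
        "--reads", *reads,  # one or more BAM files, e.g. one per sequencing run
        "--output", output,
        "--sequence-error-model", error_model,
    ]


# Hypothetical file names: two runs of the same organism, no BQSR applied.
cmd = octopus_command("reference.fa", ["run1.bam", "run2.bam"], "calls.vcf")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list passed to `subprocess.run`, once the placeholder paths are replaced with real files.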

sivico26 commented 3 years ago

Excellent, thanks for the quick answer, and keep up the good work.

Cheers, Sivico