google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License
3.24k stars, 729 forks

Deepvariant for Structural Variants #298

Closed by arturo-opsetmoen-amador 4 years ago

arturo-opsetmoen-amador commented 4 years ago

Hi,

Would it be possible to use DeepVariant's training pipeline to create a model that can call larger variants (SVs), or is there something that would fundamentally limit the use of DeepVariant's algorithm for this case?

Thanks a lot in advance for any comments!

MariaNattestad commented 4 years ago

Hi @digitalemerge

There isn't technically anything stopping the current DeepVariant models from calling structural variants, and in fact we have seen this happen, especially with PacBio reads.

The main limitation is that DeepVariant currently identifies variants by looking at signatures within individual read alignments. An SV usually won't be contained within a single short read, which is why most SV callers rely on split-read or discordant-pair signatures. DeepVariant doesn't use those because it was designed for calling small variants. With long reads, DeepVariant does capture some larger insertions and deletions as a natural extension of calling small indels, but it isn't perfect, because SVs generally don't show up as neatly in the reads as small variants do. That is why dedicated SV callers like pbsv have built-in methods to evaluate evidence from reads that don't match perfectly.
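To make the distinction concrete, here is a minimal Python sketch of what "within the read alignment" evidence looks like. It is not DeepVariant or pbsv code. It only parses a SAM-style CIGAR string and separates small indels, large indels, and clipped bases. The function name `cigar_signatures`, the example CIGAR strings, and the 50 bp cutoff (a commonly used convention for where SVs start) are all assumptions made for illustration.

```python
import re

# Assumption: 50 bp is a commonly used cutoff between small indels and SVs.
SV_MIN_LENGTH = 50

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_signatures(cigar, sv_min_length=SV_MIN_LENGTH):
    """Summarise the indel and clipping evidence in one CIGAR string.

    Hypothetical helper for illustration only.
    """
    summary = {"small_indels": [], "large_indels": [], "clipped_bases": 0}
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "ID":
            # Insertions and deletions are visible inside the alignment itself.
            key = "large_indels" if length >= sv_min_length else "small_indels"
            summary[key].append((op, length))
        elif op in "SH":
            # Clipping hints at a split read: the rest of the read may align
            # elsewhere, which an in-alignment approach does not follow up.
            summary["clipped_bases"] += length
    return summary


# Made-up short read that ends in a clip: no indel is visible in the alignment.
short_read = cigar_signatures("90M60S")

# Made-up long read that spans a 120 bp deletion inside a single alignment.
long_read = cigar_signatures("5000M3I2000M120D8000M")
```

In this toy example the short read shows only clipped bases, which is the kind of signal a split-read-aware SV caller would chase, while the long read carries the 120 bp deletion directly in its alignment, which is the case where an in-alignment approach can pick up a larger event.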

That being said, we are exploring some strategies and experimenting with how we might extend DeepVariant to call structural variants.

Of course, you and anyone else out there who is interested in experimenting with DeepVariant should also feel free to do so!

Maria