monarch-initiative / SvAnna

Efficient and accurate pathogenicity prediction for coding and regulatory structural variants in long-read genome sequencing

Short read support #223

Closed lisosome closed 1 year ago

lisosome commented 2 years ago

Hello!

Thank you for this great tool!

Is it possible to use SvAnna with a short-read VCF produced by Manta?

Best Regards

ielis commented 1 year ago

Hi @lisosome, SvAnna generally requires the input to be a valid VCF file. We tested the code with pbsv, sniffles and svim, but it should work with Manta too, as long as the input is a valid VCF file.

SvAnna recognizes variants in all VCF notations.

Literal notation:

12  6124705 Othman-2010-20696945-VWF-index-FigS7    AAAAGGAAACAATG  A   1000    PASS    .   GT:DP:AD    0/1:10:5,5

Symbolic notation:

chr17   31150798    Hsiao-2015-26189818-NF1-UAB_1-FigS1 N   <DEL>   1000    PASS    SVTYPE=DEL;END=31157725;SVLEN=-6926 GT:DP:AD    0/1:10:5,5

as well as breakend notation:

chr3    11007013    Pesz-2018-29621621-SLC6A1-proband-FigS8 N   N[chr4:139383334[   1000    PASS    SVTYPE=BND  GT:DP:AD    0/1:10:5,5
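
As a rough illustration of how the three notations differ, here is a minimal Python sketch that classifies the ALT column of a record. It is not SvAnna's code; `classify_alt` and `alt_of` are hypothetical helpers, the breakend pattern only covers paired breakends of the form shown above, and splitting on whitespace is an assumption that suits the example records as displayed here (a real VCF is tab-delimited).

```python
import re

# Paired breakend ALT: t[p[, t]p], ]p]t or [p[t, where p is "contig:pos".
# Both brackets of one allele point the same way, hence the backreference.
_BREAKEND = re.compile(r"^[^\[\]]*([\[\]])[^\[\]]+:\d+\1[^\[\]]*$")
_LITERAL = re.compile(r"^[ACGTNacgtn*]+$")


def classify_alt(alt: str) -> str:
    """Return 'breakend', 'symbolic', 'literal' or 'unknown' for an ALT allele."""
    if _BREAKEND.match(alt):
        return "breakend"
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"
    if _LITERAL.match(alt):
        return "literal"
    return "unknown"


def alt_of(record: str) -> str:
    """Extract the ALT column (5th field) from a whitespace-separated record."""
    return record.split()[4]


if __name__ == "__main__":
    for alt in ("A", "<DEL>", "N[chr4:139383334["):
        print(alt, classify_alt(alt))
```

Records with several comma-separated ALT alleles or single breakends are out of scope for this sketch.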

To filter by the number of supporting reads (the --min-read-support option), SvAnna needs specific VCF fields to be present. Each variant caller stores this information in its own idiosyncratic way (see Table S1 in the SvAnna manuscript). If the fields are missing, SvAnna treats the variant as if it had passed the filtering step and retains it in the results.
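
The pass-through behaviour for missing fields can be sketched in Python as follows. This is only an illustration of the logic described above, not SvAnna's implementation: the helper names are hypothetical, and reading ALT support from the second value of `AD` is an assumption based on the example records in this thread. The fields SvAnna actually reads for each caller are listed in Table S1 of the manuscript.

```python
from typing import Dict, Optional


def parse_sample(format_col: str, sample_col: str) -> Dict[str, str]:
    """Pair FORMAT keys with the sample's values, e.g. 'GT:DP:AD' and '0/1:10:5,5'."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def alt_support(sample: Dict[str, str], field: str = "AD") -> Optional[int]:
    """Return the ALT read count from `field`, or None if it is absent or unparsable.

    Assumes the field holds 'ref,alt' counts, as AD does in the examples above.
    """
    raw = sample.get(field)
    if raw is None:
        return None
    parts = raw.split(",")
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    return int(parts[1])


def passes_read_support(sample: Dict[str, str], min_read_support: int,
                        field: str = "AD") -> bool:
    """Keep the variant if support is sufficient OR if support cannot be determined."""
    support = alt_support(sample, field)
    if support is None:
        return True  # missing field: treated as having passed the filter
    return support >= min_read_support


if __name__ == "__main__":
    sample = parse_sample("GT:DP:AD", "0/1:10:5,5")
    print(passes_read_support(sample, 3))
    print(passes_read_support(parse_sample("GT", "0/1"), 3))
```

The practical consequence is that a VCF without recognizable read-support fields is not filtered at all by this option, so it is worth checking the output if the threshold appears to have no effect.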

So, it should generally run OK. However, please let me know if you run into any issues.