monarch-initiative / SvAnna

Efficient and accurate pathogenicity prediction for coding and regulatory structural variants in long-read genome sequencing

Don't crash with invalid variants #164

Closed pnrobinson closed 3 years ago

pnrobinson commented 3 years ago

The following variant

231912_hifi_2smrtcell_hg38_pbmm2_sniffles.vcf 
CM000673.2  70371476    15322   N   <INS>   .   PASS    IMPRECISE;SVMETHOD=Snifflesv1.0.12;CHR2=CM000673.2;END=70371475;ZMW=6;STD_quant_start=93.890007;STD_quant_stop=1096.165134;Kurtosis_quant_start=-1.833883;Kurtosis_quant_stop=-1.998785;SVTYPE=INS;SUPTYPE=AL;SVLEN=2187;STRANDS=+-;STRANDS2=7,3,7,3;RE=6;REF_strand=0,0;Strandbias_pval=1;AF=1 GT:DR:DV    1/1:0:6

leads to this

org.monarchinitiative.svart.InvalidCoordinatesException: Fully-closed coordinates 11:70371477-70371475 must have a start position at most one place past the end position
    at org.monarchinitiative.svart.Coordinates.validateCoordinates(Coordinates.java:201)
    at org.monarchinitiative.svart.BaseGenomicRegion.<init>(BaseGenomicRegion.java:23)
    (...)
    at org.jax.svanna.io.parse.VcfVariantParser.parseSymbolicVariantAllele(VcfVariantParser.java:234)
    (...)
    at picocli.CommandLine.execute(CommandLine.java:2058)
    at org.jax.svanna.cli.Main.main(Main.java:53)

It is not good to let svanna crash with variants like this -- better to log the error and skip the variant.

pnrobinson commented 3 years ago

Adding this to VcfVariantParser, line 233, would solve the issue, but perhaps there are better ways?

try {
    return Optional.of(builder.variantCallAttributes(variantCallAttributes).build());
} catch (Exception e) {
    return Optional.empty();
}

ielis commented 3 years ago

Yeah, this error is unfortunate. The coordinates describe a region on the reference with length 0, but the REF base is N (length 1). I do not think this is a correct VCF record. We should log a warning and drop the variant.
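
A minimal sketch of the log-and-skip idea, for illustration only. The `Region` class, the `parse` method, and the use of `IllegalArgumentException` and `java.util.logging` are all assumptions made to keep the example self-contained; they are stand-ins, not the svart or SvAnna API. The check mirrors the rule quoted in the exception message: with fully-closed coordinates, the start may be at most one position past the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

public class SkipInvalidVariants {

    private static final Logger LOGGER = Logger.getLogger(SkipInvalidVariants.class.getName());

    /** Hypothetical stand-in for a parsed variant region; not the svart API. */
    public static final class Region {
        final String contig;
        final int start;
        final int end;

        Region(String contig, int start, int end) {
            // Fully-closed coordinates: an empty region has start == end + 1,
            // so a start any further past the end is rejected.
            if (start > end + 1) {
                throw new IllegalArgumentException(
                        "Fully-closed coordinates " + contig + ":" + start + "-" + end
                                + " must have a start position at most one place past the end position");
            }
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the region, or an empty Optional (after logging a warning) if the coordinates are invalid. */
    public static Optional<Region> parse(String contig, int start, int end) {
        try {
            return Optional.of(new Region(contig, start, end));
        } catch (IllegalArgumentException e) {
            // Report the problem instead of swallowing it, then drop the record.
            LOGGER.warning("Skipping invalid variant: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The second record uses the coordinates from the reported exception.
        int[][] records = {{70371000, 70371100}, {70371477, 70371475}};
        List<Region> parsed = new ArrayList<>();
        for (int[] r : records) {
            parse("11", r[0], r[1]).ifPresent(parsed::add);
        }
        System.out.println(parsed.size() + " of " + records.length + " variants kept");
    }
}
```

Catching only the validation exception, rather than `Exception`, keeps unrelated bugs visible while still letting the run continue past malformed records.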

Could you please push the changes if you already have them?

pnrobinson commented 3 years ago

I added the correction for the malformed vars to the svg PR

Please note this in SvSvgGenerator (after possibly acting on this, the PR should be ready):

/**
 * TODO -- It would be better to put logic like this where the repeats are extracted!
 * @param variant
 * @param repeats
 * @return
 */
List<RepetitiveRegion> getOverlappingRepeats(Variant variant, List<RepetitiveRegion> repeats) {

ielis commented 3 years ago

I will address the RR retrieval later. Otherwise, the issue with invalid variants should be fixed.