nhansen / SVanalyzer

Tools for the analysis of structural variation in genomes
http://svanalyzer.readthedocs.io/

SV in telomere #8

Closed: proinde closed this issue 4 years ago

proinde commented 4 years ago

According to the VCF specification v4.2, an SV in a telomere can be recorded by setting POS to either 0 or N + 1, where N is the length of the reference sequence for that contig. Currently, your code skips entries where the SV is in a telomere.
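
To make the rule above concrete, here is a minimal Python sketch that classifies a POS value against a contig length. The function name `classify_pos` and its return labels are hypothetical and only for illustration; this is not how SVanalyzer itself handles positions.

```python
def classify_pos(pos: int, contig_length: int) -> str:
    """Classify a 1-based VCF POS relative to a contig of length N.

    Hypothetical helper: the labels are illustrative, not part of any tool.
    """
    if pos == 0:
        # POS = 0 marks the telomere before the first base.
        return "start_telomere"
    if pos == contig_length + 1:
        # POS = N + 1 marks the telomere after the last base.
        return "end_telomere"
    if 1 <= pos <= contig_length:
        return "internal"
    # Anything else is genuinely outside the contig.
    return "out_of_range"
```

With a check like this, only `out_of_range` positions would need to be skipped, while the two telomeric values could be accepted.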

nhansen commented 4 years ago

Thanks, Matthew. Can you propose a good rule for how to compare SVs with these specifications?

proinde commented 4 years ago

I think just allow N + 1 or 0 as a position (corresponding to the end and the start of the chromosome, respectively). While processing my VCFs, I also got a divide-by-zero error that I think might be related. I'm trying to see whether spoofing the positions back onto the chromosome fixes it.
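
One way to read "spoofing the positions back onto the chromosome" is clamping POS into the valid 1..N range before running the comparison. The sketch below assumes that interpretation; `clamp_to_contig` is a hypothetical name, and clamping changes the recorded coordinate, so it is a workaround rather than a fix.

```python
def clamp_to_contig(pos: int, contig_length: int) -> int:
    """Move a telomeric POS (0 or N + 1) onto the nearest real base.

    Hypothetical workaround: positions already inside 1..N are unchanged.
    """
    if contig_length < 1:
        raise ValueError("contig_length must be at least 1")
    # max() lifts 0 up to 1; min() pulls N + 1 down to N.
    return max(1, min(pos, contig_length))
```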

nhansen commented 4 years ago

We can allow the positions, but I'd prefer not to let these SVs merge with others, since they don't have sequence associated with them (just size, I assume?).

Can you post the divide-by-zero error you get?

Thanks again for posting.

proinde commented 4 years ago

I think that makes sense, although in this case there is sequence in the ALT field of the VCF file. The VCF is from Sniffles, which provides that info. Here is the error output:

chr22_KI270732v1_random:41544 is past end of chromosome--skipping comparison of 68878_0 and 0_-28365
Use of uninitialized value $max_shift in abs at SVmerge line 294.
Use of uninitialized value $shared_denominator in division (/) at SVmerge line 294.
Illegal division by zero at SVmerge line 294.

The problematic entry is as follows. I removed all but the first and last 2 bases of the ALT sequence for readability:

chr22_KI270732v1_random 41544   0_-28365     N       GA<a_bunch_of_sequence_goes_here>AG .       UNRESOLVED     CHR2=chr22_KI270732v1_random;END=41544;RE=3;PRECISE;SVLEN=999;SVMETHOD=Snifflesv1.0.12;SVTYPE=INS;RNAMES=994e47dc-566a-4c4a-8960-5df95e8db790,cce8f5d6-a94e-497b-840a-f348942e139f,fd535588-7ca1-4448-9e75-f32e9346de8a;STRANDS2=2,1,2,1;REF_strand=0,0;Strandbias_pval=1.0;STD_quant_start=3.464102;STD_quant_stop=3.464102;Kurtosis_quant_start=0.0;Kurtosis_quant_stop=0.0;SUPTYPE=AL;STRANDS=+-;AF=1.0       GT:DR:DV        1/1:0:3
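
For readers who want to inspect a record like the one above, here is a rough Python sketch that splits the fixed VCF columns and the INFO field, then tests whether POS equals N + 1. Several things are assumptions: the contig length of 41543 is assumed rather than taken from this thread, the ALT value `GAAG` is a placeholder for the elided sequence, the INFO field is abbreviated, and `parse_vcf_record` is a hypothetical helper rather than part of SVanalyzer.

```python
def parse_vcf_record(line: str) -> dict:
    """Parse the first eight columns of a whitespace-separated VCF line."""
    chrom, pos, rec_id, ref, alt, qual, filt, info_text = line.split()[:8]
    info = {}
    for item in info_text.split(";"):
        key, sep, value = item.partition("=")
        # Flags such as PRECISE have no "=value" part; store them as True.
        info[key] = value if sep else True
    return {"chrom": chrom, "pos": int(pos), "id": rec_id, "ref": ref,
            "alt": alt, "filter": filt, "info": info}


# Assumption: length of chr22_KI270732v1_random; check your reference index.
ASSUMED_CONTIG_LENGTH = 41543

# Abbreviated copy of the record; "GAAG" stands in for the elided ALT sequence.
record = parse_vcf_record(
    "chr22_KI270732v1_random 41544 0_-28365 N GAAG . UNRESOLVED "
    "CHR2=chr22_KI270732v1_random;END=41544;PRECISE;SVLEN=999;SVTYPE=INS"
)

# True when POS is the N + 1 telomeric position rather than an invalid one.
is_end_telomere = record["pos"] == ASSUMED_CONTIG_LENGTH + 1
```

If the assumed length is right, the record sits exactly at N + 1, which would match the "past end of chromosome" message in the log.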

nhansen commented 4 years ago

I've committed a correction for the uninitialized value error, and will eventually try to incorporate a reasonable type of comparison for these SVs. Thanks again for bringing it to my attention.

proinde commented 4 years ago

Sure thing. Thanks!