nhansen / SVanalyzer

Tools for the analysis of structural variation in genomes
http://svanalyzer.readthedocs.io/

A question about SVwiden #11

Open dxkeac opened 3 years ago

dxkeac commented 3 years ago

I want to use SVwiden to expand my VCF file. The input has ~9000 SVs, but the result file widened.vcf contains only ~1000 SVs. The log file reported: 2020/11/17 17:09:12 2020/11/17 17:09:12 Ref derived from widened ref has different allele from ref ! Are these alleles skipped, and is that why I only got ~1000 SVs?

nhansen commented 3 years ago

Thanks for reporting this error. It looks like the program is dying on a problematic SV, so the ~1000 SVs you got were most likely the first ones in your input file. Is the last line truncated in the output file? Can you post the SV entries just after the last one reported in your widened.vcf file, as well as the length of that particular reference entry?

I'm guessing this is a bug in how SVwiden handles SVs near the beginning/end of reference entries, but it would be nice to confirm that if possible. In the meantime, I'll be working on a fix. Thanks again for sharing this info.
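For anyone gathering the details requested above, here is a minimal Python sketch of one way to do it. It is not part of SVanalyzer. The helper names `records_after` and `fasta_lengths` are hypothetical, and the sketch assumes that SVwiden keeps the input order and the ID column unchanged in its output, and that the reference is a plain uncompressed FASTA file.

```python
def records_after(input_vcf_lines, last_id, count=2):
    """Return up to `count` data records that follow the record whose
    ID column (third field) equals `last_id` in the input VCF."""
    # Keep data lines only; header lines start with '#'.
    records = [ln.rstrip("\n") for ln in input_vcf_lines
               if ln.strip() and not ln.startswith("#")]
    for i, rec in enumerate(records):
        fields = rec.split()
        if len(fields) > 2 and fields[2] == last_id:
            return records[i + 1:i + 1 + count]
    return []  # ID not found in the input


def fasta_lengths(fasta_lines):
    """Map each FASTA entry name (text up to the first whitespace
    after '>') to the length of its sequence."""
    lengths = {}
    name = None
    for ln in fasta_lines:
        ln = ln.strip()
        if not ln:
            continue
        if ln.startswith(">"):
            name = ln[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(ln)
    return lengths
```

Both helpers take any iterable of lines, so an open file handle works, for example `records_after(open("input.vcf"), "some_id")`, where `some_id` stands for the ID of the last record in widened.vcf and the file name is a placeholder. Comparing each returned record's POS with the length of its reference entry shows whether the SV sits near the start or end of that entry.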

dxkeac commented 3 years ago

It is likely that the REF base in input.vcf (assembly) differs from the base in the reference genome file at the corresponding position.

  1. The last two lines of widened.vcf are as follows:

     chr19 5365627 paftools_D6_F7_5970 c cttctctctctctctctttctttctctttctttcttcttctcttctctttc 1 60 CHR2=chr19;END=5365627;SVTYPE=INS;SVLEN=50;IDLIST=paftools_D6_F7_5970;REPTYPE=DUP;BREAKSIMLENGTH=2;REFWIDENED=chr19:5365625-5365629 GT ./.

     chr19 15779323 paftools_D6_F7_6031 a agaggagtgagtgagtgaggagaggagtgagtgagtgaggaggggagtgagtgagtgaggaggggagtgagtgaggaggggagtgagtgagtgaggaggggagtgagtgagtgaggaggggagtgagtgagtgaggagaggagtgcgtgagtgaggagaggagtgagtgagtgcatgagtagaggagtgagtgagtgaggagaggagtgagtgagtgaggagaggagtgagtgagtgaggagaggagtgagtgagtgaggagaggagtgcgtgagtgaggagaggagtgagtgagtgcatgggtagaggagtgagtgagtgcatgggtagaggagtgagtgagtgaggaggagtgagtgagtgaggaggggagtgagtgagtgaggaggggagtgagtgagtgaggaggggagtgagtgagtgaggaggggagtgagtgagtgaggaggggagtgagtgagtgaggaggggagtgagtgagtgaggaggggagtgcgtgagtgagg 1 60 CHR2=chr19;END=15779323;SVTYPE=INS;SVLEN=505;IDLIST=paftools_D6_F7_6031;REPTYPE=SIMPLEINS;BREAKSIMLENGTH=318;REFWIDENED=chr19:15779239-15779556 GT ./.

  2. The last two lines of widened.log are as follows:

     2020/11/17 17:15:52 Ref derived from widened ref has different allele from ref!
     2020/11/17 17:15:52 Ref derived from widened ref has different allele from ref!

  3. The stdout is as follows: QUERY2 8825 doesn't match 8824.
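The suspected mismatch between the REF column and the reference genome can be checked directly. The following is a minimal Python sketch, not SVwiden's own logic. The helper names `load_fasta` and `ref_mismatches` are hypothetical. It assumes a plain uncompressed FASTA small enough to hold in memory, 1-based VCF positions, and a case-insensitive comparison, since the alleles posted above are lowercase.

```python
def load_fasta(lines):
    """Read FASTA lines into a dict of entry name -> sequence string."""
    parts = {}
    name = None
    for ln in lines:
        ln = ln.strip()
        if not ln:
            continue
        if ln.startswith(">"):
            name = ln[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(ln)
    return {n: "".join(p) for n, p in parts.items()}


def ref_mismatches(vcf_lines, genome):
    """Return (chrom, pos, id, ref, observed) for every record whose REF
    allele differs from the genome sequence at that position.
    `observed` is None when the chromosome is absent from the genome."""
    bad = []
    for ln in vcf_lines:
        if not ln.strip() or ln.startswith("#"):
            continue
        fields = ln.split()
        if len(fields) < 4:
            continue  # not a complete VCF record
        chrom, pos, vid, ref = fields[:4]
        start = int(pos) - 1  # VCF POS is 1-based
        seq = genome.get(chrom)
        observed = None if seq is None else seq[start:start + len(ref)]
        if observed is None or observed.upper() != ref.upper():
            bad.append((chrom, int(pos), vid, ref, observed))
    return bad
```

Running `ref_mismatches(open("input.vcf"), load_fasta(open("ref.fa")))` (both file names are placeholders) lists the records whose REF allele disagrees with the reference. If the record that follows paftools_D6_F7_6031 in the input appears in that list, it would support the explanation given above.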

I hope this information can be helpful.