vgteam / vg

tools for working with genome variation graphs

SV vcf file format requested? #3309

Open jun3234 opened 3 years ago

jun3234 commented 3 years ago

I want to construct a graph from a reference and an SV VCF called with Sniffles.

I kept just one variant record, as follows:

Chr1    295261  17  GTCAACTGGCTGCTGGCGCAGTGGTAGCGCCAGCAGCCAGCCCTGCCTCCCTTTGTAAGGGGGCAGGGTTCGACACCCTATATGTATATATATCTATATATAATATAATATATCTATAATAGTAATTGTTGTCCCACCATTTATGTGGGTAGTTCTCCCACCCTCGTGTGGGTTGTTTCTAGTTCTCTCCCAATTGGGAGTAGTCTTCTAAATGTCTATCTCGGTTGGTCCCGGCTCTGTCCCAATTTTGTACTCTTTCT    N   .   PASS    SUPP=2;SUPP_VEC=11;SVLEN=-260;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;CHR2=Chr1;END=295521;CIPOS=0,0;CIEND=0,0;STRANDS=+- GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 1/1:NA:260:0,27:+-:.:DEL:17:TGTCAACTGGCTGCTGGCGCAGTGGTAGCGCCAGCAGCCAGCCCTGCCTCCCTTTGTAAGGGGGCAGGGTTCGACACCCTATATGTATATATATCTATATATAATATAATATATCTATAATAGTAATTGTTGTCCCACCATTTATGTGGGTAGTTCTCCCACCCTCGTGTGGGTTGTTTCTAGTTCTCTCCCAATTGGGAGTAGTCTTCTAAATGTCTATCTCGGTTGGTCCCGGCTCTGTCCCAATTTTGTACTCTTT:N:Chr1_295261-Chr1_295521   1/1:NA:260:0,16:+-:.:DEL:17:GTCAACTGGCTGCTGGCGCAGTGGTAGCGCCAGCAGCCAGCCCTGCCTCCCTTTGTAAGGGGGCAGGGTTCGACACCCTATATGTATATATATCTATATATAATATAATATATCTATAATAGTAATTGTTGTCCCACCATTTATGTGGGTAGTTCTCCCACCCTCGTGTGGGTTGTTTCTAGTTCTCTCCCAATTGGGAGTAGTCTTCTAAATGTCTATCTCGGTTGGTCCCGGCTCTGTCCCAATTTTGTACTCTTTCT:N:Chr1_295261-Chr1_295521

I then ran this command: `vg construct -r Lchinesis_genome.Chr.fasta -v test.vcf -S`. It crashed with this output:

Variant: Chr1   295261  17      GTCAACTGGCTGCTGGCGCAGTGGTAGCGCCAGCAGCCAGCCCTGCCTCCCTTTGTAAGGGGGCAGGGTTCGACACCCTATATGTATATATATCTATATATAATATAATATATCTATAATAGTAATTGTTGTCCCACCATTTATGTGGGTAGTTCTCCCACCCTCGTGTGGGTTGTTTCTAGTTCTCTCCCAATTGGGAGTAGTCTTCTAAATGTCTATCTCGGTTGGTCCCGGCTCTGTCCCAATTTTGTACTCTTTCT N       0       PASSCHR2=Chr1;CIEND=0,0;CIPOS=0,0;END=295521;STRANDS=+-;SUPP=2;SUPP_VEC=11;SVLEN=-260;SVMETHOD=SURVIVOR1.0.7;SVTYPE=DEL
zero ind: 295260 1-indexed: 295261

If I change the coordinate from 295261 to 295262, it runs successfully.

Could you give me details about the SV VCF file format, or any helpful links? Thank you in advance!

glennhickey commented 3 years ago

That message says that the REF allele in your VCF doesn't match your FASTA, apparently because of an off-by-one error in the VCF.

The full error message you got should be:

error:[vg::Constructor] Variant/reference sequence mismatch: GTCAACTGGCTGCTGGCGCAGTGGTAGCGCCAGCAGCCAGCCCTGCCTCCCTTTGTAAGGGGGCAGGGTTCGACACCCTATATGTATATATATCTATATATAATATAATATATCTATAATAGTAATTGTTGTCCCACCATTTATGTGGGTAGTTCTCCCACCCTCGTGTGGGTTGTTTCTAGTTCTCTCCCAATTGGGAGTAGTCTTCTAAATGTCTATCTCGGTTGGTCCCGGCTCTGTCCCAATTTTGTACTCTTTCT vs pos: 295261: "<what it expects given your fasta>"
Variant: Chr1   295261  17      GTCAACTGGCTGCTGGCGCAGTGGTAGCGCCAGCAGCCAGCCCTGCCTCCCTTTGTAAGGGGGCAGGGTTCGACACCCTATATGTATATATATCTATATATAATATAATATATCTATAATAGTAATTGTTGTCCCACCATTTATGTGGGTAGTTCTCCCACCCTCGTGTGGGTTGTTTCTAGTTCTCTCCCAATTGGGAGTAGTCTTCTAAATGTCTATCTCGGTTGGTCCCGGCTCTGTCCCAATTTTGTACTCTTTCT N       0       PASSCHR2=Chr1;CIEND=0,0;CIPOS=0,0;END=295521;STRANDS=+-;SUPP=2;SUPP_VEC=11;SVLEN=-260;SVMETHOD=SURVIVOR1.0.7;SVTYPE=DEL
zero ind: 295260 1-indexed: 295261

You can check manually by running `samtools faidx Lchinesis_genome.Chr.fasta Chr1:295261-295521` to see the expected allele.

There's probably a better way, but you can check that your VCF is valid by running it through `bcftools norm -f` with your FASTA; it should give you a similar error.
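
If neither tool is at hand, the same REF-versus-FASTA comparison can be sketched in plain Python. This is only an illustration, not part of vg, samtools, or bcftools: the helper names `read_fasta`, `ref_offset`, and `check_vcf` are made up for this example, the whole FASTA is loaded into memory, and the VCF is assumed to be uncompressed and tab-delimited. For each record it reports whether REF matches the reference at POS, and if not, whether it matches when shifted by a base or two.

```python
def read_fasta(path):
    """Load a FASTA file into {sequence_name: sequence} (whole file in memory)."""
    seqs, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                # Keep only the first word of the header as the name.
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def ref_offset(seq, pos, ref, max_shift=2):
    """Return the shift (in bases) at which REF matches seq, or None.

    pos is the 1-based VCF POS; a result of 0 means the record is
    consistent with the reference as written.
    """
    ref = ref.upper()
    # Try shift 0 first, then -1, +1, -2, +2, ...
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        start = pos - 1 + shift  # 1-based POS -> 0-based string index
        if start >= 0 and seq[start:start + len(ref)].upper() == ref:
            return shift
    return None


def check_vcf(fasta_path, vcf_path):
    """Yield (chrom, pos, shift) for records whose REF does not match at POS."""
    seqs = read_fasta(fasta_path)
    with open(vcf_path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref = line.split("\t")[:4]
            shift = ref_offset(seqs[chrom], int(pos), ref)
            if shift != 0:
                yield chrom, int(pos), shift


if __name__ == "__main__":
    import sys
    # Usage: python check_ref.py reference.fasta variants.vcf
    for chrom, pos, shift in check_vcf(sys.argv[1], sys.argv[2]):
        print(chrom, pos, "no nearby match" if shift is None else f"REF matches at POS{shift:+d}")
```

A reported shift of `+1` would mean REF matches the reference one base after POS, which is the pattern you would expect if moving the record from 295261 to 295262 makes it load.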

jun3234 commented 3 years ago

VCF files use 1-based coordinates. I haven't changed the VCF since it was called with Sniffles. It's weird that there is an off-by-one error in the VCF file.

Thanks!