sdjebali opened this issue 5 years ago
This error suggests that there are some funny variants in your VCF. Do you have only SNPs and indels there, or are there SVs or other special records (like gVCF records)?
On Tue, Jul 16, 2019 at 5:48 PM Sarah Djebali wrote:
Dear all,
I am trying to use vg construct to make a vg graph from a genome (a bgzipped FASTA file indexed with samtools faidx) and a variant file (a compressed, tabix-indexed VCF file).
I am using vg version v1.17.0 "Candida" (pre-built for Linux).
I am running this command on a Slurm cluster with 4 threads and 32 GB of RAM:
vg construct -r GCA_002263795.2_ARS-UCD1.2_genomic.ensemblchrnames.fa.gz -v sample_merged2.vcf.gz -t 4 -S -p > ARS-UCD1.2.11runs.vg 2> ARS-UCD1.2.11runs.err
but it fails immediately with this error message:
vg: src/Variant.cpp:349: bool vcflib::Variant::canonicalize(FastaReference&, std::vector<FastaReference*>, bool, int): Assertion `canonicalizable()' failed. ERROR: Signal 6 occurred. VG has crashed. Run 'vg bugs --new' to report a bug. Stack trace path: /tmp/vg_crash_BPtZqn/stacktrace.txt
I have checked that the genome and the VCF file contain the same number of sequences (2211 sequences, with the same lengths). The only difference I can see is that the sequences are in numerical order in the genome file but in alphabetical order in the VCF file. Could that be the problem?
Or could the problem be that I used a pre-built executable instead of compiling the code myself?
What do you think?
Thanks, Sarah
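The consistency check described above (same sequence names and lengths in both files) can be scripted. The Python sketch below is an illustration, not part of vg: it reads the `.fai` index written by `samtools faidx` and the `##contig` header lines of the VCF, then compares names and lengths as sets, so ordering differences are ignored. Whether vg construct itself cares about contig order is not settled by this sketch. The function names are hypothetical, and the regex assumes `ID` is the first key in each `##contig` line, as is conventional.

```python
import gzip
import re
import sys

# Matches "##contig=<ID=NAME,...,length=N>"; assumes ID is the first key.
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)(?:.*?,length=(\d+))?")


def fai_contigs(fai_lines):
    """Map sequence name -> length from samtools faidx (.fai) index lines."""
    contigs = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:  # NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
            contigs[fields[0]] = int(fields[1])
    return contigs


def vcf_contigs(vcf_lines):
    """Map contig ID -> declared length (None if absent) from VCF header lines."""
    contigs = {}
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta-information ends at the #CHROM line
        match = CONTIG_RE.match(line)
        if match:
            length = match.group(2)
            contigs[match.group(1)] = int(length) if length else None
    return contigs


def compare_contigs(fasta, vcf):
    """Return (only in FASTA, only in VCF, length mismatches), ignoring order."""
    only_fasta = sorted(set(fasta) - set(vcf))
    only_vcf = sorted(set(vcf) - set(fasta))
    mismatched = sorted(
        name for name in set(fasta) & set(vcf)
        if vcf[name] is not None and vcf[name] != fasta[name]
    )
    return only_fasta, only_vcf, mismatched


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_contigs.py genome.fa.gz.fai variants.vcf.gz
    with open(sys.argv[1]) as fai:
        fasta = fai_contigs(fai)
    with gzip.open(sys.argv[2], "rt") as vcf:  # bgzip output is gzip-readable
        declared = vcf_contigs(vcf)
    print(compare_contigs(fasta, declared))
```

Three empty lists mean the names and declared lengths agree. Comparing sets rather than lists is deliberate, so a numeric-versus-alphabetical ordering difference alone is not reported.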
No, I do have SVs in this file, since the variants were called with Sniffles on nanopore reads.
(In reply to Erik Garrison, Tue, Jul 16, 2019, 18:12)
I'm not familiar with the SV processing in construct. Maybe someone else can chime in.
The VCF SV format(s) are complex! I assume you want the SVs in the graph?
(In reply to Sarah Djebali, Tue, Jul 16, 2019, 6:15 PM)
Right, but if it is not possible, that is fine. I am quite new to this field and was not sure the tool had this functionality.
(In reply to Erik Garrison, Tue, Jul 16, 2019, 18:17)
Can you share your input data, or an example of an SV insertion line from your VCF? vg construct is fairly particular about which fields it wants for symbolic SV support. Also, for symbolic insertions, it will look for the sequence in a separate fasta file specified with -I.
(In reply to Sarah Djebali, Tue, Jul 16, 2019, 12:20 PM)
Having a similar issue. I have a VCF containing SVs called with Sniffles (https://github.com/fritzsedlazeck/Sniffles). I have separated the SVs into different VCF files: one each for deletions, insertions, and inversions.
Inversions give errors in vg construct (see https://github.com/vgteam/vg/issues/2349). Deletions and insertions construct without error.
I can successfully use vg view to make a GFA for the insertions, but when making a GFA for the deletions I get a similar error:
terminate called after throwing an instance of 'std::runtime_error'
what(): [vg::io::MessageIterator] obsolete, invalid, or corrupt input at message 5744260496672 group 5744260496668
ERROR: Signal 6 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_klVvSb/stacktrace.txt
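Splitting an SV call set by type, as described above, takes only a short script. This Python sketch is an illustration, not necessarily how it was done here: it groups records by the `SVTYPE` INFO value (records without one go to a hypothetical `OTHER` group) and writes each group with the full original header. The file naming (`PREFIX.SVTYPE.vcf`) is also a hypothetical choice.

```python
import sys
from collections import defaultdict


def info_value(info, key):
    """Return KEY's value from a VCF INFO string ('' for a flag), or None."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    return None


def split_by_svtype(lines):
    """Split VCF lines into (header lines, {SVTYPE: record lines})."""
    header, groups = [], defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        info = line.rstrip("\n").split("\t")[7]  # column 8 is INFO
        svtype = info_value(info, "SVTYPE") or "OTHER"
        groups[svtype].append(line)
    return header, dict(groups)


def write_groups(header, groups, prefix):
    """Write each group to PREFIX.SVTYPE.vcf (uncompressed), header first."""
    for svtype, records in groups.items():
        with open(f"{prefix}.{svtype}.vcf", "w") as out:
            out.writelines(header)
            out.writelines(records)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python split_sv.py variants.vcf prefix
    with open(sys.argv[1]) as vcf:
        write_groups(*split_by_svtype(vcf), sys.argv[2])
```

The outputs are plain text; to match the compressed, tabix-indexed input used earlier in this thread, compress and index each one with `bgzip` and `tabix -p vcf` before passing it to vg construct.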
Sure, I have put my input data here: http://genoweb.toulouse.inra.fr/~sdjebali/vg/bos_taurus/
Let me know if I can help further.
Best,
(In reply to Glenn Hickey, Tue, Jul 16, 2019, 6:55 PM)
--
Sarah Djebali - PhD INRA GenPhySE, ch. de Borderouge 31326 Castanet-Tolosan, France Tel. +33 5 61 28 51 22 sarah.djebali-quelen at inra dot fr
It's crashing on the first insertion in your VCF.
1 25619 INS000SUR N <INS> . PASS SUPP=11;SUPP_VEC=11111111111;AVGLEN=590;SVTYPE=INS;SVMETHOD=SURVIVORv2;CHR2=1;END=26210;CIPOS=0,0;CIEND=0,0;STRANDS=+-
To put an insertion in the graph, vg needs to know its sequence, which it can't find in this record. It also wants the length in the SVLEN field.
If I make a dummy FASTA file, for example:
cat ins000sur.fa
>INS000SUR
AAAAAAAAAAAAAAAAAAAA
Then add SVLEN to the record:
1 25619 INS000SUR N <INS> . PASS SUPP=11;SUPP_VEC=11111111111;AVGLEN=590;SVTYPE=INS;SVMETHOD=SURVIVORv2;CHR2=1;END=26210;SVLEN=20;CIPOS=0,0;CIEND=0,0;STRANDS=+-
Then pass the FASTA to construct with -I:
vg construct -v test.vcf.gz -r GCA_002263795.2_ARS-UCD1.2_genomic.ensemblchrnames.fa -S -I ins000sur.fa
It runs through (for the first record; you'd need to fix all insertions in this manner to build the graph).
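Fixing every insertion by hand is tedious, so here is a Python sketch of the same two steps applied in bulk: append `SVLEN` to each `SVTYPE=INS` record and produce a FASTA to pass with `-I`. It assumes you already have the real inserted sequences keyed by record ID; the poly-A sequence above is only a placeholder that makes the record parse, and the SURVIVOR-merged record shown above does not carry the sequence itself. It also assumes, following the example above, that FASTA entries are matched to records by the VCF `ID`. The function names are hypothetical.

```python
def info_keys(info):
    """Return the set of keys present in a VCF INFO string."""
    return {item.partition("=")[0] for item in info.split(";")}


def add_svlen(line, seq_by_id):
    """Append SVLEN=len(sequence) to an SVTYPE=INS record that lacks it.

    seq_by_id maps VCF ID -> inserted sequence (hypothetical input that
    must be obtained elsewhere). Header lines and other records pass through.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    seq = seq_by_id.get(fields[2])  # column 3 is ID
    is_insertion = "SVTYPE=INS" in fields[7].split(";")
    if is_insertion and seq is not None and "SVLEN" not in info_keys(fields[7]):
        fields[7] += f";SVLEN={len(seq)}"
    return "\t".join(fields) + "\n"


def fasta_text(seq_by_id, width=60):
    """Render {ID: sequence} as FASTA text, e.g. for vg construct -I."""
    lines = []
    for name, seq in seq_by_id.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

To use it, rewrite each VCF line with `add_svlen`, write `fasta_text(...)` to a file, re-compress and re-index the VCF with `bgzip` and `tabix`, and pass the FASTA with `-I` as in the command above. Appending `SVLEN` at the end of INFO is a simplification; INFO entries are key-value pairs, so their order should not matter.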
Alternatively (and this is the method I prefer), you can just spell out your insertions as regular records in the VCF and not have to worry about adding tags and extra fasta files.
1 25619 INS000SUR G GAAAAAAAAAAAAAAAAAAAA . PASS SUPP=11;SUPP_VEC=11111111111;AVGLEN=590;SVTYPE=INS;SVMETHOD=SURVIVORv2;CHR2=1;END=26210;SVLEN=20;CIPOS=0,0;CIEND=0,0;STRANDS=+-
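Spelling an insertion out as a regular record means REF becomes the reference base at POS and ALT becomes that same base followed by the inserted bases, as in the `G` record above. A minimal Python sketch, assuming you supply the reference base yourself (for example from `samtools faidx genome.fa.gz 1:25619-25619`) and either the real inserted sequence or a `SEQ` INFO tag on the record; the function name is hypothetical. Other INFO fields are left untouched, as in the example.

```python
def to_explicit_insertion(line, ref_base, inserted=None):
    """Rewrite a symbolic <INS> VCF record with explicit REF/ALT alleles.

    ref_base: reference base at POS, used as the VCF padding base.
    inserted: inserted sequence; if None, taken from a SEQ INFO tag.
    """
    fields = line.rstrip("\n").split("\t")
    if fields[4] != "<INS>":
        raise ValueError(f"not a symbolic insertion: {fields[4]!r}")
    if inserted is None:
        # partition("=")[::2] keeps (key, value); flags get an empty value
        tags = dict(item.partition("=")[::2] for item in fields[7].split(";"))
        inserted = tags.get("SEQ")
        if not inserted:
            raise ValueError(f"no sequence for {fields[2]}: pass one or add SEQ")
    base = ref_base.upper()
    fields[3] = base                     # REF: the single padding base
    fields[4] = base + inserted.upper()  # ALT: padding base + inserted bases
    return "\t".join(fields) + "\n"
```

Raising on records that are not symbolic insertions, or that have no sequence anywhere, is a deliberate choice: silently passing them through would leave exactly the records vg construct cannot use.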
You can also use the SEQ SV tag to describe insertion sequences:
1 25619 INS000SUR G <INS> . PASS SUPP=11;SUPP_VEC=11111111111;AVGLEN=590;SVTYPE=INS;SVMETHOD=SURVIVORv2;CHR2=1;END=26210;SVLEN=20;CIPOS=0,0;CIEND=0,0;STRANDS=+-;SEQ=AAAAAAAAAAAAAAAAAAAA
As Glenn said, you need to have the SVLEN field filled out for insertions as well. Things should work as long as you've got an SVTYPE, an SVLEN, and one of: a SEQ tag, an external FASTA plus <INS> in the ALT field, or explicit REF/ALT sequences in the VCF.
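Before rerunning vg construct on a large call set, it can help to list the insertion records that break the rule above. The Python sketch below is a pre-flight check written for this thread, not vg's own validation logic: for symbolic alleles it requires `SVTYPE`, `SVLEN`, and either a `SEQ` tag or an ID present in the `-I` FASTA (matching by ID is an assumption based on the dummy example earlier). Following Glenn's remark that spelled-out records need no extra tags, it accepts explicit REF/ALT alleles as-is. The function names are hypothetical.

```python
import gzip
import sys


def parse_info(info):
    """Parse a VCF INFO string into {key: value}; flags map to ''."""
    return dict(item.partition("=")[::2] for item in info.split(";"))


def insertion_problems(line, fasta_ids=frozenset()):
    """List reasons vg construct may reject an insertion record ([] = looks OK).

    Symbolic alleles need SVTYPE, SVLEN, and either a SEQ tag or an entry
    (keyed by VCF ID) in the -I FASTA. Explicit REF/ALT alleles pass as-is.
    """
    fields = line.rstrip("\n").split("\t")
    record_id, alt, info = fields[2], fields[4], parse_info(fields[7])
    if not alt.startswith("<"):
        return []  # explicit sequence: treated as a regular record
    problems = []
    if "SVTYPE" not in info:
        problems.append("missing SVTYPE")
    if not info.get("SVLEN"):
        problems.append("missing SVLEN")
    has_seq = bool(info.get("SEQ"))
    in_fasta = alt == "<INS>" and record_id in fasta_ids
    if not (has_seq or in_fasta):
        problems.append("no insertion sequence (no SEQ, no -I FASTA entry)")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_ins.py variants.vcf.gz
    with gzip.open(sys.argv[1], "rt") as vcf:  # bgzip output is gzip-readable
        for line in vcf:
            if line.startswith("#") or "SVTYPE=INS" not in line:
                continue
            problems = insertion_problems(line)
            if problems:
                print(line.split("\t")[2], "; ".join(problems))
```

Run on the original merged VCF, a check like this would flag records such as `INS000SUR` above for both the missing `SVLEN` and the missing sequence.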
Thanks a lot for both answers; I will fix that in my VCF file.
Best,
(In reply to Eric T. Dawson, Thu, Jul 18, 2019, 5:03 PM)
Dear all, just a small question: does vg construct need the SV sequence for all kinds of SVs or just for insertions? Thanks, Sarah