Open hgibling opened 8 years ago
Hi Heather,
Which version are you using? The duplicate nodes ought not to be an issue with the latest master (as of about 1 week ago). If you're on the latest, and can share your data and command line, please do!
The VCF ids are indeed node ids. The catch is that they are ids in the "augmented graph". vg call creates this intermediate graph by adding (and smoothing) novel stuff from the pileup into your input graph, splitting up nodes and changing most ids in the process. The non-reference parths of the augmented graph that meet the given thresholds of read support are then translated to VCF entries. You can get this graph with the -A option to vg call if you want to view it -- it will contain all the ids from the VCF.
Oh boy the readme link is indeed wrong. you have the right one! Sorry, will fix on my next PR.
cheers -Glenn
On Wed, Sep 7, 2016 at 5:03 PM, Heather Gibling notifications@github.com wrote:
I've mapped a bam file to my MHC graph and used vg call to produce a vcf file of variants, but I'm not sure how to interpret it. Many positions have multiple entries, and some of the IDs are duplicated. For example, position 30052145 has 23 entries (the first column is the number of line occurrences from grep 30052145 file.vcf | uniq -c):
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
7 . 30052145 837884_837817_837813_1060355_837806_837804_837803_837801_837800_837799_837798_837796 TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTAGGATTGCAACCCCT TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTGGGATTGCAACCCCT 3.50629e-12 . DP=10;XSEE=19303:0,19305:0,19307:0,19308:0,19310:0,19312:0,19313:0,19316:0,19324:0,19329:0,19421:0,321501:0,321502:0,321504:0,321505:0,321529:0,337671:0,357757:0,691962:0,691963:0,691963:18;XREF GT:DP:AD:SB:XAAD:AL 1/1:10:0,10:0,0,0,10:10:-1e+100,-1.49947 1 . 30052145 837884_837817_837813_1060355_837806_837804_837803_837802_837800_837799_837798_837796 TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTAGGATTGCAACCCCT . -0 . DP=10;XSEE=19303:0,19305:0,19307:0,19308:0,19311:0,19312:0,19313:0,19316:0,19324:0,19329:0,19421:0,321501:0,321502:0,321504:0,321505:0,321529:0,337671:0,357757:0,691962:0,691963:0,691963:18;XREF GT:DP:AD:SB:XAAD:AL 0/0:10:0,10:0,0,0,10:10:-1e+100,-1.49947 6 . 30052145 837884_837817_837813_837811_1048822_1048819 TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTA . -0 . DP=11;XSEE=19322:0,19324:0,19329:0,19421:0,321509:0,321514:0,321529:0,691962:0,691963:0,691963:18;XREF GT:DP:AD:SB:XAAD:AL 0/0:11:0,11:0,0,0,11:11:-1e+100,-0.649482 1 . 30052145 837884_837817_837813_837811_837810_1048847_1048846_1048845_837801_837800_837799_837798_837796 TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTAGGATTGCAACCCCT TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTGGGATTGCAACCCCT -0 . DP=2;XSEE=19303:0,19305:0,19307:0,19308:0,19310:0,19321:0,19322:0,19324:0,19329:0,19421:0,321501:0,321502:0,321504:0,321505:0,321529:0,321540:0,321541:0,321542:0,357757:0,691962:0,691963:0,691963:18;XREF GT:DP:AD:SB:XAAD:AL ./.:2:0,2:0,0,0,2:2:-1e+100,-1.49947 3 . 30052145 837884_837817_837813_837811_837810_1048847_1048846_837908_837803_837801_837800_837799_837798_837796 TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTAGGATTGCAACCCCT TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTGGGATTGCAACCCCT -0 . DP=0;XSEE=19303:0,19305:0,19307:0,19308:0,19310:0,19312:0,19321:0,19322:0,19324:0,19329:0,19421:0,19452:0,321501:0,321502:0,321504:0,321505:0,321529:0,321541:0,321542:0,357757:0,691962:0,691963:0,691963:18;XREF GT:DP:AD:SB:XAAD:AL ./.:0:0,0:0,0,0,0:0:-1e+100,-1.49947 2 . 30052145 837884_837817_837813_837811_837810_1048847_1048846_837908_837907_837905_837904_837799_837798_837796 TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTAGGATTGCAACCCCT TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGAGTAGGATTGCAACCCCT -0 . DP=8;XSEE=19303:0,19305:0,19307:0,19321:0,19322:0,19324:0,19329:0,19421:0,19448:0,19449:0,19451:0,19452:0,321501:0,321502:0,321504:0,321505:0,321529:0,321541:0,321542:0,357757:0,691962:0,691963:0,691963:18;XREF GT:DP:AD:SB:XAAD:AL ./.:8:0,8:0,0,0,8:8:-1e+100,-1.49947 1 . 30052145 837884_837817_837813_837811_837810_1048847_1048846_837908_837907_837906_837904_837799_837798_837796 TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTAGGATTGCAACCCCT . -0 . DP=5;XSEE=19303:0,19305:0,19307:0,19321:0,19322:0,19324:0,19329:0,19421:0,19448:0,19450:0,19451:0,19452:0,321501:0,321502:0,321504:0,321505:0,321529:0,321541:0,321542:0,357757:0,691962:0,691963:0,691963:18;XREF GT:DP:AD:SB:XAAD:AL ./.:5:0,5:0,0,0,5:5:-1e+100,-1.49947 1 . 30052145 837884_837817_837813_837811_837810_837804_837803_837801_837800_837799_837798_837796 TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTAGGATTGCAACCCCT TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTATCAGAGACTGGGATTGCAACCCCT 1.52995e-11 . DP=11;XSEE=19303:0,19305:0,19307:0,19308:0,19310:0,19312:0,19313:0,19321:0,19322:0,19324:0,19329:0,19421:0,321501:0,321502:0,321504:0,321505:0,321529:0,357757:0,691962:0,691963:0,691963:18;XREF GT:DP:AD:SB:XAAD:AL 1/1:11:0,11:0,0,0,11:11:-1e+100,-1.49947 1 . 30052145 837884_837817_837816_837811_1048822_1048819 TTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTA . -0 . DP=27;XSEE=19322:0,19328:0,19329:0,19421:0,321509:0,321514:0,321529:0,691962:0,691963:0,691963:18;XREF GT:DP:AD:SB:XAAD:AL 0/0:27:0,27:0,0,0,27:27:-1e+100,-1.60631
First, what are the IDs (837884_837817_837813_837811_1048822_1048819)? I initially thought they were node IDs, but they don't seem to exist when I try vg find -n 837884 -c 2 -x graph.xg | vg view - (I get an Assertionidx < this->size()' failed.` error, which I've gotten before when searching for a non-existent node https://github.com/vgteam/vg/issues/364).
Second, should the same position, ID, ref and alt be listed more than once? I haven't yet managed to get decomposing block subs or normalization through vt to work yet, so I'm not sure if this would clear up some of the duplications.
(Relatedly, the readme says vt is found at http://varianttools. sourceforge.net/Association/HomePage, but I'm assuming this should be http://genome.sph.umich.edu/wiki/Vt ?)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2_7qmv7RlYZdBLjs4nuFkW5j5Vp_2dks5qnyakgaJpZM4J3Wq2 .
Hi Glenn,
Thanks for the clarification on the IDs--I see them now.
I was using 1.4.0-1130-gc77e33c. I updated to 1.4.0-1269-g2095926, but now cannot get call
to complete:
Node 1237919 is on ref path but not traced in site 1237918 fwd to 1237921 fwd
terminate called after throwing an instance of 'std::runtime_error'
what(): Extra ref node found
Node 1237919 can be found in the alternate graph but not the original, and I'm left with a truncated vcf (2373 entries compared to 44411 when I use v.1.4.0-1130). What's happening here? (I can share files if needed, but they're too big to attach here.)
This may be another known issue (this time in the mapper). But the fix is in a PR that hasn't made it to master yet https://github.com/vgteam/vg/pull/476
@adamnovak do you think that's what's going on?
We (UCSC) are using this frozen version of vg, which I should have shared in my last message
https://github.com/adamnovak/vg/tree/experiment-freeze
for the time being to avoid these moving parts-related issues as we finish up results for a paper. If you want to somehow share your data and command lines, I'd be happy to try it out with this version.
On Wed, Sep 14, 2016 at 4:34 PM, Heather Gibling notifications@github.com wrote:
Hi Glenn,
Thanks for the clarification on the IDs--I see them now.
I was using 1.4.0-1130-gc77e33c. I updated to 1.4.0-1269-g2095926, but now cannot get call to complete:
Node 1237919 is on ref path but not traced in site 1237918 fwd to 1237921 fwd terminate called after throwing an instance of 'std::runtime_error' what(): Extra ref node found
Node 1237919 can be found in the alternate graph but not the original, and I'm left with a truncated vcf (2373 entries compared to 44411 when I use v.1.4.0-1130). What's happening here? (I can share files if needed, but they're too big to attach here.)
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247144685, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2_7u6EyhQeh9nayFKZHHVR7Ia4Qbirks5qqFpbgaJpZM4J3Wq2 .
I don't know why it's happening, and I don't know if #476 (which I just merged in) will fix it, but it sounds like you have a site where the reference path is somehow hitting it twice? Or where maybe part of the reference is being skipped when the reference path is traced out?
Can you pull out the augmented graph with the -A aug.vg
option, index it with vg index -x aug.xg aug.vg
, and then plot the region around the problem node with vg find -n 1237919 -c 10 -x aug.xg | vg view -dn - | dot -Tsvg -o problem.svg
, and then show us the plot? It ought to show the weird extra reference node, the 10-hop region around it, and where the reference path falls in that region, and ought to help make the problem clearer.
1237919 is there and looks like the ref hits it once (ref is 🗟):
(with paths:) problem-paths.svg.gz
Thanks. The icons don't work for me but as far as I can see from that image, path "ref" goes through 1237920 and not 1237919. So it looks like something's up with this path (either branching out or cycling). Maybe vg validate (on both your original and augmented graph) will catch it. Still happy to try reproducing if you can somehow share the input to vg call.
On Thu, Sep 15, 2016 at 5:26 PM, Heather Gibling notifications@github.com wrote:
1237919 is there and looks like the ref hits it once (ref is 🗟):
problem.svg.gz https://github.com/vgteam/vg/files/475618/problem.svg.gz
(with paths:) problem-paths.svg.gz https://github.com/vgteam/vg/files/475623/problem-paths.svg.gz
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247460288, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2_7kcYNHhg0nGgxRcvq8MEWnv-ldchks5qqbghgaJpZM4J3Wq2 .
Whoops, here's a partial screenshot. ref is the... circled classified ad on a newspaper? It looks like it hits both 1237920 and 1237919.
Both the original and augmented graphs validated fine.
That would be great if you could take a look. I've emailed you a link to a tarball with the pileup and original graph, original graph index, and the augmented graph I got from vg call -A 1269-aug.vg -o 28477796 MHC-ref-haps-n.vg 1269.pileup > 1269.vcf
Thanks! I got the file and can reproduce your error. Will take a look...
On Fri, Sep 16, 2016 at 4:10 PM, Heather Gibling notifications@github.com wrote:
Whoops, here's a partial screenshot. ref is the... circled classified ad on a newspaper? It looks like it hits both 1237920 and 1237919.
[image: screenshot from 2016-09-16 11-12-57] https://cloud.githubusercontent.com/assets/7570351/18599163/ea91ccba-7c24-11e6-82e8-2a4bb8fc48e6.png
Both the original and augmented graphs validated fine. That would be great if you could take a look. I've emailed you a link to a tarball with the pileup and original graph, original graph index, and the augmented graph I got from vg call -A 1269-aug.vg -o 28477796 MHC-ref-haps-n.vg 1269.pileup > 1269.vcf
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247696556, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2_7q04g-rWwiYt0276gZx9dwYPK8NRks5qqvewgaJpZM4J3Wq2 .
Both nodes are indeed in the ref path, which seems to take a big loop through that area twice
vg view 1269-aug.vg -j | jq .path | jq 'map(select(.name == "ref"))' | grep node | grep -A 3 -B 3 -n '1237919|1237920'
717632- "node_id": 1237916 717633- "node_id": 1237917 717634- "node_id": 1237918 _717635: "nodeid": 1237920 717636- "node_id": 1237921 717637- "node_id": 1237923
726903- "node_id": 1237916 726904- "node_id": 1237917 726905- "node_id": 1237918 _726906: "nodeid": 1237919 726907- "node_id": 1237921 726908- "node_id": 1237923 726909- "node_id": 1237924
On Fri, Sep 16, 2016 at 4:30 PM, Glenn Hickey glenn.hickey@gmail.com wrote:
Thanks! I got the file and can reproduce your error. Will take a look...
On Fri, Sep 16, 2016 at 4:10 PM, Heather Gibling <notifications@github.com
wrote:
Whoops, here's a partial screenshot. ref is the... circled classified ad on a newspaper? It looks like it hits both 1237920 and 1237919.
[image: screenshot from 2016-09-16 11-12-57] https://cloud.githubusercontent.com/assets/7570351/18599163/ea91ccba-7c24-11e6-82e8-2a4bb8fc48e6.png
Both the original and augmented graphs validated fine. That would be great if you could take a look. I've emailed you a link to a tarball with the pileup and original graph, original graph index, and the augmented graph I got from vg call -A 1269-aug.vg -o 28477796 MHC-ref-haps-n.vg 1269.pileup > 1269.vcf
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247696556, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2_7q04g-rWwiYt0276gZx9dwYPK8NRks5qqvewgaJpZM4J3Wq2 .
Will set a few more tests going but probably won't be able to say much before monday...
On Fri, Sep 16, 2016 at 4:48 PM, Glenn Hickey glenn.hickey@gmail.com wrote:
Both nodes are indeed in the ref path, which seems to take a big loop through that area twice
vg view 1269-aug.vg -j | jq .path | jq 'map(select(.name == "ref"))' | grep node | grep -A 3 -B 3 -n '1237919|1237920'
717632- "node_id": 1237916 717633- "node_id": 1237917 717634- "node_id": 1237918 _717635: "nodeid": 1237920 717636- "node_id": 1237921 717637- "node_id": 1237923
717638- "node_id": 349035
726903- "node_id": 1237916 726904- "node_id": 1237917 726905- "node_id": 1237918 _726906: "nodeid": 1237919 726907- "node_id": 1237921 726908- "node_id": 1237923 726909- "node_id": 1237924
On Fri, Sep 16, 2016 at 4:30 PM, Glenn Hickey glenn.hickey@gmail.com wrote:
Thanks! I got the file and can reproduce your error. Will take a look...
On Fri, Sep 16, 2016 at 4:10 PM, Heather Gibling < notifications@github.com> wrote:
Whoops, here's a partial screenshot. ref is the... circled classified ad on a newspaper? It looks like it hits both 1237920 and 1237919.
[image: screenshot from 2016-09-16 11-12-57] https://cloud.githubusercontent.com/assets/7570351/18599163/ea91ccba-7c24-11e6-82e8-2a4bb8fc48e6.png
Both the original and augmented graphs validated fine. That would be great if you could take a look. I've emailed you a link to a tarball with the pileup and original graph, original graph index, and the augmented graph I got from vg call -A 1269-aug.vg -o 28477796 MHC-ref-haps-n.vg 1269.pileup > 1269.vcf
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247696556, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2_7q04g-rWwiYt0276gZx9dwYPK8NRks5qqvewgaJpZM4J3Wq2 .
No worries, thanks for taking the time to look!
Just taking a look at this again, and pretty sure it's the cycles that are throwing off vg call. Specifically, vg call operates under the assumption that the reference path is acyclic. So any graph made with vg construct should work. Mind if I ask how you made your graph?
Adding support for cycles here shouldn't be too big of an undertaking, but it will have to wait until we get a few other things fixed up.
On Fri, Sep 16, 2016 at 4:54 PM, Heather Gibling notifications@github.com wrote:
No worries, thanks for taking the time to look!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247706764, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2_7gEy0HCoakUrO18F1iFyzjXhK9wTks5qqwIegaJpZM4J3Wq2 .
vg mod can unroll the graph so it becomes acyclic with sequences in cycles preserved up to some length (k).
On Mon, Sep 19, 2016, 21:46 Glenn Hickey notifications@github.com wrote:
Just taking a look at this again, and pretty sure it's the cycles that are throwing off vg call. Specifically, vg call operates under the assumption that the reference path is acyclic. So any graph made with vg construct should work. Mind if I ask how you made your graph?
Adding support for cycles here shouldn't be too big of an undertaking, but it will have to wait until we get a few other things fixed up.
On Fri, Sep 16, 2016 at 4:54 PM, Heather Gibling <notifications@github.com
wrote:
No worries, thanks for taking the time to look!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247706764, or mute the thread < https://github.com/notifications/unsubscribe-auth/AA2_7gEy0HCoakUrO18F1iFyzjXhK9wTks5qqwIegaJpZM4J3Wq2
.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248101901, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI4EXzQOMCjDO8IGPrYpFK5rKBKQBmPks5qruZ9gaJpZM4J3Wq2 .
Does it unspool embedded paths in a reasonable way?
On Mon, Sep 19, 2016 at 2:00 PM, Erik Garrison notifications@github.com wrote:
vg mod can unroll the graph so it becomes acyclic with sequences in cycles preserved up to some length (k).
On Mon, Sep 19, 2016, 21:46 Glenn Hickey notifications@github.com wrote:
Just taking a look at this again, and pretty sure it's the cycles that are throwing off vg call. Specifically, vg call operates under the assumption that the reference path is acyclic. So any graph made with vg construct should work. Mind if I ask how you made your graph?
Adding support for cycles here shouldn't be too big of an undertaking, but it will have to wait until we get a few other things fixed up.
On Fri, Sep 16, 2016 at 4:54 PM, Heather Gibling < notifications@github.com
wrote:
No worries, thanks for taking the time to look!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247706764, or mute the thread < https://github.com/notifications/unsubscribe-auth/AA2_ 7gEy0HCoakUrO18F1iFyzjXhK9wTks5qqwIegaJpZM4J3Wq2
.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248101901, or mute the thread https://github.com/notifications/unsubscribe-auth/ AAI4EXzQOMCjDO8IGPrYpFK5rKBKQBmPks5qruZ9gaJpZM4J3Wq2
.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248124971, or mute the thread https://github.com/notifications/unsubscribe-auth/AE0_Xxl9jyZ1bASJ9bBy6O0vdmJ1CudPks5qrvfUgaJpZM4J3Wq2 .
It could if we forced it to unroll all loops in any paths.
Right now we would need to realign against the graph unrolled to the maximum cycle length in any of the paths. This is tricky.
On Mon, Sep 19, 2016, 23:02 adamnovak notifications@github.com wrote:
Does it unspool embedded paths in a reasonable way?
On Mon, Sep 19, 2016 at 2:00 PM, Erik Garrison notifications@github.com wrote:
vg mod can unroll the graph so it becomes acyclic with sequences in cycles preserved up to some length (k).
On Mon, Sep 19, 2016, 21:46 Glenn Hickey notifications@github.com wrote:
Just taking a look at this again, and pretty sure it's the cycles that are throwing off vg call. Specifically, vg call operates under the assumption that the reference path is acyclic. So any graph made with vg construct should work. Mind if I ask how you made your graph?
Adding support for cycles here shouldn't be too big of an undertaking, but it will have to wait until we get a few other things fixed up.
On Fri, Sep 16, 2016 at 4:54 PM, Heather Gibling < notifications@github.com
wrote:
No worries, thanks for taking the time to look!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247706764, or mute the thread < https://github.com/notifications/unsubscribe-auth/AA2_ 7gEy0HCoakUrO18F1iFyzjXhK9wTks5qqwIegaJpZM4J3Wq2
.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248101901, or mute the thread https://github.com/notifications/unsubscribe-auth/ AAI4EXzQOMCjDO8IGPrYpFK5rKBKQBmPks5qruZ9gaJpZM4J3Wq2
.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248124971, or mute the thread < https://github.com/notifications/unsubscribe-auth/AE0_Xxl9jyZ1bASJ9bBy6O0vdmJ1CudPks5qrvfUgaJpZM4J3Wq2
.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248125542, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI4EQDwpt9dVjFPVcwSh79MeOum7JI4ks5qrvhagaJpZM4J3Wq2 .
Code will have to be written to get the path maintenance through the unroll right.
On Mon, Sep 19, 2016, 23:21 Erik Garrison erik.garrison@gmail.com wrote:
It could if we forced it to unroll all loops in any paths.
Right now we would need to realign against the graph unrolled to the maximum cycle length in any of the paths. This is tricky.
On Mon, Sep 19, 2016, 23:02 adamnovak notifications@github.com wrote:
Does it unspool embedded paths in a reasonable way?
On Mon, Sep 19, 2016 at 2:00 PM, Erik Garrison notifications@github.com wrote:
vg mod can unroll the graph so it becomes acyclic with sequences in cycles preserved up to some length (k).
On Mon, Sep 19, 2016, 21:46 Glenn Hickey notifications@github.com wrote:
Just taking a look at this again, and pretty sure it's the cycles that are throwing off vg call. Specifically, vg call operates under the assumption that the reference path is acyclic. So any graph made with vg construct should work. Mind if I ask how you made your graph?
Adding support for cycles here shouldn't be too big of an undertaking, but it will have to wait until we get a few other things fixed up.
On Fri, Sep 16, 2016 at 4:54 PM, Heather Gibling < notifications@github.com
wrote:
No worries, thanks for taking the time to look!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-247706764, or mute the thread < https://github.com/notifications/unsubscribe-auth/AA2_ 7gEy0HCoakUrO18F1iFyzjXhK9wTks5qqwIegaJpZM4J3Wq2
.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248101901, or mute the thread https://github.com/notifications/unsubscribe-auth/ AAI4EXzQOMCjDO8IGPrYpFK5rKBKQBmPks5qruZ9gaJpZM4J3Wq2
.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248124971, or mute the thread < https://github.com/notifications/unsubscribe-auth/AE0_Xxl9jyZ1bASJ9bBy6O0vdmJ1CudPks5qrvfUgaJpZM4J3Wq2
.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248125542, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI4EQDwpt9dVjFPVcwSh79MeOum7JI4ks5qrvhagaJpZM4J3Wq2 .
So any graph made with vg construct should work. Mind if I ask how you made your graph?
With construct and msga:
vg construct -R 6:28510120-33480577 -t 16 -m 32 -v SNPs.vcf -r ref-seq.fa > MHC-ref.vg
vg mod -D MHC-ref.vg > MHC-ref-nopath.vg
vg msga -g MHC-ref-nopath.vg -f MHC-haps.fa -B 128 -K 11 -X 2 -E 3 -G -S 0.95 -H 5 -z -D -t 16 > MHC-ref-haps.vg
vg mod -n MHC-ref-haps.vg > MHC-ref-haps-n.vg
vg call operates under the assumption that the reference path is acyclic.
This is good to know. (I built a cyclic graph for another project, but I don't think I'll be calling--just mapping.)
You could try to control the amount of cycling using changes to the way that msga runs. If you are on a new version msga will use the "MEM threader" which clusters then walks through the MEMs using the positional index provided by the paths to make clusters. This can generally tolerate larger bandwidth -B setting. Setting -B higher will decrease the likelihood of cyclic structures in the graph, up to the limit of -B being the length of the input sequences. At that point the output is equivalent to an old fashioned MSA, and will be a DAG.
It's good to know that you have been able to use a cyclic graph for mapping. Variant calling on them should be possible but we have this problem about the unroll not respecting the paths.
On Tue, Sep 20, 2016, 05:36 Heather Gibling notifications@github.com wrote:
So any graph made with vg construct should work. Mind if I ask how you made your graph?
With construct and msga:
vg construct -R 6:28510120-33480577 -t 16 -m 32 -v SNPs.vcf -r ref-seq.fa > MHC-ref.vg vg mod -D MHC-ref.vg > MHC-ref-nopath.vg vg msga -g MHC-ref-nopath.vg -f MHC-haps.fa -B 128 -K 11 -X 2 -E 3 -G -S 0.95 -H 5 -z -D -t 16 > MHC-ref-haps.vg vg mod -n MHC-ref-haps.vg > MHC-ref-haps-n.vg
vg call operates under the assumption that the reference path is acyclic.
This is good to know. (I built a cyclic graph for another project, but I don't think I'll be calling--just mapping.)
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248192771, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI4EaqIb1FIjwKVgMoHw-qfhE6AuJzVks5qr1SdgaJpZM4J3Wq2 .
Without being too familiar with vg msga, is removing the vg mod -D step an option?
That way you'd keep the reference path from ref-seq.fa built in vg construct which, presumably, you want any output VCF based on anyway. Then so long as that path isn't folded up by vg msga, you should be good to go. I imagine this would entail removing the reference path from MHC-haps.fa as well.
On Tue, Sep 20, 2016 at 1:37 AM, Erik Garrison notifications@github.com wrote:
You could try to control the amount of cycling using changes to the way that msga runs. If you are on a new version msga will use the "MEM threader" which clusters then walks through the MEMs using the positional index provided by the paths to make clusters. This can generally tolerate larger bandwidth -B setting. Setting -B higher will decrease the likelihood of cyclic structures in the graph, up to the limit of -B being the length of the input sequences. At that point the output is equivalent to an old fashioned MSA, and will be a DAG.
It's good to know that you have been able to use a cyclic graph for mapping. Variant calling on them should be possible but we have this problem about the unroll not respecting the paths.
On Tue, Sep 20, 2016, 05:36 Heather Gibling notifications@github.com wrote:
So any graph made with vg construct should work. Mind if I ask how you made your graph?
With construct and msga:
vg construct -R 6:28510120-33480577 -t 16 -m 32 -v SNPs.vcf -r ref-seq.fa > MHC-ref.vg vg mod -D MHC-ref.vg > MHC-ref-nopath.vg vg msga -g MHC-ref-nopath.vg -f MHC-haps.fa -B 128 -K 11 -X 2 -E 3 -G -S 0.95 -H 5 -z -D -t 16 > MHC-ref-haps.vg vg mod -n MHC-ref-haps.vg > MHC-ref-haps-n.vg
vg call operates under the assumption that the reference path is acyclic.
This is good to know. (I built a cyclic graph for another project, but I don't think I'll be calling--just mapping.)
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248192771, or mute the thread https://github.com/notifications/unsubscribe- auth/AAI4EaqIb1FIjwKVgMoHw-qfhE6AuJzVks5qr1SdgaJpZM4J3Wq2 .
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-248206289, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2_7uobSB_k7wSbYIprT1T3_ST1O6BFks5qr3EdgaJpZM4J3Wq2 .
Here's a suggested process that should remove the cycles and make a more-linear graph:
vg construct -R 6:28510120-33480577 -t 16 -m 32 -v SNPs.vcf -r ref-seq.fa > MHC-ref.vg
vg mod -D MHC-ref.vg > MHC-ref-nopath.vg
vg msga -g MHC-ref-nopath.vg -f MHC-haps.fa -B 10000 -K 11 -X 2 -D -t 16 > MHC-ref-haps.vg
vg mod -b MHC-ref-haps.vg | vg mod -U - > MHC-ref-haps-n.vg
Changes. You could try one or both of these:
There are a few changes to the mapper mode used in MSGA ("mem-threading") that are about to land. Bugfix stuff, so test this today with caution. I'll ping back when those are in master.
@hgibling thanks for all the feedback and help. Let's get this to work.
Sorry for the delay (had to focus on a side project for a bit).
I'm trying Erik's suggested process with newer versions of vg, but cannot get the msga to work:
...
LCPArray::LCPArray(): Construction: 0.593201 seconds, 15.1799 GB
LCPArray::LCPArray(): 19442735 values at 6 levels (branching factor 64)
gi224183341|gb|GL000251.1|: adding to graph1
gi224183341|gb|GL000251.1|: aligning sequence of 4795370bp against 891887 nodes
no alignments returned from walk match with I_2TCTGGCCTGGGAGTCACAGCT 24 1+12
vg: src/mapper.cpp:2142: vg::Alignment vg::Mapper::walk_match(const string&, vg::pos_t): Assertion `false' failed.
^I get this using both v1.4.0-1414-g72c1c30 and v1.4.0-1318-gf273f81. Are there issues with the mem-threading? I also tried with -B 1000
but received the same error.
Using the original arguments I had to build the graph also fails:
LCPArray::LCPArray(): Construction: 0.475991 seconds, 6.84095 GB
LCPArray::LCPArray(): 17017201 values at 5 levels (branching factor 64)
gi224183341|gb|GL000251.1|: adding to graph1
gi224183341|gb|GL000251.1|: aligning sequence of 4795370bp against 891956 nodes
no alignments returned from walk match with I_2TCTGGCCTGGGAGTCACAG 22 1+12
vg: src/mapper.cpp:2142: vg::Alignment vg::Mapper::walk_match(const string&, vg::pos_t): Assertion `false' failed.
Ideas on what's happening?
There is a problem with the mem threading in master. I am trying to sync my branch into master today but meeting heavy resistance from git.
On Mon, Nov 14, 2016, 20:59 Heather Gibling notifications@github.com wrote:
Sorry for the delay (had to focus on a side project for a bit).
I'm trying Erik's suggested process with newer versions of vg, but cannot get the msga to work:
... LCPArray::LCPArray(): Construction: 0.593201 seconds, 15.1799 GB LCPArray::LCPArray(): 19442735 values at 6 levels (branching factor 64) gi224183341|gb|GL000251.1|: adding to graph1 gi224183341|gb|GL000251.1|: aligning sequence of 4795370bp against 891887 nodes no alignments returned from walk match with I_2TCTGGCCTGGGAGTCACAGCT 24 1+12 vg: src/mapper.cpp:2142: vg::Alignment vg::Mapper::walk_match(const string&, vg::pos_t): Assertion `false' failed.
^I get this using both v1.4.0-1414-g72c1c30 and v1.4.0-1318-gf273f81. Are there issues with the mem-threading? I also tried with -B 1000 but received the same error.
Using the original arguments I had to build the graph also fails:
LCPArray::LCPArray(): Construction: 0.475991 seconds, 6.84095 GB LCPArray::LCPArray(): 17017201 values at 5 levels (branching factor 64) gi224183341|gb|GL000251.1|: adding to graph1 gi224183341|gb|GL000251.1|: aligning sequence of 4795370bp against 891956 nodes no alignments returned from walk match with I_2TCTGGCCTGGGAGTCACAG 22 1+12 vg: src/mapper.cpp:2142: vg::Alignment vg::Mapper::walk_match(const string&, vg::pos_t): Assertion `false' failed.
Ideas on what's happening?
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-260445100, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI4ETvsiitMaKLjPKfecQt9Ku551Ibyks5q-L2fgaJpZM4J3Wq2 .
@ekg has merged #493 and so I think the MEM threader is in now.
It is not set by default yet. We are still working on some updates to gssw and the alignment pinning that are required to exceed the performance of the default mode even with high error rates.
On Fri, Nov 25, 2016, 19:21 Adam Novak notifications@github.com wrote:
@ekg https://github.com/ekg has merged #493 https://github.com/vgteam/vg/pull/493 and so I think the MEM threader is in now.
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-263007746, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI4Eecrnevk7LEmMuTNS9KOzEIFKKzhks5rBydGgaJpZM4J3Wq2 .
It's been a while, but I'm trying this again with v1.4.0-2212:
vg construct -R 6:28477797-33448354 -t 8 -m 32 -v chr6.vcf.gz -r ref-seq.fa -f > MHC-ref.vg
vg mod -D MHC-ref.vg > MHC-ref-nopath.vg
vg msga -g MHC-ref-nopath.vg -f MHC-haps.fa -B 1000 -K 11 -X 2 -D -t 8 > MHC-ref-haps.vg
I tried with -B 10000
but that was using more memory than our cluster could handle.
Not sure what to make of the error:
preparing initial graph
building xg index
building GCSA2 index
InputGraph::InputGraph(): 15565700 kmers in 1 file(s)
InputGraph::read(): Read 15565700 11-mers from .vg-kmers-tmp-XXcBpgjR
InputGraph::readKeys(): 3082364 unique keys
InputGraph::read(): Read 15565700 11-mers from .vg-kmers-tmp-XXcBpgjR
InputGraph::readFrom(): 10294918 unique from nodes
InputGraph::read(): Read 15565700 11-mers from .vg-kmers-tmp-XXcBpgjR
PathGraph::PathGraph(): 15565700 paths with 31131400 ranks
PathGraph::PathGraph(): 0.463894 GB in 1 file(s)
GCSA::GCSA(): Preprocessing: 10.4081 seconds, 1.93951 GB
GCSA::GCSA(): Prefix-doubling from path length 11
GCSA::GCSA(): Step 1 (path length 11 -> 22)
PathGraph::prune(): 15565700 -> 15511623 paths (3076855 ranges)
PathGraph::prune(): 794024 unique, 0 redundant, 14717599 unsorted, 0 nondeterministic paths
PathGraph::prune(): 0.462282 GB in 1 file(s)
PathGraph::read(): File 0: Read 15511623 order-11 paths
PathGraph::extend(): File 0: Created 28208516 order-22 paths
PathGraph::read(): File 0: Read 28208516 order-22 paths
PathGraphBuilder::sort(): File 0: Sorted 28208516 paths
PathGraph::extend(): 15511623 -> 28208516 paths (83831524 ranks)
PathGraph::extend(): 0.942806 GB in 1 file(s)
GCSA::GCSA(): Step 2 (path length 22 -> 44)
PathGraph::prune(): 28208516 -> 17759252 paths (15577856 ranges)
PathGraph::prune(): 14756138 unique, 0 redundant, 2992751 unsorted, 10363 nondeterministic paths
PathGraph::prune(): 0.592467 GB in 1 file(s)
PathGraph::read(): File 0: Read 17759252 order-22 paths
PathGraph::extend(): File 0: Created 22817358 order-44 paths
PathGraph::read(): File 0: Read 22817358 order-44 paths
PathGraphBuilder::sort(): File 0: Sorted 22817358 paths
PathGraph::extend(): 17759252 -> 22817358 paths (83540499 ranks)
PathGraph::extend(): 0.82122 GB in 1 file(s)
GCSA::GCSA(): Prefix-doubling: 47.4414 seconds, 2.52268 GB
GCSA::GCSA(): Merging the paths
MergedGraph::MergedGraph(): 18761721 paths with 62861414 ranks and 837433 additional from nodes
MergedGraph::MergedGraph(): 0.683486 GB
GCSA::GCSA(): Merging: 11.5354 seconds, 2.52268 GB
GCSA::GCSA(): Building the index
GCSA::GCSA(): Construction: 26.3434 seconds, 2.52268 GB
GCSA::GCSA(): 18761721 paths, 19802970 edges
GCSA::GCSA(): 19599154 pointers (9304236 redundant)
GCSA::GCSA(): 3187639 samples at 2790108 positions
LCPArray::LCPArray(): Construction: 0.673589 seconds, 2.52268 GB
LCPArray::LCPArray(): 19059529 values at 6 levels (branching factor 64)
ref: adding to graph1
ref: aligning sequence of 4970557bp against 573837 nodes
ref: editing graph
ref: sorting and compacting ids
building xg index
building GCSA2 index
InputGraph::InputGraph(): 15573032 kmers in 1 file(s)
InputGraph::read(): Read 15573032 11-mers from .vg-kmers-tmp-XX0gfqTC
InputGraph::readKeys(): 3082757 unique keys
InputGraph::read(): Read 15573032 11-mers from .vg-kmers-tmp-XX0gfqTC
InputGraph::readFrom(): 10295018 unique from nodes
InputGraph::read(): Read 15573032 11-mers from .vg-kmers-tmp-XX0gfqTC
PathGraph::PathGraph(): 15573032 paths with 31146064 ranks
PathGraph::PathGraph(): 0.464113 GB in 1 file(s)
GCSA::GCSA(): Preprocessing: 10.0734 seconds, 4.05677 GB
GCSA::GCSA(): Prefix-doubling from path length 11
GCSA::GCSA(): Step 1 (path length 11 -> 22)
PathGraph::prune(): 15573032 -> 15518875 paths (3077248 ranges)
PathGraph::prune(): 793893 unique, 0 redundant, 14724982 unsorted, 0 nondeterministic paths
PathGraph::prune(): 0.462499 GB in 1 file(s)
PathGraph::read(): File 0: Read 15518875 order-11 paths
PathGraph::extend(): File 0: Created 28236257 order-22 paths
PathGraph::read(): File 0: Read 28236257 order-22 paths
PathGraphBuilder::sort(): File 0: Sorted 28236257 paths
PathGraph::extend(): 15518875 -> 28236257 paths (83914878 ranks)
PathGraph::extend(): 0.943737 GB in 1 file(s)
GCSA::GCSA(): Step 2 (path length 22 -> 44)
PathGraph::prune(): 28236257 -> 17773107 paths (15585140 ranges)
PathGraph::prune(): 14761744 unique, 0 redundant, 3000924 unsorted, 10439 nondeterministic paths
PathGraph::prune(): 0.592932 GB in 1 file(s)
PathGraph::read(): File 0: Read 17773107 order-22 paths
PathGraph::extend(): File 0: Created 22877942 order-44 paths
PathGraph::read(): File 0: Read 22877942 order-44 paths
PathGraphBuilder::sort(): File 0: Sorted 22877942 paths
PathGraph::extend(): 17773107 -> 22877942 paths (83831103 ranks)
PathGraph::extend(): 0.823657 GB in 1 file(s)
GCSA::GCSA(): Prefix-doubling: 46.9834 seconds, 4.09481 GB
GCSA::GCSA(): Merging the paths
MergedGraph::MergedGraph(): 18772860 paths with 62904019 ranks and 843401 additional from nodes
MergedGraph::MergedGraph(): 0.683993 GB
GCSA::GCSA(): Merging: 9.54995 seconds, 4.09481 GB
GCSA::GCSA(): Building the index
GCSA::GCSA(): Construction: 21.8281 seconds, 4.09481 GB
GCSA::GCSA(): 18772860 paths, 19816094 edges
GCSA::GCSA(): 19616261 pointers (9321243 redundant)
GCSA::GCSA(): 3594587 samples at 3165728 positions
LCPArray::LCPArray(): Construction: 0.677416 seconds, 4.09481 GB
LCPArray::LCPArray(): 19070845 values at 6 levels (branching factor 64)
gi224183346|gb|GL000256.1|: adding to graph1
gi224183346|gb|GL000256.1|: aligning sequence of 4928568bp against 705332 nodes
terminate called after throwing an instance of 'std::runtime_error'
what(): Cannot simplify Mapping with no position; must update position when removing leading deletions
./msga-MHC-ref-nopath.sh: line 8: 27895 Aborted (core dumped)
Does your build include < https://github.com/vgteam/vg/commit/d0a0ceac6244551b512f28f63b3b9a11edfd4ae6>? That commit ought to have fixed this issue, which we used to get when chunks of long reads don't align to anything.
On Wed, Mar 29, 2017 at 10:59 AM, Heather Gibling notifications@github.com wrote:
It's been a while, but I'm trying this again with v1.4.0-2212:
vg construct -R 6:28477797-33448354 -t 8 -m 32 -v chr6.vcf.gz -r ref-seq.fa -f > MHC-ref.vg vg mod -D MHC-ref.vg > MHC-ref-nopath.vg vg msga -g MHC-ref-nopath.vg -f MHC-haps.fa -B 1000 -K 11 -X 2 -D -t 8 > MHC-ref-haps.vg
I tried with -B 10000 but that was using more memory than our cluster could handle. Not sure what to make of the error:
preparing initial graph building xg index building GCSA2 index InputGraph::InputGraph(): 15565700 kmers in 1 file(s) InputGraph::read(): Read 15565700 11-mers from .vg-kmers-tmp-XXcBpgjR InputGraph::readKeys(): 3082364 unique keys InputGraph::read(): Read 15565700 11-mers from .vg-kmers-tmp-XXcBpgjR InputGraph::readFrom(): 10294918 unique from nodes InputGraph::read(): Read 15565700 11-mers from .vg-kmers-tmp-XXcBpgjR PathGraph::PathGraph(): 15565700 paths with 31131400 ranks PathGraph::PathGraph(): 0.463894 GB in 1 file(s) GCSA::GCSA(): Preprocessing: 10.4081 seconds, 1.93951 GB GCSA::GCSA(): Prefix-doubling from path length 11 GCSA::GCSA(): Step 1 (path length 11 -> 22) PathGraph::prune(): 15565700 -> 15511623 paths (3076855 ranges) PathGraph::prune(): 794024 unique, 0 redundant, 14717599 unsorted, 0 nondeterministic paths PathGraph::prune(): 0.462282 GB in 1 file(s) PathGraph::read(): File 0: Read 15511623 order-11 paths PathGraph::extend(): File 0: Created 28208516 order-22 paths PathGraph::read(): File 0: Read 28208516 order-22 paths PathGraphBuilder::sort(): File 0: Sorted 28208516 paths PathGraph::extend(): 15511623 -> 28208516 paths (83831524 ranks) PathGraph::extend(): 0.942806 GB in 1 file(s) GCSA::GCSA(): Step 2 (path length 22 -> 44) PathGraph::prune(): 28208516 -> 17759252 paths (15577856 ranges) PathGraph::prune(): 14756138 unique, 0 redundant, 2992751 unsorted, 10363 nondeterministic paths PathGraph::prune(): 0.592467 GB in 1 file(s) PathGraph::read(): File 0: Read 17759252 order-22 paths PathGraph::extend(): File 0: Created 22817358 order-44 paths PathGraph::read(): File 0: Read 22817358 order-44 paths PathGraphBuilder::sort(): File 0: Sorted 22817358 paths PathGraph::extend(): 17759252 -> 22817358 paths (83540499 ranks) PathGraph::extend(): 0.82122 GB in 1 file(s) GCSA::GCSA(): Prefix-doubling: 47.4414 seconds, 2.52268 GB GCSA::GCSA(): Merging the paths MergedGraph::MergedGraph(): 18761721 paths with 62861414 ranks and 837433 additional from nodes MergedGraph::MergedGraph(): 0.683486 GB GCSA::GCSA(): Merging: 11.5354 seconds, 2.52268 GB GCSA::GCSA(): Building the index GCSA::GCSA(): Construction: 26.3434 seconds, 2.52268 GB GCSA::GCSA(): 18761721 paths, 19802970 edges GCSA::GCSA(): 19599154 pointers (9304236 redundant) GCSA::GCSA(): 3187639 samples at 2790108 positions LCPArray::LCPArray(): Construction: 0.673589 seconds, 2.52268 GB LCPArray::LCPArray(): 19059529 values at 6 levels (branching factor 64) ref: adding to graph1 ref: aligning sequence of 4970557bp against 573837 nodes ref: editing graph ref: sorting and compacting ids building xg index building GCSA2 index InputGraph::InputGraph(): 15573032 kmers in 1 file(s) InputGraph::read(): Read 15573032 11-mers from .vg-kmers-tmp-XX0gfqTC InputGraph::readKeys(): 3082757 unique keys InputGraph::read(): Read 15573032 11-mers from .vg-kmers-tmp-XX0gfqTC InputGraph::readFrom(): 10295018 unique from nodes InputGraph::read(): Read 15573032 11-mers from .vg-kmers-tmp-XX0gfqTC PathGraph::PathGraph(): 15573032 paths with 31146064 ranks PathGraph::PathGraph(): 0.464113 GB in 1 file(s) GCSA::GCSA(): Preprocessing: 10.0734 seconds, 4.05677 GB GCSA::GCSA(): Prefix-doubling from path length 11 GCSA::GCSA(): Step 1 (path length 11 -> 22) PathGraph::prune(): 15573032 -> 15518875 paths (3077248 ranges) PathGraph::prune(): 793893 unique, 0 redundant, 14724982 unsorted, 0 nondeterministic paths PathGraph::prune(): 0.462499 GB in 1 file(s) PathGraph::read(): File 0: Read 15518875 order-11 paths PathGraph::extend(): File 0: Created 28236257 order-22 paths PathGraph::read(): File 0: Read 28236257 order-22 paths PathGraphBuilder::sort(): File 0: Sorted 28236257 paths PathGraph::extend(): 15518875 -> 28236257 paths (83914878 ranks) PathGraph::extend(): 0.943737 GB in 1 file(s) GCSA::GCSA(): Step 2 (path length 22 -> 44) PathGraph::prune(): 28236257 -> 17773107 paths (15585140 ranges) PathGraph::prune(): 14761744 unique, 0 redundant, 3000924 unsorted, 10439 nondeterministic paths PathGraph::prune(): 0.592932 GB in 1 file(s) PathGraph::read(): File 0: Read 17773107 order-22 paths PathGraph::extend(): File 0: Created 22877942 order-44 paths PathGraph::read(): File 0: Read 22877942 order-44 paths PathGraphBuilder::sort(): File 0: Sorted 22877942 paths PathGraph::extend(): 17773107 -> 22877942 paths (83831103 ranks) PathGraph::extend(): 0.823657 GB in 1 file(s) GCSA::GCSA(): Prefix-doubling: 46.9834 seconds, 4.09481 GB GCSA::GCSA(): Merging the paths MergedGraph::MergedGraph(): 18772860 paths with 62904019 ranks and 843401 additional from nodes MergedGraph::MergedGraph(): 0.683993 GB GCSA::GCSA(): Merging: 9.54995 seconds, 4.09481 GB GCSA::GCSA(): Building the index GCSA::GCSA(): Construction: 21.8281 seconds, 4.09481 GB GCSA::GCSA(): 18772860 paths, 19816094 edges GCSA::GCSA(): 19616261 pointers (9321243 redundant) GCSA::GCSA(): 3594587 samples at 3165728 positions LCPArray::LCPArray(): Construction: 0.677416 seconds, 4.09481 GB LCPArray::LCPArray(): 19070845 values at 6 levels (branching factor 64) gi224183346|gb|GL000256.1|: adding to graph1 gi224183346|gb|GL000256.1|: aligning sequence of 4928568bp against 705332 nodes terminate called after throwing an instance of 'std::runtime_error' what(): Cannot simplify Mapping with no position; must update position when removing leading deletions ./msga-MHC-ref-nopath.sh: line 8: 27895 Aborted (core dumped)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-290171793, or mute the thread https://github.com/notifications/unsubscribe-auth/AE0_X6JAOcXKW4RCIQ1-abbBRo3DMQToks5rqpvygaJpZM4J3Wq2 .
It did not. Tried again with v1.5.0-31 and -B 10000
:
...
LCPArray::LCPArray(): Construction: 0.684936 seconds, 4.75183 GB
LCPArray::LCPArray(): 19059560 values at 6 levels (branching factor 64)
gi224183346|gb|GL000256.1|: adding to graph1
gi224183346|gb|GL000256.1|: aligning sequence of 4928568bp against 705195 nodes
gi224183346|gb|GL000256.1|: editing graph
graph path 'gi224183346|gb|GL000256.1|' invalid: edge from 723574 end to 723575 start does not exist
creating edge
graph path 'gi224183346|gb|GL000256.1|' invalid: edge from 723575 end to 723576 start does not exist
creating edge
...
graph path 'gi224183346|gb|GL000256.1|' invalid: edge from 744886 end to 744887 start does not exist
creating edge
gi224183346|gb|GL000256.1|: sorting and compacting ids
building xg index
building GCSA2 index
InputGraph::InputGraph(): 17784208 kmers in 1 file(s)
InputGraph::read(): Read 17784208 11-mers from .vg-kmers-tmp-XXQ6MpEC
InputGraph::readKeys(): 3109455 unique keys
InputGraph::read(): Read 17784208 11-mers from .vg-kmers-tmp-XXQ6MpEC
InputGraph::readFrom(): 12138204 unique from nodes
InputGraph::read(): Read 17784208 11-mers from .vg-kmers-tmp-XXQ6MpEC
PathGraph::PathGraph(): 17784208 paths with 35568416 ranks
PathGraph::PathGraph(): 0.530011 GB in 1 file(s)
GCSA::GCSA(): Preprocessing: 11.8333 seconds, 8.06233 GB
GCSA::GCSA(): Prefix-doubling from path length 11
GCSA::GCSA(): Step 1 (path length 11 -> 22)
PathGraph::prune(): 17784208 -> 17726985 paths (3103834 ranges)
PathGraph::prune(): 782911 unique, 0 redundant, 16944054 unsorted, 20 nondeterministic paths
...
GCSA::GCSA(): Prefix-doubling: 53.3279 seconds, 8.06233 GB
GCSA::GCSA(): Merging the paths
MergedGraph::MergedGraph(): 20687934 paths with 71009688 ranks and 2702625 additional from nodes
MergedGraph::MergedGraph(): 0.786482 GB
GCSA::GCSA(): Merging: 20.772 seconds, 8.06233 GB
GCSA::GCSA(): Building the index
GCSA::GCSA(): Construction: 40.9665 seconds, 8.06233 GB
GCSA::GCSA(): 20687934 paths, 22083495 edges
GCSA::GCSA(): 23390559 pointers (11252355 redundant)
GCSA::GCSA(): 6070783 samples at 3949188 positions
LCPArray::LCPArray(): Construction: 0.604448 seconds, 8.06233 GB
LCPArray::LCPArray(): 21016316 values at 6 levels (branching factor 64)
[vg msga] failed to include alignment, retrying
expected 000256.1| Homo sapiens chromosome 6 genomic contig, GRCh37 reference assembly alternate locus group ALT_REF_LOCI_7TCGGATTAAG...
got 000256.1| Homo sapiens chromosome 6 genomic contig, GRCh37 reference assembly alternate locus group ALT_REF_LOCI_7TCGGATTAAGTGC...
{"mapping": [{"position": {"node_id": 22391, "offset": 11}, "edit": [{"to_length": 114, "sequence": "000256.1| Homo sapiens chromosome 6 genomic ...
{"name": "gi224183346|gb|GL000256.1|", "mapping": [{"position": {"node_id": 13502}, "edit": [{"from_length": 22, "to_length": 22}], "rank": ...
[vg msga] Error: failed to include path gi224183346|gb|GL000256.1|
Looks like msga needs some attention. Sorry if I missed but is the input you are testing on available? I'd like to reproduce this.
On Sun, Apr 2, 2017, 8:23 PM Heather Gibling notifications@github.com wrote:
It did not. Tried again with v1.5.0-31 and -B 10000:
... LCPArray::LCPArray(): Construction: 0.684936 seconds, 4.75183 GB LCPArray::LCPArray(): 19059560 values at 6 levels (branching factor 64) gi224183346|gb|GL000256.1|: adding to graph1 gi224183346|gb|GL000256.1|: aligning sequence of 4928568bp against 705195 nodes gi224183346|gb|GL000256.1|: editing graph graph path 'gi224183346|gb|GL000256.1|' invalid: edge from 723574 end to 723575 start does not exist creating edge graph path 'gi224183346|gb|GL000256.1|' invalid: edge from 723575 end to 723576 start does not exist creating edge ... graph path 'gi224183346|gb|GL000256.1|' invalid: edge from 744886 end to 744887 start does not exist creating edge gi224183346|gb|GL000256.1|: sorting and compacting ids building xg index building GCSA2 index InputGraph::InputGraph(): 17784208 kmers in 1 file(s) InputGraph::read(): Read 17784208 11-mers from .vg-kmers-tmp-XXQ6MpEC InputGraph::readKeys(): 3109455 unique keys InputGraph::read(): Read 17784208 11-mers from .vg-kmers-tmp-XXQ6MpEC InputGraph::readFrom(): 12138204 unique from nodes InputGraph::read(): Read 17784208 11-mers from .vg-kmers-tmp-XXQ6MpEC PathGraph::PathGraph(): 17784208 paths with 35568416 ranks PathGraph::PathGraph(): 0.530011 GB in 1 file(s) GCSA::GCSA(): Preprocessing: 11.8333 seconds, 8.06233 GB GCSA::GCSA(): Prefix-doubling from path length 11 GCSA::GCSA(): Step 1 (path length 11 -> 22) PathGraph::prune(): 17784208 -> 17726985 paths (3103834 ranges) PathGraph::prune(): 782911 unique, 0 redundant, 16944054 unsorted, 20 nondeterministic paths ... GCSA::GCSA(): Prefix-doubling: 53.3279 seconds, 8.06233 GB GCSA::GCSA(): Merging the paths MergedGraph::MergedGraph(): 20687934 paths with 71009688 ranks and 2702625 additional from nodes MergedGraph::MergedGraph(): 0.786482 GB GCSA::GCSA(): Merging: 20.772 seconds, 8.06233 GB GCSA::GCSA(): Building the index GCSA::GCSA(): Construction: 40.9665 seconds, 8.06233 GB GCSA::GCSA(): 20687934 paths, 22083495 edges GCSA::GCSA(): 23390559 pointers (11252355 redundant) GCSA::GCSA(): 6070783 samples at 3949188 positions LCPArray::LCPArray(): Construction: 0.604448 seconds, 8.06233 GB LCPArray::LCPArray(): 21016316 values at 6 levels (branching factor 64) [vg msga] failed to include alignment, retrying expected 000256.1| Homo sapiens chromosome 6 genomic contig, GRCh37 reference assembly alternate locus group ALT_REF_LOCI_7TCGGATTAAG... got 000256.1| Homo sapiens chromosome 6 genomic contig, GRCh37 reference assembly alternate locus group ALT_REF_LOCI_7TCGGATTAAGTGC... {"mapping": [{"position": {"node_id": 22391, "offset": 11}, "edit": [{"to_length": 114, "sequence": "000256.1| Homo sapiens chromosome 6 genomic ... {"name": "gi224183346|gb|GL000256.1|", "mapping": [{"position": {"node_id": 13502}, "edit": [{"from_length": 22, "to_length": 22}], "rank": ... [vg msga] Error: failed to include path gi224183346|gb|GL000256.1|
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/469#issuecomment-291004574, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI4EQadlrYqqjiBiLJrJ8Kr7V0Im4kZks5rr-emgaJpZM4J3Wq2 .
A similar output can be reproduced with the following:
vcf: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ref: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
haplotypes: http://hgwdev.sdsc.edu/~anovak/hgvm/MHC/
cat ref.fa GI* > ref-alt.fa
vg construct -R 6:28477797-33448354 -t 8 -m 32 -f -v ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz -r human_g1k_v37.fasta > MHC-ref.vg
vg mod -D MHC-ref.vg > MHC-ref-nopath.vg
vg msga -f ref-alt.fa -g MHC-ref-nopath.vg -B 5000 -K 11 -X 2 -z -D -t 8 > MHC-ref-haps.vg
I've mapped a bam file to my MHC graph and used
vg call
to produce a vcf file of variants, but I'm not sure how to interpret it. Many positions have multiple entries, and some of the IDs are duplicated. For example, position 30052145 has 23 entries (the first column is the number of line occurrences fromgrep 30052145 file.vcf | uniq -c
):First, what are the IDs (837884_837817_837813_837811_1048822_1048819)? I initially thought they were node IDs, but they don't seem to exist when I try
vg find -n 837884 -c 2 -x graph.xg | vg view -
(I get anAssertion
idx < this->size()' failed.` error, which I've gotten before when searching for a non-existent node).Second, should the same position, ID, ref and alt be listed more than once? I haven't yet managed to get decomposing block subs or normalization through
vt
to work yet, so I'm not sure if this would clear up some of the duplications.(Relatedly, the readme says
vt
is found at http://varianttools.sourceforge.net/Association/HomePage, but I'm assuming this should be http://genome.sph.umich.edu/wiki/Vt ?)