vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg/include/sdsl/int_vector.hpp:1396 error #364

Open hgibling opened 8 years ago

hgibling commented 8 years ago
vg: path/to/vg/include/sdsl/int_vector.hpp:1396: sdsl::int_vector<<anonymous> >::const_reference sdsl::int_vector<<anonymous> >::operator[](const size_type&) const [with unsigned char t_width = 0u; sdsl::int_vector<<anonymous> >::const_reference = const long unsigned int; sdsl::int_vector<<anonymous> >::size_type = long unsigned int]: Assertion `idx < this->size()' failed.

I've gotten this error a couple of times, most recently with vg find -n 1 -c 5 -x graph.xg. The graph itself isn't huge:

nodes   154
edges   151
length  3360

and I've successfully run that command on a much larger graph. The other time was using vg msga, though I don't remember the exact command I used. I'm requesting plenty of memory when I submit the command on my cluster.

Thoughts on what might be going on?

ekg commented 8 years ago

Can you share the xg index or input graph? I'll take a look.

hgibling commented 8 years ago

Sure, here are both.

prdm9-ABC.zip

ekg commented 8 years ago

I just noticed that the graph has lowercase DNA bases. vg and other tools don't handle them gracefully at the moment. Do you have a version with uppercase bases? I could also write a converter.
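A converter along these lines could be a short awk filter. This is a hypothetical sketch, not an existing vg command: uppercase_gfa_sequences is an invented name, and it assumes tab-separated GFA with the sequence in field 3 of each S line. Unlike a blanket sed transliteration, it leaves path names and other fields untouched.

```shell
# Hypothetical helper (not part of vg): uppercase the sequence field of
# GFA segment (S) lines and pass every other line through unchanged.
# Assumes tab-separated GFA with the sequence in field 3 of S lines.
uppercase_gfa_sequences() {
    awk 'BEGIN { FS = OFS = "\t" }
         $1 == "S" { $3 = toupper($3) }   # only the sequence changes
         { print }'
}
```

It would slot into the same vg view / vg view -Fv - round trip used later in this thread, e.g. vg view prdm9-ABC.vg | uppercase_gfa_sequences | vg view -Fv - > prdm9-ABC.upper.vg.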

ekg commented 8 years ago

I can convert it via GFA, one second and I'll see if this is the problem.

ekg commented 8 years ago

That wasn't the problem. The issue is that the graph has no node 1. It starts from 51.
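One quick way to see where a graph's ID space starts is to scan the segment lines of its GFA. The sketch below is hypothetical (node_id_range is not a vg command) and assumes tab-separated GFA with numeric IDs in field 2 of S lines, which is the layout vg view printed in this thread.

```shell
# Hypothetical helper (not part of vg): print the smallest and largest
# segment IDs in a GFA stream read from stdin; exit 1 if there are none.
node_id_range() {
    awk '
        BEGIN { FS = "\t" }
        $1 == "S" {
            id = $2 + 0                       # force numeric comparison
            if (n == 0 || id < min) min = id
            if (n == 0 || id > max) max = id
            n++
        }
        END { if (n == 0) exit 1; print min, max }'
}
```

For this graph, vg view prdm9-ABC.vg | node_id_range would be expected to report 51 as the lowest ID.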

ekg commented 8 years ago

vg view prdm9-ABC.vg | sed 'y/atgc/ATGC/' | vg view -Fv - >prdm9-ABC.vg.1

vg index -x prdm9-ABC.xg prdm9-ABC.vg.1

vg find -n 51 -c 5 -x prdm9-ABC.xg | vg view -

Gives me:

H VN:Z:1.0
S 51 TGTGGACAAGGTTTCAGTGTTA
P 51 A 1 + 22M
L 51 + 52 + 0M
S 52 AATCAGATGTTATTACACACCA
P 52 A 2 + 22M
L 52 + 53 + 0M
S 53 AAGGACACATACAGGGGAGAAG
P 53 A 3 + 22M
L 53 + 54 + 0M
S 54 CTCTACGTCTGCAGGGAGTGTG
P 54 A 4 + 22M
L 54 + 55 + 0M
S 55 GGCGGGGCTTTAGCTGGAAGTC
P 55 A 5 + 22M
L 55 + 56 + 0M
S 56 ACACCTCCTCATTCACCAGAGG
P 56 A 6 + 22M

ekg commented 8 years ago

So the bug here is that we don't check whether the requested node ID is within the graph's node ID space. We should, and if it isn't, throw an error that actually describes the problem.
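Until vg reports this itself, a user-side guard can make the same check before calling vg find. This is a hypothetical sketch, not the fix vg adopted: check_node_in_range is an invented helper that reads GFA on stdin, assumes numeric IDs in field 2 of S lines, and fails with a message naming the missing node and the graph's ID span.

```shell
# Hypothetical user-side guard (not part of vg): succeed only if the
# requested node ID is present in the GFA read from stdin; otherwise
# print a descriptive error to stderr and exit 1.
# Usage: check_node_in_range NODE_ID < graph.gfa
check_node_in_range() {
    awk -v want="$1" '
        BEGIN { FS = "\t"; want += 0 }
        $1 == "S" {
            id = $2 + 0
            seen[id] = 1                      # remember every ID present
            if (n == 0 || id < min) min = id
            if (n == 0 || id > max) max = id
            n++
        }
        END {
            if (n == 0) {
                print "error: the graph has no segments" > "/dev/stderr"
                exit 1
            }
            if (!(want in seen)) {
                printf "error: node %d is not in the graph (node IDs span %d-%d)\n", want, min, max > "/dev/stderr"
                exit 1
            }
        }'
}
```

For example: vg view prdm9-ABC.vg | check_node_in_range 1 && vg find -n 1 -c 5 -x prdm9-ABC.xg. Checking membership rather than only the min/max range also catches gaps inside the ID space.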

hgibling commented 8 years ago

Ahh, I forgot to convert the FASTA to uppercase. After converting it and rebuilding the graph with msga, the graph starts at node 1. Thanks for the help!

ekg commented 8 years ago

You can also use vg mod -c to compact the id space after whatever construction or import process you use.
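Conceptually, compacting the ID space means renumbering nodes to 1..N and rewriting every reference to them. The awk sketch below illustrates that idea on GFA; it is not how vg mod -c is implemented, compact_gfa_ids is a hypothetical name, and it assumes tab-separated GFA with vg-style P lines carrying the node ID in field 2, as in the vg view output in this thread (GFA 1.0 P lines would need different handling).

```shell
# Hypothetical sketch (not vg mod -c): renumber segment IDs to 1..N in
# the order S lines appear, then rewrite S, vg-style P, and L lines.
# Usage: compact_gfa_ids graph.gfa > graph.compact.gfa
compact_gfa_ids() {
    # The file is read twice: pass 1 builds the old->new map, pass 2 rewrites.
    awk 'BEGIN { FS = OFS = "\t" }
         NR == FNR { if ($1 == "S") map[$2] = ++n; next }
         $1 == "S" || $1 == "P" { $2 = map[$2] }
         $1 == "L" { $2 = map[$2]; $4 = map[$4] }
         { print }' "$1" "$1"
}
```

The two passes are needed because a link can refer to a segment before that segment's S line appears. In practice, vg mod -c or vg ids -s is the supported route.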

hgibling commented 8 years ago

I've been using vg ids -s. Is one recommended over the other?

adamnovak commented 8 years ago

Either one of those commands will work, but vg ids -s also reorders nodes by a topological sort.
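The topological-sort step can be approximated with the POSIX tsort utility. This is a simplified sketch, not the algorithm vg ids -s uses: topo_id_map is a hypothetical name, it treats every link as + to + and assumes the graph is acyclic, whereas vg works on bidirected graphs with reversing edges. It prints old and new ID pairs in sorted order.

```shell
# Simplified sketch (not vg ids -s): order segment IDs topologically with
# tsort and print "old new" ID pairs. Assumes + to + links and no cycles.
# Usage: topo_id_map < graph.gfa
topo_id_map() {
    awk 'BEGIN { FS = "\t" }
         $1 == "S" { print $2, $2 }    # "x x" registers a node with no ordering
         $1 == "L" { print $2, $4 }    # edge from-segment -> to-segment
        ' | tsort | awk '{ print $1, NR }'
}
```

Emitting each segment as an x x pair is how tsort is told about a node that has no links, so isolated nodes still receive a new ID.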

@ekg, have we changed xg to work when the graph doesn't start at node 1, or at least to fail with a better error message in that case?

ksnv commented 6 years ago

Hi,

I followed the wiki page https://github.com/vgteam/vg/wiki/working-with-a-whole-genome-variation-graph to construct the GCSA and XG indices for a whole-genome graph.

The GCSA index was created successfully. I used the following commands to build it:

for chr in $(seq 1 22; echo X; echo Y);
do
    vg mod -t 32 -pl 16 -S -t 16 -e 4 $chr.vg >$chr.prune.vg
    vg mod -t 32 -N $chr.vg >$chr.ref.vg
    cat $chr.ref.vg $chr.prune.vg | vg view -v -D - 2>$chr.merge.err >$chr.smooth.vg
    vg kmers -gBk 16 -H 1000000000 -T 1000000001 $chr.smooth.vg >$chr.graph
done

But I get the following error when I try to build the XG index:

vg/include/sdsl/int_vector.hpp:1360: sdsl::int_vector<<anonymous> >::reference sdsl::int_vector<<anonymous> >::operator[](const size_type&) [with unsigned char t_width = 0u; sdsl::int_vector<<anonymous> >::reference = sdsl::int_vector_reference<sdsl::int_vector<0u>>; sdsl::int_vector<<anonymous> >::size_type = long unsigned int]: Assertion `idx < this->size()' failed.

For the XG index I used the following command:

vg index -x wg.xg $(for i in $(seq 22; echo X; echo Y); do echo $i.vg; done)

Can you please correct my understanding and help me figure out what's going wrong here?