iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License
92 stars 15 forks source link

Underflowing a uint64_t leads to unexpected lack of problems #23

Closed iqbal-lab closed 8 years ago

iqbal-lab commented 8 years ago

PRG is A5C6G5T Integer encoding is 1526354$

BWM is $1526354 1526354$ 26354$15 354$1526 4$152635 526354$1 54$15263 6354$152

Now, am stepping through the precalc kmers step, for the kmer AC. Start with C==2, and you get the interval [2,3). So far so good. Then try AC and step into bidir_search()

  uint64_t bidir_search(csa_wt<wt_int<bit_vector,rank_support_v5<>>,2,16777216> &csa,
                        uint64_t& left, uint64_t& right,
                        uint64_t& left_rev, uint64_t& right_rev,
                        uint8_t c)

c is 1. left=2, right=3 So far so good.

// c_begin (below) is the first occurrence/posn // of char c in the far left column // of the BW matrix B uint64_t c_begin = csa.C[csa.char2comp[c]];

c_begin =1. This makes sense - look at the BWM.

Now we get here

//rank_l is the number of times c occurs in BWT[0,left] uint64_t rank_l = std::get<0>(r_s_b); // s is total number of times // chars smaller than c occur in BWT[left,right] uint64_t s = std::get<1>(r_s_b); // b is total number of times // chars bigger than c occur in BWT[left,right] uint64_t b = std::get<2>(r_s_b);

rank_l = 0 s=0 b=1 Comparing the BWM, this all looks ok. However the following fails

//rank_r is the number of times c occurs in BWT[0,right] uint64_t rank_r = right - left - s - b + rank_l - 1;

look at what we get:

(gdb) p rank_r $65 = 18446744073709551615

Look again at the definition of rank_r

(gdb) p right - left - s - b + rank_l - 1
$66 = 18446744073709551615
(gdb) p right - left - s - b + rank_l
$67 = 0

So we are trying to subtract 1 from a uint64_t zero.

What does this lead to?

//get new interval, after appending c left = c_begin + rank_l; right = c_begin + rank_r + 1; which gives us

(gdb) p left
$72 = (uint64_t &) @0x7ffff0000a10: 1
(gdb) p right
$73 = (uint64_t &) @0x7ffff0000a18: 1

iqbal-lab commented 8 years ago

To summarise, bidir_search returns this interval [1,1) which seems wrong/undefined.

iqbal-lab commented 8 years ago

I think this is a valid issue, but I'm going to close it and raise it again with a simple example of where it causes a read to not map