Here's the entry in the kmer precalc file
4 4 4 |1|19 22 22 23 |18 24 25 27 ||5 @|
[19,22) corresponds to the correct interval in the BWM starting with 444=TTT
[22,23) is the interval caused by T5...5TT
Now, as I step through, starting at 444, thing make sense:
p kmer
$12 = std::vector of length 3, capacity 3 = {4 '\004', 4 '\004', 4 '\004\'}
p sa_intervals
$13 = std::list = {[0] = {first = 19, second = 22}, [1] = {first = 22, second = 23}}
Then we step into bidir_search_bwd from map.cpp, we have kmer=444 and c=1
As we step in here
while (it!=sa_intervals.end() && it_rev!=sa_intervals_rev.end() && it_s!=sites.end()) { //calculate sum to return- can do this in top fcns => if (bidir_search(csa,(*it).first,(*it).second,(*it_rev).first,(*it_rev).second,c)>0)
we get into bidir_search.cpp with input args (left, right,c) = (19,22,1)
We get c_begin=1 - makes sense
Then:
//rank_l is the number of times c occurs in BWT[0,left] - number of 1s in [0,19]
(gdb) p rank_l
$19 = 1 << correct, look at RH column of BWM above
// s is total number of times
// chars smaller than c occur in BWT[left,right]
(gdb) p s
$20 = 0 <<<< correct
// b is total number of times
// chars bigger than c occur in BWT[left,right]
(gdb) p b
$21 = 3 <<< correct
Now this:
//rank_r is the number of times c occurs in BWT[0,right]
uint64_t rank_r = right - left - s - b + rank_l - 1;
p rank_r |
$28 = 0
If we look at RH column the BWM, this should be 1 - it certainly can't be less than rank_l which was 1
That I think must be wrong.
As a result of this
left = c_begin + rank_l;
(gdb) p left
$29 = (uint64_t &) @0x69f8f0: 2
right = c_begin + rank_r + 1;
(gdb) p right
$30 = (uint64_t &) @0x69f8f8: 2
This is because rank_r<rank_l
Back in bidir_search_bwd.cpp, we return here
while (it!=sa_intervals.end() && it_rev!=sa_intervals_rev.end() && it_s!=sites.end()) {
//calculate sum to return- can do this in top fcns
if (bidir_search(csa,(*it).first,(*it).second,(*it_rev).first,(*it_rev).second,c)>0) {
++it;
++it_rev;
++it_s;
}
That if condition fails, because bidir_search returns right-left=0.
So we never increment the iterator. We now have iterator.first=2, iterator.second=2 (left and right),
which we then erase
it=sa_intervals.erase(it);
And from then on there's no way the read can map. The next loop round, we start on the sa_interval [22,23) which is the false one from T5...5TT, which doesn't extend to allow the read to map.
Again we go into bidir_search
rank_l=1
rank_r=0
s=0
b=1
and that results in right=left=2
PRG: ATCGCT5CCGCCGGCGA6G5TTTTT
BWM: $1423245223223323163544444 1423245223223323163544444$ 163544444$1423245223223323 223223323163544444$1423245 223323163544444$1423245223 23163544444$14232452232233 23223323163544444$14232452 23245223223323163544444$14 23323163544444$14232452232 245223223323163544444$1423 3163544444$142324522322332 3223323163544444$142324522 323163544444$1423245223223 3245223223323163544444$142 3323163544444$142324522322 3544444$142324522322332316 4$142324522322332316354444 423245223223323163544444$1 44$14232452232233231635444 444$1423245223223323163544 19=left 4444$142324522322332316354 44444$14232452232233231635
45223223323163544444$14232 22=right 5223223323163544444$142324 544444$1423245223223323163 63544444$14232452232233231
read which should map, but which does not:
Here's the entry in the kmer precalc file 4 4 4 |1|19 22 22 23 |18 24 25 27 ||5 @|
[19,22) corresponds to the correct interval in the BWM starting with 444=TTT [22,23) is the interval caused by T5...5TT
Now, as I step through, starting at 444, thing make sense:
p kmer
$12 = std::vector of length 3, capacity 3 = {4 '\004', 4 '\004', 4 '\004\'} p sa_intervals $13 = std::list = {[0] = {first = 19, second = 22}, [1] = {first = 22, second = 23}}
Then we step into bidir_search_bwd from map.cpp, we have kmer=444 and c=1
As we step in here
while (it!=sa_intervals.end() && it_rev!=sa_intervals_rev.end() && it_s!=sites.end()) { //calculate sum to return- can do this in top fcns => if (bidir_search(csa,(*it).first,(*it).second,(*it_rev).first,(*it_rev).second,c)>0)
we get into bidir_search.cpp with input args (left, right,c) = (19,22,1)
We get c_begin=1 - makes sense Then:
(gdb) p rank_l
$19 = 1 << correct, look at RH column of BWM above
(gdb) p s
$20 = 0 <<<< correct
(gdb) p b
$21 = 3 <<< correct
Now this:
p rank_r | $28 = 0
If we look at RH column the BWM, this should be 1 - it certainly can't be less than rank_l which was 1
That I think must be wrong.
As a result of this
left = c_begin + rank_l;
(gdb) p left $29 = (uint64_t &) @0x69f8f0: 2
right = c_begin + rank_r + 1;
(gdb) p right $30 = (uint64_t &) @0x69f8f8: 2
This is because rank_r<rank_l
Back in bidir_search_bwd.cpp, we return here
That if condition fails, because bidir_search returns right-left=0. So we never increment the iterator. We now have iterator.first=2, iterator.second=2 (left and right), which we then erase it=sa_intervals.erase(it);
And from then on there's no way the read can map. The next loop round, we start on the sa_interval [22,23) which is the false one from T5...5TT, which doesn't extend to allow the read to map. Again we go into bidir_search rank_l=1 rank_r=0 s=0 b=1 and that results in right=left=2
.. etc - the read fails to map.