Open thekswenson opened 1 year ago
Hi @thekswenson,
Thank you very much for reporting this issue. I included your toy datasets in the data/toy folder, and I was able to reproduce the issue.
The count issue was resolved in https://github.com/maxrossi91/r-index/commit/87e59bf2fb0124cd95542db880325abfada67e34. It was caused by the query sequences being lowercase, while the index expects them to be uppercase. This was not an issue with the locate, and it was resolved by uppercasing the query. The correct output is reported below.
query1 5/5 1
query2 4/4 4
Total time: 0.00016s
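To illustrate why a lowercase query can report zero occurrences against an uppercase index, here is a minimal Python sketch. It is not the code from the commit above: normalize_query and naive_count are hypothetical helpers, the reference string is made up (it is not the toy data from this thread), and a naive scan stands in for the index.

```python
def normalize_query(seq: str) -> str:
    """Strip surrounding whitespace and uppercase the bases."""
    return seq.strip().upper()


def naive_count(text: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in text, case-sensitively."""
    return sum(text.startswith(pattern, i) for i in range(len(text) - len(pattern) + 1))


# Hypothetical uppercase reference; NOT the toy input from this thread.
reference = "ACGTTACGT"
raw_query = "acgt"

# A case-sensitive search on the raw lowercase query finds nothing.
print(naive_count(reference, raw_query))
# After uppercasing, the same query matches at both ends of the reference.
print(naive_count(reference, normalize_query(raw_query)))
```

Normalizing the query before the search, rather than making the search case-insensitive, keeps the index itself unchanged.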
The reporting of bwtparse refers to the PFP components, and it seems to be correct, since the PFP has only one phrase.
The locate had another issue: it crashed when the query sequence was found at the beginning of a sequence in the index. This was also fixed in https://github.com/maxrossi91/r-index/commit/87e59bf2fb0124cd95542db880325abfada67e34.
The locate issue you reported seems to be local to the bigbwt construction, since the result is correct with the sais construction algorithm, i.e., when building the index with ri-buildfasta -b sais data/toy/small.fa:
@HD VN:1.6 SO:unknown
@SQ SN:seq1 LN:11
@SQ SN:seq2 LN:11
query1 0 seq1 4 255 5M * 0 0 TAAAA * NH:i:1
query2 0 seq2 1 255 4M * 0 0 acgt * NH:i:4
query2 256 seq1 8 255 4M * 0 0 * * NH:i:4
query2 256 seq1 1 255 4M * 0 0 * * NH:i:4
query2 256 seq2 8 255 4M * 0 0 * * NH:i:4
Total time: 0.00014s
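One quick way to sanity-check locate output like the above is to compare the number of SAM records per query with the NH:i tag, which in the SAM format gives the number of reported alignments for that query. The sketch below is only an illustration: check_nh is a hypothetical helper, not part of r-index, and it splits on any whitespace because the records pasted here are not tab-separated.

```python
from collections import Counter

# SAM lines copied from the output above.
sam_text = """\
@HD VN:1.6 SO:unknown
@SQ SN:seq1 LN:11
@SQ SN:seq2 LN:11
query1 0 seq1 4 255 5M * 0 0 TAAAA * NH:i:1
query2 0 seq2 1 255 4M * 0 0 acgt * NH:i:4
query2 256 seq1 8 255 4M * 0 0 * * NH:i:4
query2 256 seq1 1 255 4M * 0 0 * * NH:i:4
query2 256 seq2 8 255 4M * 0 0 * * NH:i:4
"""


def check_nh(sam: str) -> dict:
    """Map each query name to (records seen, NH value declared in its first record)."""
    records = Counter()
    declared = {}
    for line in sam.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split()
        if len(fields) < 11:
            continue  # not an alignment record (e.g. a timing line)
        qname = fields[0]
        records[qname] += 1
        for tag in fields[11:]:  # optional fields follow the 11 mandatory ones
            if tag.startswith("NH:i:"):
                declared.setdefault(qname, int(tag[len("NH:i:"):]))
    return {q: (records[q], declared.get(q)) for q in records}


for qname, (seen, nh) in check_nh(sam_text).items():
    print(qname, "records:", seen, "NH:", nh, "consistent:", seen == nh)
```

For the output above, both queries are consistent: query1 has one record with NH:i:1, and query2 has one primary record plus three secondary records (flag 256) with NH:i:4.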
This issue with bigbwt is likely to be a corner case for small inputs, when the dictionary is composed of only one phrase.
I will continue the investigation.
Thanks @maxrossi91!
Build an index on this toy input:
This step ends without error. I don't know if the log is fishy or not, though. It first says Total input symbols: 22, which is right. But then the bwtparse step says
which sounds strange.
When I use this query
I get this
which appears to find no matches?
The locate command also gives unexpected output: