maxrossi91 / r-index

An optimal space run-length Burrows-Wheeler transform full-text index
GNU General Public License v3.0

Missing matches for toy example #1

Open thekswenson opened 1 year ago

thekswenson commented 1 year ago

Build an index on this toy input:

=> cat test/fake0.fasta
>seq1
acgtaaaacgt
>seq2
acgtataacgt

This step ends without error, though I can't tell whether the log looks right. It first says Total input symbols: 22, which is correct (two sequences of 11 bases each).

But then the bwtparse step says

Parse file contains 1 words
Computing SA of size 2 over an alphabet of size 2

which sounds strange.

When I use this query

=> cat test/fake0_READS.fa
>query1
taaaa
>query2
acgt

I get this

=> rindex/bin/ri-align count test/fake0.fasta test/fake0_READS.fa
query1    0/5    23
query2    0/4    23
Total time: 0.01604s

which appears to find no matches?
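
For reference, the counts one would expect on this toy input can be checked with a brute-force scan. This is only a minimal sketch, assuming exact matching on the forward strand; the helper name `count_occurrences` is hypothetical and is not part of r-index, and the scan says nothing about how the index computes its answer internally.

```python
# Brute-force reference counts for the toy input from this issue.
# Assumption: exact, forward-strand matching only; overlapping hits are counted.

def count_occurrences(text: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in text."""
    last_start = len(text) - len(pattern)
    return sum(1 for i in range(last_start + 1) if text.startswith(pattern, i))

# Sequences and queries copied from test/fake0.fasta and test/fake0_READS.fa.
references = {"seq1": "acgtaaaacgt", "seq2": "acgtataacgt"}
queries = {"query1": "taaaa", "query2": "acgt"}

expected = {
    name: sum(count_occurrences(seq, pat) for seq in references.values())
    for name, pat in queries.items()
}
print(expected)  # {'query1': 1, 'query2': 4}
```

Under these assumptions query1 occurs once (in seq1) and query2 occurs four times (twice in each sequence), so a correct count should be nonzero for both.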

The locate command also gives unexpected output:

=> rindex/bin/ri-align locate test/fake0.fasta test/fake0_READS.fa
@HD    VN:1.6    SO:unknown
@SQ    SN:seq1    LN:11
@SQ    SN:seq2    LN:11
query1    0    seq2    5    255    5M    *    0    0    taaaa    *NH:i:1
query2    0    seq1    12    255    4M    *    0    0    acgt    *NH:i:1
Total time: 0.00009s
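
The positions one would expect in the POS column (1-based in SAM) can be listed the same brute-force way. Again a minimal sketch, assuming exact forward-strand matching; the helper name `locate` is hypothetical and not part of r-index.

```python
# Brute-force reference positions for the toy input from this issue.
# Assumption: exact, forward-strand matching; positions are 1-based as in SAM.

def locate(text: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of pattern in text."""
    last_start = len(text) - len(pattern)
    return [i + 1 for i in range(last_start + 1) if text.startswith(pattern, i)]

# Sequences and queries copied from test/fake0.fasta and test/fake0_READS.fa.
references = {"seq1": "acgtaaaacgt", "seq2": "acgtataacgt"}
queries = {"query1": "taaaa", "query2": "acgt"}

hits = {
    qname: [(rname, pos) for rname, seq in references.items() for pos in locate(seq, pat)]
    for qname, pat in queries.items()
}
for qname, positions in hits.items():
    print(qname, positions)
# query1 [('seq1', 4)]
# query2 [('seq1', 1), ('seq1', 8), ('seq2', 1), ('seq2', 8)]
```

Neither reported alignment matches this list: query1 is placed at seq2 position 5, where the text reads ataac, and query2 is placed at seq1 position 12, which is past the declared length LN:11.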

maxrossi91 commented 1 year ago

Hi @thekswenson,

Thank you very much for reporting this issue. I included your toy datasets in the data/toy folder, and I was able to reproduce the issue.

I will continue the investigation.

thekswenson commented 1 year ago

Thanks @maxrossi91!