samtools / htslib

C library for high-throughput sequencing data formats

Tabix does not return the correct regions #1622

Closed hiruna72 closed 1 year ago

hiruna72 commented 1 year ago

Hi developers,

Let's say I have the following file

S1  5   1   R1
S1  15  5   R2

I create a tabix index as follows

bgzip table.txt --keep -f
tabix -0 -b 3 -e 2 -s 1 table.txt.gz -f

Then I query the file

$ tabix table.txt.gz S1
S1  5   1   R1
S1  15  5   R2

$ tabix table.txt.gz S1:1-5
S1  5   1   R1

$ tabix table.txt.gz S1:2-5
S1  5   1   R1

$ tabix table.txt.gz S1:3-5
No records were printed. It should have printed (S1    5   1   R1).

Similarly

$ tabix table.txt.gz S1:5-15
S1  15  5   R2

$ tabix table.txt.gz S1:6-15
S1  15  5   R2

$ tabix table.txt.gz S1:7-15
No records were printed. It should have printed (S1    15  5   R2).

As shown above, in both cases tabix did not print the records that overlap the given region.
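For reference, here is a minimal Python sketch of the overlap I would expect. It assumes the documented meaning of `-0` (file positions are 0-based half-open) and that region queries are 1-based inclusive. With `-b 3 -e 2`, column 3 is the begin and column 2 is the end. The names `RECORDS` and `expected_hits` are made up for this illustration and are not part of tabix or htslib.

```python
# Records from table.txt as indexed with `tabix -0 -b 3 -e 2 -s 1`:
# column 3 is the begin and column 2 is the end (0-based, half-open).
RECORDS = [
    ("S1", 1, 5, "R1"),   # (sequence, begin, end, name)
    ("S1", 5, 15, "R2"),
]

def expected_hits(seq, start, end, records=RECORDS):
    """Names of records overlapping the 1-based inclusive query seq:start-end."""
    # Convert the query to 0-based half-open so it matches the records.
    q_beg, q_end = start - 1, end
    return [name for s, beg, stop, name in records
            if s == seq and beg < q_end and stop > q_beg]

for start, end in [(3, 5), (7, 15)]:
    print(f"S1:{start}-{end}", expected_hits("S1", start, end))
```

Under these assumptions, S1:3-5 overlaps R1 and S1:7-15 overlaps R2, which is why I expected those records to be printed.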

Thank you.

kisarur commented 1 year ago

I could reproduce the same issue. However, if columns 2 and 3 are swapped as below (table2.txt), I get the expected results. I used the latest commit on the develop branch. I assume the problem is in the indexing part (tabix).

S1  1   5   R1
S1  5   15  R2

Then you create a tabix index

$ bgzip table2.txt --keep -f
$ tabix -0 -b 2 -e 3 -s 1 table2.txt.gz -f

Then query

$ tabix table2.txt.gz S1:3-5
S1  1   5   R1
$ tabix table2.txt.gz S1:7-15
S1  5   15  R2
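
To sanity-check the assumption that indexing is at fault, here is a small Python sketch of one hypothesis. It is only a guess that happens to be consistent with the outputs in this thread, not something confirmed here: when the end column comes before the begin column, the index may be recording each record as the single base `[begin, begin + 1)`. All names below are hypothetical and nothing here calls htslib.

```python
# True intervals from table.txt (0-based half-open, begin = col 3, end = col 2).
TRUE_INTERVALS = [(1, 5, "R1"), (5, 15, "R2")]

# Hypothesis only: the end is lost and each record is indexed as one base.
COLLAPSED = [(beg, beg + 1, name) for beg, _end, name in TRUE_INTERVALS]

def hits(intervals, start, end):
    """Names of intervals overlapping the 1-based inclusive query start-end."""
    q_beg, q_end = start - 1, end  # convert query to 0-based half-open
    return [name for beg, stop, name in intervals
            if beg < q_end and stop > q_beg]

# Outputs reported for table.txt earlier in this thread.
OBSERVED = {
    (1, 5): ["R1"], (2, 5): ["R1"], (3, 5): [],
    (5, 15): ["R2"], (6, 15): ["R2"], (7, 15): [],
}

for query, seen in OBSERVED.items():
    model = hits(COLLAPSED, *query)
    status = "matches" if model == seen else "differs from"
    print(f"S1:{query[0]}-{query[1]} model {model} {status} observed {seen}")
```

If this guess is right, it would also explain why swapping the columns (begin before end) gives the expected results.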

whitwham commented 1 year ago

It looks like that bug has been there since the beginning.