ryanlayer / giggle

Interval data structure
MIT License

extra (false positive) overlaps issue #51

Open pkuksa opened 4 years ago


Extra (false positive) overlaps are reported by giggle search:

Steps to reproduce:

db.bed:
chr1 207799860 207799861
chr1 207799861 207799862
chr1 207799869 207799871
chr1 207799877 207799878
chr1 207799878 207799879

query.bed:
chr1 207799861 207799878

bgzip db.bed
bgzip query.bed
giggle index -i db.bed.gz -o db_index

giggle search -i db_index -q query.bed.gz

db.bed.gz size:5 overlaps:5

giggle search -i db_index -q query.bed.gz -v -o

Giggle search output: 5 overlaps

chr1 207799861 207799878

chr1 207799860 207799861 db.bed.gz
chr1 207799861 207799862 db.bed.gz
chr1 207799869 207799871 db.bed.gz
chr1 207799877 207799878 db.bed.gz
chr1 207799878 207799879 db.bed.gz

Extra (false positive) overlaps: 2 overlaps (1 upstream, 1 downstream of the query interval)
chr1 207799860 207799861 db.bed.gz
chr1 207799878 207799879 db.bed.gz

Expected (correct) output for overlap between db.bed and query.bed: 3 overlaps
chr1 207799861 207799862
chr1 207799869 207799871
chr1 207799877 207799878
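The expected and reported counts can be checked independently of giggle with a minimal Python sketch. It assumes the usual BED convention of half-open intervals, `[start, end)`, and compares that against a closed-interval test, `[start, end]`. The function names are hypothetical and nothing here is taken from giggle's source. It only shows that treating the end coordinate as inclusive would yield the same two extra hits, which may or may not be the actual cause.

```python
from typing import List, Tuple

# (chrom, start, end) as written in db.bed and query.bed above
Interval = Tuple[str, int, int]

DB: List[Interval] = [
    ("chr1", 207799860, 207799861),
    ("chr1", 207799861, 207799862),
    ("chr1", 207799869, 207799871),
    ("chr1", 207799877, 207799878),
    ("chr1", 207799878, 207799879),
]
QUERY: Interval = ("chr1", 207799861, 207799878)


def overlaps_half_open(a: Interval, b: Interval) -> bool:
    """Overlap test for half-open intervals [start, end), the BED convention."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlaps_closed(a: Interval, b: Interval) -> bool:
    """Overlap test that treats both endpoints as inclusive, [start, end]."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


half_open_hits = [iv for iv in DB if overlaps_half_open(iv, QUERY)]
closed_hits = [iv for iv in DB if overlaps_closed(iv, QUERY)]
# Intervals that only touch the query at a boundary coordinate
extra_hits = [iv for iv in closed_hits if iv not in half_open_hits]

print("half-open overlaps:", len(half_open_hits))
print("closed overlaps:   ", len(closed_hits))
for chrom, start, end in extra_hits:
    print("extra:", chrom, start, end)
```

Under the half-open test only the three expected intervals match. Under the closed test all five match, and the two additional ones are the intervals that merely abut the query on the left and right.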

The extra overlaps are reported at both the left and right boundaries of the query interval. The db and query files used for this run are attached.

db.bed.gz query.bed.gz

Thanks!