Closed TuBieJun closed 10 months ago
I've tried and cannot reproduce this. In fact, 1.18 is faster than 1.16 for me. Were all your binaries built with the same compiler and compiler version, and with the same options?
Also, are you using -T or -R for queries? Have you tested speeds with both? At some point -T becomes faster than -R (once the density of hits is high enough). This feels wrong: the -R time ought to asymptotically approach the -T time rather than exceed it. It implies the index-jumping option (-R) is unnecessarily decoding things multiple times. We fixed this in the multi-region iterator for SAM and BAM, but I guess tabix has its own iterators. That's a different issue though, and not related to 1.16 vs 1.18.
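A minimal sketch of how the -R versus -T comparison could be timed across two builds. The helper names (`build_tabix_cmd`, `time_command`, `compare`) and every path in the usage comment are hypothetical, not taken from this thread; the only tabix behaviour assumed is that `-R FILE` and `-T FILE` each take a regions file placed before the data file.

```python
import subprocess
import time
from statistics import median


def build_tabix_cmd(tabix_bin, mode, data_file, regions_file):
    """Return the argv for a tabix query restricted by a regions file.

    mode is "-R" (uses the index) or "-T" (streams through the file).
    """
    if mode not in ("-R", "-T"):
        raise ValueError("mode must be '-R' or '-T'")
    return [tabix_bin, mode, regions_file, data_file]


def time_command(cmd, repeats=5):
    """Run cmd `repeats` times, discard stdout, return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return timings


def compare(binaries, data_file, regions_file, repeats=5):
    """Return {(label, mode): median seconds} for every binary and both modes."""
    results = {}
    for label, tabix_bin in binaries.items():
        for mode in ("-R", "-T"):
            cmd = build_tabix_cmd(tabix_bin, mode, data_file, regions_file)
            results[(label, mode)] = median(time_command(cmd, repeats))
    return results


# Usage (hypothetical paths, adjust to your own builds and files):
# results = compare(
#     {"1.16": "/opt/htslib-1.16/tabix", "1.18": "/opt/htslib-1.18/tabix"},
#     "genotypes.vcf.gz",
#     "regions.txt",
# )
# for (label, mode), seconds in sorted(results.items()):
#     print(f"tabix {label} {mode}: {seconds:.3f}s")
```

The median is reported rather than the mean because the first run is often dominated by filesystem caching. Both binaries should be built with the same compiler and options for the numbers to be comparable.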
Alright, I'll go and confirm the versions and parameters of the compilation tools for versions 1.16 and 1.18. Anyway, thx!
Hi, I have observed that tabix 1.18 is slower than 1.16 when querying locus information. Here are my command and benchmark result:
Repeating this many times gives the same result:
![image](https://github.com/samtools/htslib/assets/15261087/c52da55d-ae77-484c-aceb-426885c28d05)
The command used to create the index is:
My file looks like this:
![image](https://github.com/samtools/htslib/assets/15261087/813a4793-88dc-4245-800c-0b02fd89aaf0)
And the query region file looks like this:
![image](https://github.com/samtools/htslib/assets/15261087/66eb7522-7f5b-499b-8c97-70610dbe5d09)
Is this due to improper usage on my part, or is there an unknown issue with version 1.18? The background for this benchmark is that we aim to use tabix and BCFtools to develop a cloud-based application that efficiently retrieves user genotypes by rsID from BCF files, so we are somewhat sensitive to this performance difference.