mnikhil-git closed 8 years ago
@svent Could you share the sample data you used for benchmarking, and perhaps the hardware specs and any benchmark scripts? I would like to run the same benchmarks against the current change.
Could you provide a benchmark?
I will need help from @svent for this
Sorry for the late reply on this PR - I finally did some benchmarks on this.
Searching through 800 small .gz files (200 MB uncompressed):
Searching one big file (700 MB uncompressed):
So the PR actually makes sift slower. This is not because the library is bad (I guess one can find examples where the performance is slightly better). One reason is that sift is already designed for an optimal balance of CPU and IO load. In particular, searching files in parallel cannot benefit from pgzip, because sift already uses all CPU cores in that case, so pgzip just adds complexity. sift is simply not a good use case for a parallel gzip implementation.
instead of stdlib gzip for parallel read