samtools / htslib

C library for high-throughput sequencing data formats

Increase the input block size for bgzip. #1768

Open · jkbonfield opened this pull request 2 months ago

jkbonfield commented 2 months ago

Commit e495718 changed bgzip from raw POSIX read() calls to hread(). Unfortunately hread() gets its buffer size from an fstat() of the input file descriptor, which can be as small as 4 KiB for a pipe. We're reading 0xff00 bytes at a time, so most requests end up split across two reads, with one or both involving additional memcpys. This makes the buffered I/O perform worse than the unbuffered version. In the most extreme case (cat data | bgzip -l0 > /dev/null) it is a two-fold slowdown.
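A minimal sketch illustrating the underlying numbers, assuming Linux behaviour: st_blksize reported by fstat() for a pipe is typically the page size (4 KiB), well below the 0xff00-byte reads bgzip issues. This is a standalone demo, not htslib code.

```c
/* Hedged illustration: compare st_blksize of a pipe with bgzip's read size. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) < 0) { perror("pipe"); return 1; }

    struct stat st;
    if (fstat(fds[0], &st) == 0)
        printf("pipe st_blksize   = %ld\n", (long) st.st_blksize);  /* often 4096 */
    if (fstat(STDOUT_FILENO, &st) == 0)
        printf("stdout st_blksize = %ld\n", (long) st.st_blksize);

    printf("bgzip read size   = %d (0xff00)\n", 0xff00);
    return 0;
}
```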

The easy solution is just to increase the buffer size to something sensible. It's a little messy, as we have to include hfile_internal.h to get hfile_set_blksize, but it works. I'm not sure why we didn't elect to make that API more public; probably simply out of caution.
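A rough sketch of the shape of the fix, assuming the code is built inside the htslib source tree (the include path and the 256 KiB figure, taken from the later comment, are assumptions; this is not the actual bgzip.c change):

```c
/* Hedged sketch: enlarge the look-ahead buffer on the bgzip input stream. */
#include <stdio.h>
#include "htslib/hfile.h"
#include "hfile_internal.h"   /* for hfile_set_blksize(); not part of the public API */

int main(int argc, char **argv) {
    const char *fn = argc > 1 ? argv[1] : "-";   /* "-" reads from stdin, as bgzip does */
    hFILE *in = hopen(fn, "r");
    if (!in) { perror(fn); return 1; }

    /* Without this, hread() sizes its buffer from fstat() of the fd,
     * which can be as small as 4 KiB for a pipe. */
    if (hfile_set_blksize(in, 256 * 1024) < 0)
        fprintf(stderr, "warning: could not enlarge input buffer\n");

    char buf[0xff00];             /* bgzip's block-sized reads */
    ssize_t n;
    while ((n = hread(in, buf, sizeof buf)) > 0) {
        /* ... compress and write out the block ... */
    }

    if (hclose(in) < 0) { perror("close"); return 1; }
    return 0;
}
```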

Fixes #1767

jkbonfield commented 2 months ago

Reduced the buffer to 256 KiB instead of 1 MiB, as that seems to be sufficient (tested on tmpfs, fast NFS and Lustre).

jkbonfield commented 2 months ago

Hmm, it's still variable! A bigger block size means more memcpy, because fewer reads go directly into the caller's buffer (as @daviesrob points out, a readv() could partially solve that by reading directly into the caller's buffer with the remainder going to the look-ahead cache, but it doesn't fit with the backend semantics; see the sketch below). On the other hand, it also reduces the impact of many small reads caused by the small pipe buffer size.
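A hedged sketch of the readv() idea: satisfy the caller's request directly while topping up a look-ahead cache in the same system call. The names here (read_with_lookahead, lookahead, CACHE_SIZE) are illustrative, not htslib internals.

```c
#include <sys/uio.h>
#include <unistd.h>

#define CACHE_SIZE (256 * 1024)
static char lookahead[CACHE_SIZE];

/* Read up to 'want' bytes into 'dest'; any extra data the kernel has ready
 * flows into the look-ahead cache instead of needing a second read() plus
 * a memcpy() later. */
static ssize_t read_with_lookahead(int fd, void *dest, size_t want,
                                   size_t *cached /* out: bytes left in cache */) {
    struct iovec iov[2] = {
        { .iov_base = dest,      .iov_len = want },
        { .iov_base = lookahead, .iov_len = CACHE_SIZE },
    };
    ssize_t n = readv(fd, iov, 2);
    if (n < 0) return -1;
    if ((size_t) n > want) {
        *cached = (size_t) n - want;   /* surplus landed in the cache */
        return (ssize_t) want;
    }
    *cached = 0;
    return n;
}
```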

Hence on some machines a bigger block size is slower, and it can also have weird interactions with CPU load which I cannot explain. E.g. it's a win at -l0 but a penalty at -l1. It probably needs more head scratching.