Ruby's `Zlib::GzipReader#each_line` is notoriously slow. It keeps memory usage steady by "streaming" one line at a time, but in practice its overhead is painful.
As a stop-gap measure, this PR introduces a new config parameter, `gzip_prefer`. Its default value is `memory`, and it can also be set to `speed`.
- When `memory` is selected, it behaves as it always has: each decoded line is yielded by `Zlib::GzipReader#each_line`.
- When `speed` is selected, each gzip chunk is read entirely into a single string with `Zlib::GzipReader#read`, and then yielded line by line with `String#each_line`. This routes around the slowness of `Zlib::GzipReader#each_line` at the cost of predictable memory consumption.
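The two strategies can be sketched in plain Ruby as follows. This is a minimal illustration under stated assumptions, not the code in this PR. The helper name `each_gzip_line`, its `prefer:` keyword, and the validation of allowed values are hypothetical. The sketch also assumes the input is a single gzip stream read from an IO-like object.

```ruby
require "stringio"
require "zlib"

# Allowed values, mirroring the gzip_prefer setting (hypothetical constant).
GZIP_PREFER_OPTIONS = %w[memory speed].freeze

# Hypothetical helper: yields each decompressed line of the gzip stream in +io+.
def each_gzip_line(io, prefer: "memory", &block)
  raise ArgumentError, "a block is required" unless block
  unless GZIP_PREFER_OPTIONS.include?(prefer)
    raise ArgumentError,
          "gzip_prefer must be one of #{GZIP_PREFER_OPTIONS.join(', ')}, got #{prefer.inspect}"
  end

  gz = Zlib::GzipReader.new(io)
  begin
    if prefer == "memory"
      # Streams one line at a time: flat memory profile, but slow per line.
      gz.each_line(&block)
    else
      # Decompresses everything into a single String, then splits it.
      # This avoids GzipReader#each_line, but memory grows with the decompressed size.
      gz.read.each_line(&block)
    end
  ensure
    gz.close # also closes the underlying io
  end
end

# Usage: gzip some sample text in memory and read it back with the "speed" strategy.
compressed = Zlib.gzip("alpha\nbeta\ngamma\n")
each_gzip_line(StringIO.new(compressed), prefer: "speed") { |line| puts line }
```

Both strategies yield the same lines. The difference is that `speed` holds the entire decompressed content in memory at once, so peak usage scales with the uncompressed size rather than with the longest line.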
This should alleviate the pain of the following tickets:
Ideally, we would use a combination of `java.util.zip.GZIPInputStream` and `java.io.BufferedReader` to do the bulk of the work on the Java side of JRuby. That change would carry more risk and require more extensive testing.