ricardochimal / taps

simple database import/export app
MIT License
1.2k stars 140 forks

Performance degradation on large tables is the mind killer #61

Closed Empact closed 13 years ago

Empact commented 13 years ago

In db:push, the calculate_chunksize block includes the stream.fetch call. When the stream is a DB (the common case, if not the only one), this means the increasing seek time for larger offsets throws off the chunksize calculation, resulting in ever-smaller chunks, until you're doing a whole lot of seeking and very little transmitting.

Solved by taking the stream.fetch call out of the chunksize calculation block. The newly calculated chunksize is therefore applied to the next operation rather than the current one, but the measurement now includes only transmit time, not seek time. Thus the chunksize calculator does its job, which is (I guess) reacting to varying network connectivity to avoid timeouts from overly large chunksizes.
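A minimal sketch of the idea, in toy Ruby (the method and class names here are hypothetical, not taps' actual API): time only the transmit step, so the DB's growing seek time at larger offsets cannot drag the chunksize toward zero, and apply the adjusted chunksize on the following iteration.

```ruby
require 'benchmark'

# Target duration for one transmit; hypothetical tuning constant.
TARGET_SECONDS = 0.5

# Shrink the chunk if the last transmit was slow, grow it if fast.
def adjust_chunksize(chunksize, elapsed)
  if elapsed > TARGET_SECONDS
    [chunksize / 2, 100].max
  else
    chunksize + 100
  end
end

# Push rows from a stream in chunks. The fetch (which includes DB
# seek time) happens OUTSIDE the timed block; only transmit time
# feeds the chunksize calculation, and the new chunksize takes
# effect on the next fetch.
def push(stream, chunksize)
  loop do
    rows = stream.fetch(chunksize)          # seek time not measured
    break if rows.empty?
    elapsed = Benchmark.realtime { stream.transmit(rows) }
    chunksize = adjust_chunksize(chunksize, elapsed)
  end
  chunksize
end
```

With fetch inside the timed block (the original bug), `elapsed` would grow with the offset even on a fast network, halving the chunk every round; measured this way, only genuinely slow transmits shrink it.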

The change might need some tweaks, but AFAICT the gist is right.

ricardochimal commented 13 years ago

this should do the trick https://github.com/ricardochimal/taps/commit/c7debaaa7558ec6ca6c3f50a95c2cba77fa0f480