ricardochimal / taps

simple database import/export app
MIT License
1.2k stars 140 forks

Performance degradation on large tables is the mind killer #61

Closed Empact closed 13 years ago

Empact commented 13 years ago

In db:push, the calculate_chunksize block includes the stream.fetch call. When the stream is a DB (the common case, if not the only one), this means the increasing seek time for larger offsets throws off the chunksize calculation, resulting in ever-smaller chunks, until you're doing a whole lot of seeking and very little transmitting.

Solved by taking the stream.fetch call out of the chunksize calculation block. The newly calculated chunksize is therefore applied to the next operation rather than the current one, but the measurement now includes only transmit time, not seek time. Thus the chunksize calculator does its job, which is (I guess) reacting to varying network connectivity to avoid timeouts from overly large chunksizes.
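A minimal sketch of the idea, in toy Ruby (the method and class names here are hypothetical, not taps' actual API): time only the transmit step, so the DB's growing seek time at larger offsets cannot drag the chunksize toward zero, and apply the adjusted chunksize on the following iteration.

```ruby
require 'benchmark'

# Target duration for one transmit; hypothetical tuning constant.
TARGET_SECONDS = 0.5

# Shrink the chunk if the last transmit was slow, grow it if fast.
def adjust_chunksize(chunksize, elapsed)
  if elapsed > TARGET_SECONDS
    [chunksize / 2, 100].max
  else
    chunksize + 100
  end
end

# Push rows from a stream in chunks. The fetch (which includes DB
# seek time) happens OUTSIDE the timed block; only transmit time
# feeds the chunksize calculation, and the new chunksize takes
# effect on the next fetch.
def push(stream, chunksize)
  loop do
    rows = stream.fetch(chunksize)          # seek time not measured
    break if rows.empty?
    elapsed = Benchmark.realtime { stream.transmit(rows) }
    chunksize = adjust_chunksize(chunksize, elapsed)
  end
  chunksize
end
```

With fetch inside the timed block (the original bug), `elapsed` would grow with the offset even on a fast network, halving the chunk every round; measured this way, only genuinely slow transmits shrink it.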

The change might need some tweaks, but AFAICT the gist is right.

ricardochimal commented 13 years ago

this should do the trick https://github.com/ricardochimal/taps/commit/c7debaaa7558ec6ca6c3f50a95c2cba77fa0f480