This PR introduces an index structure for free buffers of a slab, this allows keep data fetching time pretty much constant.
Example: 1000 fragments, 15 columns, (we observe GPU as it is easier to measure fetching time).
Problem: current buffer manager looks for a fitting buffer by walking through all buffers in all slabs. Normally this means that we go through all previously allocated buffers until the last one. When we fetch columns one by one (for simplicity), the measurements per column fetch are:
[102ms, 105ms, 125ms, 205ms, 301ms, 416ms, ..., 1100ms, ...](i.e., 1st column took 102ms to fetch, 2nd 105ms, etc.).
Solution: introduce an index structure for free buffers in a slab.
Effect: we get a quick shortcut over slabs where we wouldn't fit, and we don't have to iterate over all buffers in a slab, only over the free ones and in log time. In practice the overhead is eliminated:
[102ms, 103ms, 98ms, 100ms, ..., 101ms].
Drawback: slightly higher overhead per buffer, but it is maybe in the order of a few ms for 1000 buffers.
This PR introduces an index structure for free buffers of a slab, this allows keep data fetching time pretty much constant.
Example: 1000 fragments, 15 columns, (we observe GPU as it is easier to measure fetching time).
Problem: current buffer manager looks for a fitting buffer by walking through all buffers in all slabs. Normally this means that we go through all previously allocated buffers until the last one. When we fetch columns one by one (for simplicity), the measurements per column fetch are:
[102ms, 105ms, 125ms, 205ms, 301ms, 416ms, ..., 1100ms, ...]
(i.e., 1st column took 102ms to fetch, 2nd 105ms, etc.).Solution: introduce an index structure for free buffers in a slab.
Effect: we get a quick shortcut over slabs where we wouldn't fit, and we don't have to iterate over all buffers in a slab, only over the free ones and in log time. In practice the overhead is eliminated:
[102ms, 103ms, 98ms, 100ms, ..., 101ms]
.Drawback: slightly higher overhead per buffer, but it is maybe in the order of a few ms for 1000 buffers.