tarantool / vshard

The new generation of sharding based on virtual buckets

Optimize bucket_ref() to not make _bucket:get() when _bucket does not change #285

Open Gerold103 opened 3 years ago

Gerold103 commented 3 years ago

bucket_ref() is used by storage_call(), which the routers invoke for most of their requests. It already has one optimization: if the ref count of the requested type is > 0, the call is a plain increment. But if the current ref count is 0, it calls _bucket:get() to check the bucket state. That can be expensive: the tuple is returned to Lua as cdata, which increases GC pressure. It might also be relatively slow when the bucket count is in the millions.

Instead, bucket_ref() could proceed to the ref increment even if the current ref count is 0. To make that safe, the _bucket space's trigger could delete the ref object of a changed bucket if its ref count is 0. Then, if bucket_ref() finds an existing ref object, it can work with it without touching _bucket. If there is no ref object, it works as it does now: get the tuple from _bucket, ensure it is suitable for the ref, create the ref object, etc.
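A minimal sketch of this idea, written as a Python model rather than vshard's Lua code. The class `Storage`, the `on_bucket_replace` hook, the `space_gets` counter, and the rule that only `"active"` buckets can be referenced are all assumptions made for illustration. Ref types and locks are omitted.

```python
class Storage:
    """Toy model of a storage node; not the vshard implementation."""

    def __init__(self, buckets):
        self.bucket_space = dict(buckets)  # bucket_id -> status, stands in for _bucket
        self.refs = {}                     # bucket_id -> current ref count
        self.space_gets = 0                # how many times the "space" was read

    def _bucket_get(self, bucket_id):
        # Stand-in for _bucket:get(); counted so the saving is visible.
        self.space_gets += 1
        return self.bucket_space.get(bucket_id)

    def bucket_ref(self, bucket_id):
        if bucket_id not in self.refs:
            # Slow path: no ref object yet, so validate against the space.
            if self._bucket_get(bucket_id) != "active":
                return False
            self.refs[bucket_id] = 0
        # Fast path: a ref object exists, even with count 0, so the bucket
        # has not changed since it was validated.
        self.refs[bucket_id] += 1
        return True

    def bucket_unref(self, bucket_id):
        # The ref object is kept at count 0 on purpose.
        self.refs[bucket_id] -= 1

    def on_bucket_replace(self, bucket_id, new_status):
        # Models the _bucket trigger: drop the ref object of the changed
        # bucket if nobody holds it, forcing the next ref to revalidate.
        self.bucket_space[bucket_id] = new_status
        if self.refs.get(bucket_id) == 0:
            del self.refs[bucket_id]
```

In this model, repeated ref/unref cycles on an unchanged bucket read the space only once. A bucket whose ref count is > 0 at the moment it changes is left untouched here; how that case should be handled is not covered by the sketch.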

Another option would be to save into each bucket_ref object the bucket generation as of the last time the ref was changed. If on the next ref the saved generation is the same as the global one, nothing has changed and the ref can be taken. The problem is that this adds 8 bytes to each ref (expensive with a huge bucket count), and it might not work well with hot reload.
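The generation variant could look roughly like the following Python model. Again, `GenStorage`, `on_bucket_replace`, and `space_gets` are hypothetical names, and the sketch assumes the saved generation is updated only when the bucket state is actually verified, which is one possible reading of the proposal.

```python
class GenStorage:
    """Toy model of the per-ref generation check; not vshard code."""

    def __init__(self, buckets):
        self.bucket_space = dict(buckets)  # bucket_id -> status, stands in for _bucket
        self.generation = 0                # global bucket generation
        self.refs = {}                     # bucket_id -> {"count", "generation"}
        self.space_gets = 0                # how many times the "space" was read

    def on_bucket_replace(self, bucket_id, new_status):
        # Any change in the space bumps the single global generation.
        self.bucket_space[bucket_id] = new_status
        self.generation += 1

    def bucket_ref(self, bucket_id):
        ref = self.refs.get(bucket_id)
        if ref is not None and ref["generation"] == self.generation:
            # Nothing in the space changed since the last verification.
            ref["count"] += 1
            return True
        # Generation differs or no ref object: verify against the space.
        self.space_gets += 1
        if self.bucket_space.get(bucket_id) != "active":
            return False
        if ref is None:
            ref = {"count": 0, "generation": self.generation}
            self.refs[bucket_id] = ref
        ref["generation"] = self.generation  # the extra per-ref field
        ref["count"] += 1
        return True

    def bucket_unref(self, bucket_id):
        self.refs[bucket_id]["count"] -= 1
```

One property this model makes visible: because the generation is global, a change to any bucket forces one revalidation for every other bucket on its next ref, whereas the trigger variant only affects the bucket that changed.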