tarantool / ddl

The DDL module enables you to describe a data schema in a declarative YAML-based format.
BSD 2-Clause "Simplified" License

Caching of sharding function #82

Closed: Totktonada closed this issue 2 years ago

Totktonada commented 3 years ago

(There is a more general task about caching, #69, but I want to eat this pie piece by piece.)

We're going to add the ddl.bucket_id() function (see #76). The function may be called quite frequently, so its performance is worth taking care of.

The ddl.bucket_id() function needs to know the sharding function. Obtaining the function declaration / definition stored in the _ddl_sharding_func space is costly, mainly due to the following actions:

  1. MsgPack decoding.
  2. loadstring() if the function is declared as code ({body = <...>}).
  3. Extra Lua GC pressure from re-creating what are usually the same objects.

Ideally, obtaining the function should be just a Lua table lookup, and this is achievable.

The only way to track _ddl_sharding_func changes is to set a trigger (on_replace) on the space[^1]. Since it is not always possible to set the trigger when the module is first loaded[^2], I propose a trick.

The key idea is to generate an initial cache value and set the trigger the first time we access the sharding function information. After that, the cache is updated 'in the background' (by the trigger), and we can simply read the cache.
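The lazy initialization plus trigger idea can be sketched in Python as follows. This is a hedged model, not the ddl module's code. `ToySpace` is a hypothetical stand-in for the _ddl_sharding_func space whose `on_replace` triggers receive `(old, new)` records, with `new` being `None` on delete. `eval()` stands in for `loadstring()`. The cache stays `None` until the first access, which is also when the trigger gets set.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[str, str]  # (space name, function body as source text)


class ToySpace:
    """Hypothetical stand-in for _ddl_sharding_func with on_replace triggers."""

    def __init__(self) -> None:
        self.tuples: Dict[str, str] = {}
        self.triggers: List[Callable[[Optional[Record], Optional[Record]], None]] = []

    def on_replace(self, trigger) -> None:
        self.triggers.append(trigger)

    def replace(self, name: str, body: str) -> None:
        old = (name, self.tuples[name]) if name in self.tuples else None
        self.tuples[name] = body
        for trigger in self.triggers:
            trigger(old, (name, body))

    def delete(self, name: str) -> None:
        old = (name, self.tuples.pop(name))
        for trigger in self.triggers:
            trigger(old, None)


sharding_func_space = ToySpace()
cache: Optional[Dict[str, Callable]] = None  # None: not initialized yet


def _on_replace(old: Optional[Record], new: Optional[Record]) -> None:
    """Keeps the cache in sync with every change of the space."""
    if new is None:               # delete
        cache.pop(old[0], None)
    else:                         # insert or replace
        name, body = new
        cache[name] = eval(body)  # stand-in for loadstring()


def get_sharding_func(space_name: str) -> Optional[Callable]:
    global cache
    if cache is None:
        # First access: box is assumed to be configured by now, so build
        # the initial cache and set the trigger.
        cache = {name: eval(body) for name, body in sharding_func_space.tuples.items()}
        sharding_func_space.on_replace(_on_replace)
    return cache.get(space_name)
```

Changes made before the first access need no trigger, because the initial cache is built from the current space contents at that moment.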

What to consider:

Optimization trick:

We can use the trick with two implementations of the cache access function (see src/box/lua/load_cfg.lua in tarantool for an example). The first function does all the work: it checks whether the trigger is set (and the initial cache is generated), sets the trigger and generates the cache if necessary, accesses the cache, and replaces itself with the second function. The second function skips the extra checks and just accesses the cache.

I'll note that the first function must not set the trigger unconditionally, because it may be called after a hot reload. See https://github.com/tarantool/tarantool/issues/5826.
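A rough Python sketch of the two-implementation trick, including the hot reload concern. Everything here is an assumption for illustration. `load_module(space, previous)` is a hypothetical way to simulate re-requiring the module. `ToySpace.on_replace(new, old)` is loosely modeled on replacing a previously set trigger instead of appending a duplicate. `eval()` stands in for `loadstring()`. The first implementation checks for its own trigger, swaps out the one left by the previous module instance, fills the cache, and then rebinds `get_sharding_func` to the fast path.

```python
from types import SimpleNamespace


class ToySpace:
    """Hypothetical stand-in for a space; it outlives module hot reloads."""

    def __init__(self):
        self.tuples = {}
        self.triggers = []

    def on_replace(self, new_trigger, old_trigger=None):
        # If old_trigger is present, swap it in place instead of appending.
        if old_trigger is not None and old_trigger in self.triggers:
            self.triggers[self.triggers.index(old_trigger)] = new_trigger
        else:
            self.triggers.append(new_trigger)

    def replace(self, name, body):
        self.tuples[name] = body
        for trigger in self.triggers:
            trigger(name, body)


def load_module(space, previous=None):
    """Simulates loading the module; `previous` mimics a hot reload."""
    mod = SimpleNamespace(cache={})

    def trigger(name, body):
        mod.cache[name] = eval(body)  # stand-in for loadstring()

    def get_fast(name):
        # Second implementation: no checks, just a cache lookup.
        return mod.cache.get(name)

    def get_slow(name):
        # First implementation: never add the trigger unconditionally.
        if trigger not in space.triggers:
            # Replace the trigger left by the previous instance, if any.
            old_trigger = previous.trigger if previous is not None else None
            space.on_replace(trigger, old_trigger)
        mod.cache = {n: eval(b) for n, b in space.tuples.items()}
        mod.get_sharding_func = get_fast  # replace itself
        return get_fast(name)

    mod.trigger = trigger
    mod.get_sharding_func = get_slow
    return mod
```

Callers have to look the function up through the module table on every call (`mod.get_sharding_func(...)`). A caller that saved a reference to the first implementation would keep running the slow path.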


Looks a bit tricky, but doable. Opinions?

[^1]: I filed https://github.com/tarantool/tarantool/issues/6544 and https://github.com/tarantool/tarantool/issues/6545 to make it possible to track such changes using the database schema version in the future. I think it may simplify some future code.
[^2]: box may be unconfigured (or not fully loaded) when the module is required for the first time.

ligurio commented 3 years ago

Alexander Turenko initiated a discussion in https://github.com/tarantool/ddl/pull/77 about replacing pcall() with a raw function call, because pcall() may bring some performance drawbacks here. This hypothesis needs to be checked.