tarantool / vshard

The new generation of sharding based on virtual buckets

[RFC] Dictionaries support #172

Open vasiliy-t opened 5 years ago

vasiliy-t commented 5 years ago

Motivating example

The application deals with Users. Users have Attributes, and each Attribute has a type. There is a dictionary table (~10k records) linking the attribute type to the attribute name localization table and the attribute groups table. Users are stored on vshard storages and accessed through vshard routers.
There is a REST API /profile endpoint that returns user profile info, including grouped attributes with localized names.

Problem statement

This example illustrates a use case for dictionaries. Dictionaries contain few records, they are required to process almost any request, their data is defined by the user rather than by configuration, and past some point the data changes rarely.

There are several ways to deal with dictionaries in a sharded cluster:

Storing a copy of the dictionary on each storage seems like the best option, but it requires additional application logic: when a new instance is set up, the dictionaries must be there before the node starts processing requests, and dictionary updates must be applied consistently on each instance.

This seems like a pretty common use case, so it seems reasonable to implement dictionary support directly in vshard.

Gerold103 commented 5 years ago

First, the text is too big, full of your application details, and hard to understand. Please rephrase what you want in more common terms. Second, I am sure that such 'lua-sharding' is not a common thing that cannot be implemented on top of the current vshard as an application. Vshard shards buckets consisting of tuples from spaces, not application-specific or language-specific in-memory data.

Gerold103 commented 5 years ago

Just for the record: I would have understood an idea to additionally shard arbitrary user data, but I did not understand the text in the first comment, or why only dictionaries should be sharded. My proposal would look like this: I provide the user with an interface, a set of hooks that vshard calls when it tries to reshard. The user implements this interface so that it returns an iterator from which vshard fetches data and transfers it. On the destination storage another user hook is called, which applies the data. This would remove any dependency on the type of data. An example of an interface for registering your iterators:

--
-- Register a custom sharded storage. Can be different from space.
-- @a storage is an object having methods:
--
-- * storage.iterator(bucket_id)
--     Get an iterator object for a specified bucket and having
--     method next(), returning a next object in this bucket of
--     this storage.
--
-- * storage.store(bucket_id, object)
--     Store an object, transferred from a remote storage.
--
-- * storage.gc(bucket_id)
--     Remove content of a specified bucket.
--
function vshard.storage.register_custom(name, storage)
    -- ...
end

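
To make the proposed hooks more concrete, here is a minimal plain-Lua sketch of an object that follows the interface above. It only illustrates the proposal: `vshard.storage.register_custom` does not exist in vshard, and `new_memory_storage` and `transfer_bucket` are hypothetical names. `transfer_bucket` merely stands in for what vshard might do when moving a bucket, and it passes objects by reference, whereas a real transfer would serialize them and send them over the network.

```lua
-- Hypothetical in-memory storage following the proposed interface.
-- Objects are grouped by bucket: buckets[bucket_id] = { obj1, obj2, ... }.
local function new_memory_storage()
    local storage = { buckets = {} }

    -- Return an iterator object whose next() yields the objects of the
    -- bucket one by one and returns nil once the bucket is exhausted.
    function storage.iterator(bucket_id)
        local objects = storage.buckets[bucket_id] or {}
        local pos = 0
        local it = {}
        function it.next()
            pos = pos + 1
            return objects[pos]
        end
        return it
    end

    -- Store an object received from another storage.
    function storage.store(bucket_id, object)
        local objects = storage.buckets[bucket_id]
        if objects == nil then
            objects = {}
            storage.buckets[bucket_id] = objects
        end
        table.insert(objects, object)
    end

    -- Drop the whole content of a bucket.
    function storage.gc(bucket_id)
        storage.buckets[bucket_id] = nil
    end

    return storage
end

-- Assumption: a stand-in for the resharding step that vshard itself would
-- perform under this proposal. It is not real vshard code.
local function transfer_bucket(src, dst, bucket_id)
    local it = src.iterator(bucket_id)
    local object = it.next()
    while object ~= nil do
        dst.store(bucket_id, object)
        object = it.next()
    end
    -- The source forgets the bucket only after everything was handed over.
    src.gc(bucket_id)
end

-- Usage: move bucket 1 from one storage object to another.
local src = new_memory_storage()
local dst = new_memory_storage()
src.store(1, { id = 10, name = 'first' })
src.store(1, { id = 11, name = 'second' })
transfer_bucket(src, dst, 1)
```

Because the hooks only deal with opaque objects and bucket ids, the same loop would work whether the objects are tuples, Lua tables, or anything else the application keeps per bucket.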
Gerold103 commented 5 years ago

After a verbal discussion it turned out that the 'dictionary table' here is a space that should be fully stored on each instance in the cluster. In fact, this is a feature request for https://github.com/tarantool/tarantool/issues/3982. If this is urgent, the issue can be solved without core support via a special cluster-wide bucket.