tarantool / vshard

The new generation of sharding based on virtual buckets
Other
100 stars 30 forks source link

Provide routing metadata via API #308

Open akudiyar opened 2 years ago

akudiyar commented 2 years ago

Motivation

Currently, vshard routers act as entry points for all computation performed on the vshard cluster, they maintain the routing information (via the pool of storage connections and collected bucket information) and provide map-reduce operations. It is good enough for using the routers in the Lua applications, where vshard acts as a library. But if the cluster is deployed as an application itself (a distributed storage), with no access for the developers to adding of the API functions on the Tarantool nodes, it becomes a problem at least in the following cases: 1) Complex data structure, when the business objects are stored in several spaces linked with "foreign keys" (some identifiers) and need to be aggregated locally on the storage on the map-reduce operations 2) Bulk loading of large data sets, which leads to routers CPU overload and overall poor performance, that may be mitigated using direct operations with storage nodes (assuming that the cluster topology does not change during this operation).

For implementing such business logic for complex querying or loading objects in the user applications written in languages other than Lua (e.g. Java), is important to have an API for retrieving the cluster metadata, for example:

There is a function router.routeall, which may be changed for returning the necessary additional metadata, or a new function may be implemented.

  1. Add an API functions to the vshard router, returning particular replicaset metadata by bucket ID, like:
    vshard.router.replicaset(bucket_id)

There is a function router.route, which may be changed for returning the necessary additional metadata, or a new function may be implemented.

  1. Add an API entry point for subscribing to the replicaset state changes, like:
    vshard.router.subscription.replicaset_state_subscribe(replicaset_uuid, binary_connection)

The subscriber will then receive updates when a replicaset leadership changes or the replicaset becomes balanced / unbalanced / receiving buckets / deleting buckets etc.

The subscription may be deleted automatically when the passed binary connection closes, or via a mirror function like:

vshard.router.subscription.replicaset_state_unsubscribe(replicaset_uuid, binary_connection)

Since replicaset state changes aren't very frequent, holding subscription connections shouldn't noticeably impact the performance.

  1. Add an API entry point for subscribing to the bucket state changes, like:
    vshard.router.subscription.bucket_state_subscribe(bucket_id, binary_connection)

The subscriber will then receive updates when a bucket is moved / started moving / finished moving.

The subscription may be deleted automatically when the passed binary connection closes, or via a mirror function like:

vshard.router.subscription.bucket_state_unsubscribe(bucket_id, binary_connection)

Bucket changes are also not frequent in the production environment, and in the event of planned cluster scaling the subscriptions may be temporarily disabled or closed.

Related issues

https://github.com/tarantool/cartridge-spark/issues/16

R-omk commented 2 years ago

Its same as https://github.com/tarantool/vshard/issues/209 and mostly discussed here https://github.com/tarantool/vshard/discussions/280

Globally my idea is that it is a separate entity with its own good streaming api, the router should not take on more responsibilities than is required to fulfill its main purpose.

akudiyar commented 2 years ago

Regarding subscriptions, there's a useful feature implemented recently: subscription protocol extension -- https://github.com/tarantool/tarantool/issues/6257

The internal polling algorithms for replicaset leadership or bucket changes may be rewritten using this feature.

Also, as discussed with Vlad, it may be worth creating a separate branch for implementing new vshard features on top of the newest 2.x features, without supporting 1.10.