tarantool / vshard

The new generation of sharding based on virtual buckets

Implement correct handling for recoverable errors and reliable function execution on storages #281

Open: akudiyar opened this issue 3 years ago

akudiyar commented 3 years ago

Related to https://github.com/tarantool/cartridge-java/issues/89

When a function is executed with any of the RPC calls (callro, callrw, etc.), a network error may occur because of a problem with the storage node (the node is restarted, becomes read-only, is overloaded, etc.).

Currently such errors are propagated to the client, and there is no unified way to handle them and make a reliable call (e.g. with retries and a fallback to another replica for an RO call). Also, these errors originate from netbox or internal CURL and do not share a common format that would let them be distinguished from other (logical, unrecoverable) errors.

Proposal:
1) Wrap recoverable errors in a unified format that lets the client apply different logic to recoverable and unrecoverable (all other) errors. This implies classifying the internal errors.
2) Implement an API for specifying retry policies for recoverable errors; the policy settings could be passed, for example, in the options of the call* functions.
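The two proposals can be sketched together in Python. This is a minimal, hypothetical illustration, not vshard's API: `RecoverableError`, `classify`, `RetryPolicy`, `reliable_call`, and the stand-in error types `ReadOnlyError` and `OverloadedError` are all invented names, and the transport is abstracted as a `call(replica, func, args)` callable. Transient errors are wrapped in one type carrying a `kind`, logical errors pass through untouched, and the policy retries RO calls across replicas while RW calls stay on one node (assumed here to be `replicas[0]`, the master).

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Sequence


# Hypothetical stand-ins for storage-side failures; real vshard/net.box
# errors have their own types and fields.
class ReadOnlyError(Exception):
    pass


class OverloadedError(Exception):
    pass


class RecoverableError(Exception):
    """Unified wrapper for transient errors (proposal 1, hypothetical format)."""

    def __init__(self, kind: str, replica: str, cause: Exception):
        super().__init__(f"{kind} error on {replica}: {cause}")
        self.kind = kind
        self.replica = replica
        self.cause = cause


# Classification table: raw error type -> recoverable error kind.
RECOVERABLE_KINDS = {
    ConnectionError: "network",
    TimeoutError: "timeout",
    ReadOnlyError: "readonly",
    OverloadedError: "overloaded",
}


def classify(err: Exception, replica: str) -> Exception:
    """Wrap transient errors; return logical errors unchanged."""
    for exc_type, kind in RECOVERABLE_KINDS.items():
        if isinstance(err, exc_type):
            return RecoverableError(kind, replica, err)
    return err


@dataclass
class RetryPolicy:
    """Per-call retry settings (proposal 2, hypothetical option names)."""

    max_attempts: int = 3
    backoff: float = 0.0  # seconds; the delay grows linearly per attempt
    read_only: bool = True  # RO calls may fall back to other replicas

    def __post_init__(self) -> None:
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")


Call = Callable[[str, str, tuple], Any]


def reliable_call(replicas: Sequence[str], func: str, args: tuple,
                  call: Call, policy: RetryPolicy) -> Any:
    last_error = None
    for attempt in range(policy.max_attempts):
        # RO calls rotate through replicas; RW calls stay on replicas[0],
        # which this sketch assumes is the master.
        replica = replicas[attempt % len(replicas)] if policy.read_only else replicas[0]
        try:
            return call(replica, func, args)
        except Exception as raw:
            err = classify(raw, replica)
            if not isinstance(err, RecoverableError):
                raise  # logical error: propagate immediately, no retry
            last_error = err
            if attempt + 1 < policy.max_attempts:
                time.sleep(policy.backoff * (attempt + 1))
    # All attempts failed with recoverable errors: surface the last one.
    raise last_error from last_error.cause
```

Keeping the classification in a single table means a client only has to check one error type to decide whether to retry. Retrying an RW call is only safe when the function is idempotent, which is one reason a per-call policy, rather than a global one, seems useful.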