tarantool / tarantool-python

Python client library for Tarantool
https://www.tarantool.io
BSD 2-Clause "Simplified" License

Introduce ConnectionPool with master discovery #207

Closed DifferentialOrange closed 2 years ago

DifferentialOrange commented 2 years ago

python: drop Python 2 support

Python 2.7 reached the end of its life on January 1st, 2020 [1]. Since it would be a waste to ignore several Python 3.x features in the master discovery implementation, we decided to drop Python 2 support here.

Cleanup of the Python 2 workarounds is expected to be done as part of #212.

  1. https://www.python.org/doc/sunset-python-2/

connection: introduce common interface

Introduce a connection interface to be used in the connection pool implementation. The interface requires only the CRUD and base connect/close API.
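A minimal sketch of what such an interface could look like, assuming an abstract base class is used. The class name ConnectionInterface and the method signatures are illustrative assumptions, not the names or signatures used in this PR.

```python
import abc


class ConnectionInterface(abc.ABC):
    """Illustrative interface: base connect/close plus CRUD requests."""

    @abc.abstractmethod
    def connect(self):
        """Open the underlying connection."""

    @abc.abstractmethod
    def close(self):
        """Close the underlying connection."""

    @abc.abstractmethod
    def insert(self, space_name, values):
        """Insert a tuple."""

    @abc.abstractmethod
    def replace(self, space_name, values):
        """Insert a tuple or replace the existing one."""

    @abc.abstractmethod
    def select(self, space_name, key=None):
        """Read tuples by key."""

    @abc.abstractmethod
    def update(self, space_name, key, op_list):
        """Update a tuple."""

    @abc.abstractmethod
    def upsert(self, space_name, tuple_value, op_list):
        """Update a tuple or insert it if it does not exist."""

    @abc.abstractmethod
    def delete(self, space_name, key):
        """Delete a tuple."""


class HalfDone(ConnectionInterface):
    # Implements only part of the interface, so the class stays abstract.
    def connect(self):
        pass

    def close(self):
        pass


try:
    HalfDone()
    instantiated = True
except TypeError:
    # abc refuses to instantiate a class with missing abstract methods.
    instantiated = False
```

With this shape, both a plain connection and a pool could subclass the same base, and a class that forgets a CRUD method fails at construction time rather than at the first request.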

Part of #196

connection_pool: introduce connection pool

Introduce the ConnectionPool class to work with a cluster of Tarantool instances. ConnectionPool supports master discovery and ro/rw-based requests, so it is most useful when working with a single replicaset of instances. ConnectionPool is supported only for Python 3.7 or newer. The authenticated user must be able to call box.info on the instances.

ConnectionPool updates information about each server's state (RO/RW) on initial connect and then asynchronously in separate threads. Application retries must be written with the asynchronous nature of the cluster state refresh in mind. The user does not need any synchronization mechanisms in requests; ConnectionPool methods handle all of it.

The ConnectionPool API is the same as the plain Connection API. On each request, a connection is chosen to execute it. The connection is selected based on the request mode.
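Because the state refresh is asynchronous, a request may fail for a short time after a master switch. The sketch below shows one possible application-side retry helper. The names call_with_retry and NoSuitableInstance are hypothetical stand-ins invented for this example; a real application would catch whatever error the pool actually raises.

```python
import time


class NoSuitableInstance(Exception):
    """Hypothetical error: the pool knows no instance matching the mode."""


def call_with_retry(request, attempts=5, delay=0.1):
    """Run request(); on failure wait for the background refresh and retry."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for _ in range(attempts):
        try:
            return request()
        except NoSuitableInstance as exc:
            last_error = exc
            # Give the refresh threads time to notice the new cluster state.
            time.sleep(delay)
    raise last_error
```

A request would then be wrapped as call_with_retry(lambda: pool.call('some_write_procedure', arg, mode=tarantool.Mode.RW)), with delay chosen to be no shorter than the refresh period.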

Example:

pool.call('some_write_procedure', arg, mode=tarantool.Mode.RW)
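As a rough illustration of mode-based selection, the sketch below keeps a map of known instance states and picks a matching instance round-robin. The local Mode enum (reduced to RW and RO), the InstanceSelector class, and the round-robin policy are assumptions made for this example, not the ConnectionPool implementation.

```python
import enum
import itertools


class Mode(enum.Enum):
    # Stand-in for tarantool.Mode, reduced to two values for the sketch.
    RW = 'rw'
    RO = 'ro'


class InstanceSelector:
    """Pick an instance address that matches the requested mode."""

    def __init__(self):
        self._is_ro = {}  # address -> True if the instance is read-only
        self._counter = itertools.count()

    def update(self, address, is_ro):
        """Record the latest known state of an instance."""
        self._is_ro[address] = is_ro

    def choose(self, mode):
        """Return a matching address, rotating between candidates."""
        wanted_ro = mode is Mode.RO
        candidates = [a for a, ro in self._is_ro.items() if ro == wanted_ro]
        if not candidates:
            raise LookupError('no instance matches mode %s' % mode.name)
        return candidates[next(self._counter) % len(candidates)]
```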

Closes #196


Mons commented 2 years ago

Never look at box.cfg.read_only. Look only at box.info.ro

DifferentialOrange commented 2 years ago

Never look at box.cfg.read_only. Look only at box.info.ro

It was used only in tests. Reworked.
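A small sketch of the advice above: box.info.ro reports the effective state of the instance, which can be read-only even when box.cfg.read_only is false (for example, while the instance is an orphan). The helper name is_read_only and the FakeConnection class are hypothetical, and the sketch assumes that conn.call('box.info') returns an indexable response whose first element is the info map.

```python
def is_read_only(conn):
    """Return the effective RO state of an instance as reported by box.info."""
    # Assumption: the first element of the response is the box.info map.
    info = conn.call('box.info')[0]
    return bool(info['ro'])


class FakeConnection:
    """Test double standing in for a real connection."""

    def __init__(self, info):
        self._info = info

    def call(self, func_name):
        if func_name != 'box.info':
            raise ValueError('unexpected call: ' + func_name)
        return [self._info]
```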

DifferentialOrange commented 2 years ago

Alternative approaches

Synchronous on request (on errors)

Solution idea: refresh schema (rw/ro info, replication state) on connect, on an RO error, or on a network error.
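A minimal sketch of this idea, assuming a discover callable that returns the current master. RefreshOnErrorPool, ReadOnlyError, and the single-retry policy are hypothetical names and choices made for the example.

```python
class ReadOnlyError(Exception):
    """Hypothetical error: the target instance rejected a write as RO."""


class RefreshOnErrorPool:
    """Refresh the known master on connect and after RO or network errors."""

    def __init__(self, discover):
        self._discover = discover  # callable returning the current master
        self.refreshes = 0
        self.refresh()  # refresh on connect

    def refresh(self):
        self.master = self._discover()
        self.refreshes += 1

    def write(self, send):
        """Send a write to the known master; refresh and retry once on error."""
        try:
            return send(self.master)
        except (ReadOnlyError, ConnectionError):
            self.refresh()
            return send(self.master)
```

The refresh cost is paid only by the request that hits the error, which is where the extra latency of this approach comes from.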

Pros

Cons

Synchronous on request (with timeout)

Solution idea: refresh schema (rw/ro info, replication state) before a request if X milliseconds have passed since the last refresh.
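A minimal sketch of the timeout variant, assuming the state is fetched by a discover callable. TimedRefresh and its parameters are hypothetical, and max_age is given in seconds here rather than milliseconds. The clock is injectable so the behavior can be checked without sleeping.

```python
import time


class TimedRefresh:
    """Refresh the cached state before a request once it is older than max_age."""

    def __init__(self, discover, max_age, clock=time.monotonic):
        self._discover = discover  # callable returning the cluster state
        self._max_age = max_age    # seconds
        self._clock = clock
        self._refreshed_at = None
        self._state = None

    def current_state(self):
        """Return the cached state, refreshing it first if it is stale."""
        now = self._clock()
        stale = (self._refreshed_at is None
                 or now - self._refreshed_at >= self._max_age)
        if stale:
            self._state = self._discover()
            self._refreshed_at = now
        return self._state
```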

Pros

Cons

Asynchronous

Solution idea: refresh schema (rw/ro info, replication state) in a separate thread every X milliseconds.
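A minimal sketch of the asynchronous variant using the standard threading module. BackgroundRefresher, the discover callable, and the interval (in seconds) are hypothetical names chosen for the example, not the PR's implementation.

```python
import threading


class BackgroundRefresher:
    """Refresh the cached cluster state in a separate thread."""

    def __init__(self, discover, interval):
        self._discover = discover  # callable returning the cluster state
        self._interval = interval  # seconds between refreshes
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._state = discover()   # initial refresh on connect
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            fresh = self._discover()
            with self._lock:
                self._state = fresh

    def state(self):
        """Return the most recently fetched state."""
        with self._lock:
            return self._state

    def stop(self):
        """Stop the refresh thread; call only after start()."""
        self._stop.set()
        self._thread.join()
```

Requests never wait for a refresh in this design, but they can observe state that is up to one interval old, which is why retries are left to the caller.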

Pros

Cons

What solution should we choose?

Solutions may be hybrid (1+2 or 1+3), and I think a hybrid would be the best approach since it covers more cases. Personally, I prefer synchronous refresh on request, on error plus timeout (1+2): the only drawback is increased latency for some requests, but it is rather simple to implement compared to introducing async. It is much harder to combine synchronous refresh on error with the async approach, and a pure async approach shifts the responsibility for implementing non-trivial retry logic to app developers.

Totktonada commented 2 years ago

All requests that are guaranteed to write (insert, replace, delete, upsert, update) use RW mode.

An RO instance can write to a replica-local or a temporary space. Admittedly, it is strange to write to a replica-local space on some RO instance. However, there may be a use case: say, registering a task to be processed in the background. I think we should have good defaults but still allow the user to choose.
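One way to read "good defaults, but allow to choose" is a keyword-only mode argument that defaults to RW for write requests. The sketch below only builds a description of the routed request. The local Mode enum and the route_insert function are hypothetical stand-ins, not the pool API.

```python
import enum


class Mode(enum.Enum):
    # Stand-in for tarantool.Mode, reduced to two values for the sketch.
    RW = 'rw'
    RO = 'ro'


def route_insert(space_name, values, *, mode=Mode.RW):
    """Describe an insert request; RW unless the caller overrides the mode."""
    return {
        'request': 'insert',
        'space': space_name,
        'values': values,
        'mode': mode,
    }


# Default: writes go to an RW instance.
default_request = route_insert('tasks', (1, 'job'))
# Explicit override, e.g. for a replica-local space on an RO instance.
local_request = route_insert('local_tasks', (1, 'job'), mode=Mode.RO)
```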