DifferentialOrange commented 2 years ago

python: drop Python 2 support

Python 2.7 reached the end of its life on January 1st, 2020 [1]. Since it would be a waste to ignore useful Python 3.x features in the master discovery implementation, we decided to drop Python 2 support here.

Cleanup of the remaining Python 2 workarounds is expected to be done as part of #212.

[1] https://www.python.org/doc/sunset-python-2/

connection: introduce common interface

Introduce a connection interface to be used in the connection pool implementation. The interface requires only the CRUD requests and the base connect/close API.
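A minimal sketch of what such an interface could look like, assuming an abstract base class. The names `ConnectionInterface` and `InMemoryConnection` are hypothetical and are not taken from the patch. update and upsert are omitted for brevity, and the in-memory class only stands in for a real Tarantool connection.

```python
from abc import ABC, abstractmethod


class ConnectionInterface(ABC):
    """Hypothetical interface: base connect/close API plus CRUD requests."""

    @abstractmethod
    def connect(self): ...

    @abstractmethod
    def close(self): ...

    @abstractmethod
    def insert(self, space, tup): ...

    @abstractmethod
    def replace(self, space, tup): ...

    @abstractmethod
    def select(self, space, key): ...

    @abstractmethod
    def delete(self, space, key): ...


class InMemoryConnection(ConnectionInterface):
    """Toy implementation: tuples are stored in dicts keyed by their first field."""

    def __init__(self):
        self.connected = False
        self.spaces = {}

    def connect(self):
        self.connected = True

    def close(self):
        self.connected = False

    def insert(self, space, tup):
        rows = self.spaces.setdefault(space, {})
        if tup[0] in rows:
            raise KeyError(f"duplicate key {tup[0]!r}")
        rows[tup[0]] = tup
        return tup

    def replace(self, space, tup):
        self.spaces.setdefault(space, {})[tup[0]] = tup
        return tup

    def select(self, space, key):
        row = self.spaces.get(space, {}).get(key)
        return [] if row is None else [row]

    def delete(self, space, key):
        # Returns the removed tuple, or None if the key was absent.
        return self.spaces.get(space, {}).pop(key, None)
```

Any class that implements every abstract method can then be handed to a pool. The base class itself cannot be instantiated.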

Part of #196

connection_pool: introduce connection pool

Introduce the ConnectionPool class to work with a cluster of Tarantool instances. ConnectionPool supports master discovery and ro/rw-based requests, so it is most useful when working with a single replicaset of instances. ConnectionPool is supported only for Python 3.7 or newer. The authenticated user must be able to call box.info on instances.

ConnectionPool updates information about each server's state (RO/RW) on initial connect and then asynchronously in separate threads. Application retries must be written with the asynchronous nature of the cluster state refresh in mind. The user does not need any synchronization mechanisms in requests; ConnectionPool methods handle all of it.

The ConnectionPool API is the same as the plain Connection API. For each request, a connection is selected based on the request mode:

Mode.ANY chooses any instance.
Mode.RW chooses an RW instance.
Mode.RO chooses an RO instance.
Mode.PREFER_RW chooses an RW instance if possible, an RO instance otherwise.
Mode.PREFER_RO chooses an RO instance if possible, an RW instance otherwise.

All requests that are guaranteed to write (insert, replace, delete, upsert, update) use RW mode by default. select uses ANY by default. You can set the mode explicitly. call, eval, execute and ping requests require the mode to be set explicitly.

Example:

pool.call('some_write_procedure', arg, mode=tarantool.Mode.RW)
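The selection rules above can be sketched as follows. This is an illustration only, not the code from the patch: the `Mode` enum here is a local stand-in with arbitrary values, `choose_instance` is a hypothetical helper, and picking a random candidate is an assumption (the real pool may use a different strategy).

```python
import enum
import random


class Mode(enum.Enum):
    # Local stand-in for the request modes; the values are arbitrary.
    ANY = 1
    RW = 2
    RO = 3
    PREFER_RW = 4
    PREFER_RO = 5


def choose_instance(states, mode, rng=random):
    """Pick an instance address for a request.

    `states` maps an instance address to True if that instance is RO.
    """
    rw = [addr for addr, is_ro in states.items() if not is_ro]
    ro = [addr for addr, is_ro in states.items() if is_ro]
    candidates = {
        Mode.ANY: rw + ro,
        Mode.RW: rw,
        Mode.RO: ro,
        Mode.PREFER_RW: rw or ro,  # fall back to RO if there is no RW
        Mode.PREFER_RO: ro or rw,  # fall back to RW if there is no RO
    }[mode]
    if not candidates:
        raise LookupError(f"no instance satisfies {mode.name}")
    return rng.choice(candidates)
```

With a single RW instance and two RO replicas, `Mode.RW` always resolves to the master, while `Mode.PREFER_RW` degrades to a replica only when no master is known.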

Closes #196

Mons commented 2 years ago

Never look at box.cfg.read_only. Look only at box.info.ro

DifferentialOrange commented 2 years ago

> Never look at box.cfg.read_only. Look only at box.info.ro

box.cfg.read_only was used only in tests. Reworked.

DifferentialOrange commented 2 years ago

Alternative approaches

Synchronous on request (on errors)

Solution idea: refresh schema (rw/ro info, replication state) on connect, on an RO error, or on a network error.

Pros

Since the connector itself is synchronous, no new complicated mechanisms will be introduced.
Retries due to schema change are easy to implement and effective.

Cons

If there are no errors (for example, if we use only selects), the schema will never be refreshed.
Reloads block requests, so some of them will see high latency.
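A rough sketch of the refresh-on-error idea, using in-memory stand-ins instead of real connections. All names here (`ReadOnlyError`, `FakeInstance`, `RefreshOnErrorPool`) are hypothetical and exist only for illustration.

```python
class ReadOnlyError(Exception):
    """Stand-in for the error an RO instance returns on a write."""


class FakeInstance:
    """Stand-in for a connection to one Tarantool instance."""

    def __init__(self, name, ro):
        self.name = name
        self.ro = ro

    def is_ro(self):
        return self.ro

    def write(self, value):
        if self.ro:
            raise ReadOnlyError(self.name)
        return (self.name, value)


class RefreshOnErrorPool:
    def __init__(self, instances):
        self.instances = instances
        self.refresh_count = 0
        self.refresh()  # refresh on connect

    def refresh(self):
        # Synchronous: the caller waits while every instance is polled.
        self.rw = [inst for inst in self.instances if not inst.is_ro()]
        self.refresh_count += 1

    def write(self, value, retries=1):
        last_error = None
        for _ in range(retries + 1):
            try:
                if not self.rw:
                    raise ReadOnlyError("no RW instance known")
                return self.rw[0].write(value)
            except (ReadOnlyError, ConnectionError) as exc:
                # RO or network error: reload the state, then retry.
                last_error = exc
                self.refresh()
        raise last_error
```

The retry succeeds right after a failover, but `refresh_count` stays at 1 for as long as requests keep succeeding, which is the first drawback listed above.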

Synchronous on request (with timeout)

Solution idea: refresh schema (rw/ro info, replication state) before request if X milliseconds have passed since last refresh.

Pros

Since the connector itself is synchronous, no new complicated mechanisms will be introduced.
The schema will be refreshed even if requests produce no errors (for example, if we use only selects).

Cons

No native retries on ro/rw or network errors. If we send an RW request to an RO instance, the request may have to be retried for up to a full timeout before it succeeds.
Reloads block requests, so some of them (one request every X milliseconds) will see high latency.
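A sketch of the timeout-based variant under the same caveats. `TimedRefreshPool` and `fetch_states` are hypothetical names; `fetch_states` stands for whatever polls the instances and returns a mapping from address to RO flag. The clock is injectable only so the behaviour can be checked without sleeping.

```python
import time


class TimedRefreshPool:
    def __init__(self, fetch_states, interval, clock=time.monotonic):
        self.fetch_states = fetch_states  # callable returning {address: is_ro}
        self.interval = interval          # seconds between refreshes
        self.clock = clock
        self.refreshes = 0
        self._refresh()

    def _refresh(self):
        # Synchronous: the request that triggers this pays the latency.
        self.states = dict(self.fetch_states())
        self.last_refresh = self.clock()
        self.refreshes += 1

    def rw_instances(self):
        # Refresh before the request if the cached state is too old.
        if self.clock() - self.last_refresh >= self.interval:
            self._refresh()
        return [addr for addr, is_ro in self.states.items() if not is_ro]
```

Between refreshes the pool keeps returning the old master, so a write sent just after a failover keeps failing until the interval expires.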

Asynchronous

Solution idea: refresh schema (rw/ro info, replication state) in separate thread each X milliseconds.

Pros

Reloads are non-blocking for requests.
The schema will be refreshed even if requests produce no errors (for example, if we use only selects).

Cons

Since the connector itself is synchronous, we will need to introduce async mechanisms. Schema refresh will require twice as many connections (otherwise the existing connections must be guarded by synchronization primitives, which leads to blocked requests).
No native retries on ro/rw or network errors. If we send an RW request to an RO instance, the request may have to be retried for up to a full timeout before it succeeds.
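A sketch of the asynchronous variant with a background thread. `BackgroundRefresher` and `fetch_states` are hypothetical names, and `fetch_states` is assumed to use its own connections, which is where the doubled connection count comes from.

```python
import threading


class BackgroundRefresher:
    def __init__(self, fetch_states, interval):
        self._fetch_states = fetch_states  # callable returning {address: is_ro}
        self._interval = interval          # seconds between refreshes
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._states = dict(fetch_states())  # initial state on connect
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            fresh = dict(self._fetch_states())
            with self._lock:
                # The lock is held only for the swap, not for the polling.
                self._states = fresh

    def rw_instances(self):
        with self._lock:
            return [addr for addr, is_ro in self._states.items() if not is_ro]

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Requests only read the cached state, so they never wait for a reload. They can still hit a stale master until the next tick, which is why the retry logic ends up in the application.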

What solution should we choose?

Solutions may be hybrid (1+2 or 1+3), and I think a hybrid would be the best approach since it covers more cases. Personally, I prefer the synchronous hybrid (refresh on error plus refresh on timeout): its only drawback is increased latency for some requests, and it is rather simple to implement compared to introducing async. Combining synchronous refresh on error with the async approach is much harder, and a pure async approach would push the responsibility for implementing non-trivial retry logic onto app developers.

Totktonada commented 2 years ago

> All requests that are guaranteed to write (insert, replace, delete, upsert, update) use RW mode.

An RO instance can write to a replica-local or a temporary space. Well, it is strange to write to a replica-local space on some RO instance. However, there may be a use case: say, registering a task to be processed in the background. I think we should have good defaults, but still allow the user to choose.
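One way to read "good defaults, but allow to choose" is a per-request default that an explicit argument always overrides. This is a sketch under that assumption only: `resolve_mode` and `DEFAULT_MODES` are hypothetical names, and `Mode` is a local stand-in with arbitrary values.

```python
import enum


class Mode(enum.Enum):
    # Local stand-in for the request modes; the values are arbitrary.
    ANY = 1
    RW = 2
    RO = 3
    PREFER_RW = 4
    PREFER_RO = 5


# Defaults as described in the PR: writes go to RW, select goes anywhere.
DEFAULT_MODES = {
    'insert': Mode.RW,
    'replace': Mode.RW,
    'delete': Mode.RW,
    'upsert': Mode.RW,
    'update': Mode.RW,
    'select': Mode.ANY,
}


def resolve_mode(request, mode=None):
    """Return the mode to use for a request of the given type."""
    if mode is not None:
        return mode  # an explicit choice always wins
    if request not in DEFAULT_MODES:
        # call, eval, execute and ping have no safe default
        raise ValueError(f"{request} requires an explicit mode")
    return DEFAULT_MODES[request]
```

With this, `resolve_mode('insert', Mode.RO)` lets an application write to a replica-local space on an RO instance, while a plain `resolve_mode('insert')` keeps going to the master.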

tarantool / tarantool-python

Introduce ConnectionPool with master discovery #207
