opensearch-project / opensearch-py

Python Client for OpenSearch
https://opensearch.org/docs/latest/clients/python/
Apache License 2.0
357 stars 176 forks source link

[BUG] Underlying aiohttp doesn't have TCP connection keep alive #465

Open knoha opened 1 year ago

knoha commented 1 year ago

What is the bug?

An instance of TCPConnector created here, does not recreate connections, when the initial set of connections are used (utilized). When proto object is closed here (and it will be closed after default 15 keepalive seconds), recreated connection is never placed into the pool, so for each subsequent requests we need to establish a new TCP connection which strike performance really bad (in my case additional ~100-200ms on each request). So even if you have a fast query this TCP thing is just ruining your performance.

How can one reproduce the bug?

What is the expected behavior?

When a new TCP connection is created is has to be placed in the pool and utilized by subsequent calls until expires.

What is your host/environment?

Python 3.11.3 aiohttp==3.8.5 opensearch-py==2.3.0

Do you have any screenshots?

  1. These are just print statements I've added for debuging the flow. Initially proto 1 and proto 2 are same, used from the pool.

    state1-connection-used-from-pool
  2. Later, after default initial 15 seconds, proto 1 will be always None (empty TCP pool), and each time new connection will be created.

    state2-new-connection-created-never-placed-to-pool-2

UPD: proto1 (before) and proto2 (after) logs - these are lines before and after getting a connection here.

I know this and the issue on the underlying library, but maybe it should be replaced or some other fix is done?

dblock commented 1 year ago

🤔 any idea how we can fix this?

knoha commented 1 year ago

This could be an interesting issue. Seems like it is not happening under a constant load (on prod env for example). But when there are some short breaks in the load (like in local env) - I can observe that the TCP connection each time is created from scratch and not taken from a pool.

Would be good if someone else also could spend some time checking this issue. It could be that I don't fully understand how it behaves under the hood.

dblock commented 1 year ago

So there's a timing that's based on a timeout? I would start by writing a repro (unit test or a benchmark) that causes a new new TCP connection to be established, to start in a visible/measurable way.

vaibhav-orangehealth commented 2 months ago

facing same issue I tried with aiohttp with session persistence but ssl handshake happen every 15-20 sec if connection is idle