ledgetech / lua-resty-redis-connector

Connection utilities for lua-resty-redis

auth/select on connection reuse issue #41

Open snpcp opened 3 years ago

snpcp commented 3 years ago

Hello,

I have an issue with auth/select on reused connections.

This is my environment: redis-server at 127.0.0.1:6379, auth password 123456.

When I connect to the redis-server, every connection also performs auth, but the connection may come from the keepalive pool, in which case it shouldn't need to auth again.

I could create a single shared redis connector in my code, but OpenResty doesn't allow that; please see: https://github.com/openresty/lua-resty-redis/issues/44

BTW, could we do something with get_reused_times() from lua-resty-redis? If the returned count is > 0 the connection is being reused, so we could skip auth and select.

Please tell me how to handle this. Thanks!

snpcp commented 3 years ago

This is my example code:

    -- only auth when the connection is brand new, not reused from the pool
    local count, err = r:get_reused_times()
    if count == 0 then
        local password = host.password
        if password and password ~= "" then
            local res, err = r:auth(password)
            if err then
                return res, err
            end
        end
    end

pintsized commented 3 years ago

I'm not sure I understand your problem. You want to avoid sending auth and select over connections if they are being reused? Persistent connections are only there to avoid the cost of TCP connection overhead - they should not be stateful, which is what you seem to want?

So basically no, that's not a good idea. If you want to reuse a redis connection in various places, you are free to do this within the bounds of a single request. You cannot share connections across requests in a safe way.

snpcp commented 3 years ago

Sorry, let me restate my issue. I think a reused pool connection should not execute auth and select; we only need to execute them when the pooled connection is first created (https://github.com/openresty/lua-resty-redis#get_reused_times), because a reused pool connection has already executed auth and select. If we re-execute auth and select every time we get a connection, it hurts performance.

Here is a scenario: 10k requests, each one executing a Redis GET. My expected throughput is 10k req/s, but in fact it is 6k req/s. The shortfall is caused by every request executing auth and select, which takes up a lot of time.

What do you think? Thanks.

pintsized commented 3 years ago

Well, firstly, have you measured how much time auth and select really take? You get your desired 10k req/s on DB 0 with no password?

I think connections obtained from the keepalive pool need to behave just as if they are fresh connections. Like I said, they are intended as reused TCP connections, not stateful clients, since the keepalive mechanism works at a lower level and is shared across all TCP connections on the worker. However, your question raises an important concern because you're quite right, that's not how it works right now. If we send auth and then place the connection on the pool, when it is reused, it would not need to auth again.

Whilst I appreciate this is exactly what you're asking for, it's actually pretty bad I think. I think set_keepalive should reset the connection as much as possible. This is even more important having just merged Redis ACL support (#40), since it's not just a single master password anymore. Redis 6.2 adds a RESET command for this purpose; I guess they anticipated this problem when adding ACLs.

select is a little less harmful, but it's still weird that if using keepalives you potentially default to a DB other than 0. That's not documented at all.

One approach I considered was to use the keepalive_pool name. If we generate it (instead of leaving it to the user), we could at least partition the pools by differently authenticated clients on different databases. The problem with this however, is that the keepalive pool name must be set at connection time, and during that time the connection is free to select other databases or even re-authenticate.

So right now I'm thinking set_keepalive should call RESET if available, and if not select 0 at a minimum. And then we either default keepalive_pool to include a hash of the password if present, or at least encourage users to do that. As far as I can tell, you can't un-AUTH prior to 6.2, but that's when ACLs were added.
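
To make that concrete, here is a rough sketch of how the two ideas could fit together: a keepalive pool name derived from host/port/db plus a hash of the password, and a set_keepalive wrapper that tries RESET and falls back to SELECT 0. This is illustrative only; pool_name, safe_set_keepalive and the params table are made-up names, not part of the library:

    -- red is assumed to be a connected resty.redis object.

    -- Partition the keepalive pool by credentials and database, so clients
    -- authenticated differently (or using different DBs) never share a
    -- pooled connection.
    local function pool_name(params)
        local password = params.password or ""
        return table.concat({
            params.host, params.port, params.db or 0, ngx.crc32_long(password)
        }, ":")
    end

    -- Clear as much connection state as possible before pooling.
    local function safe_set_keepalive(red, timeout, size)
        -- RESET exists from Redis 6.2; on older servers it returns an error
        -- reply, in which case fall back to at least selecting DB 0.
        local ok = red:reset()
        if not ok then
            red:select(0)
        end
        return red:set_keepalive(timeout, size)
    end

The generated name could then be passed as the pool option at connection time, e.g. red:connect(host, port, { pool = pool_name(params) }).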

@hamishforbes @piotrp any thoughts on this?

@snpcp Sorry, I realise this isn't what you were hoping for! If you really have a measurable performance problem with auth and select and you wish to use keepalives to get around this, I would suggest rolling your own connection management because in your case you can obviously choose to do so safely.
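
For what it's worth, a minimal sketch of that kind of app-level connection management, assuming a single fixed host/password/db so that skipping auth/select on reused keepalive connections is safe (get_connection and the constants are illustrative, not part of this library):

    local redis = require "resty.redis"

    -- Fixed connection details for this example; adjust to your environment.
    local HOST, PORT, PASSWORD, DB = "127.0.0.1", 6379, "123456", 0

    local function get_connection()
        local red = redis:new()
        red:set_timeout(1000)  -- 1s

        local ok, err = red:connect(HOST, PORT)
        if not ok then
            return nil, err
        end

        -- Only authenticate and select on brand new connections; reused
        -- keepalive connections are already in the right state because the
        -- credentials and DB never vary in this application.
        local reused = red:get_reused_times()
        if not reused or reused == 0 then
            local res, rerr = red:auth(PASSWORD)
            if not res then
                return nil, rerr
            end
            res, rerr = red:select(DB)
            if not res then
                return nil, rerr
            end
        end

        return red
    end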

piotrp commented 3 years ago

Note that SELECT is called already for all connections with connection_is_proxied = false as there is a default of db = 0, so this is already covered.

I'm not sure doing RESET in set_keepalive is a good idea - it mimics the socket API closely, which is currently I/O-free (unless it's a proxied connection). Maybe RESET should be called conditionally in connect_to_host, after checking the result of get_reused_times? The performance impact could be negligible if the RESET+AUTH+SELECT flow were enclosed in a command pipeline, as this would incur only one server round trip (currently we do at least one, due to SELECT).
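
For illustration, the pipelined variant I have in mind would be shaped something like this (a sketch only, not the connector's actual code; red is a connected resty.redis object, and password/db come from the host config):

    local reused = red:get_reused_times()
    if reused and reused > 0 then
        -- Reused connection: clear its state and re-establish auth/db in a
        -- single round trip by pipelining the commands.
        red:init_pipeline()
        red:reset()    -- Redis >= 6.2; would need a fallback on older servers
        red:auth(password)
        red:select(db)
        local results, err = red:commit_pipeline()
        if not results then
            return nil, "failed to reset reused connection: " .. (err or "error")
        end
    end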

snpcp commented 3 years ago

@pintsized In a sustained load test with 200 concurrent users, an auth takes 3ms-70ms and a select takes 3ms-20ms. This additional network I/O overhead is intolerable for my application.

I agree with your point that pooled connections should not be stateful clients, so I override the connect method in my own code.

I also found another issue: in my code I'm not using proxied connections or transactions, but discard is still called on set_keepalive, and in practice that takes 2ms-10ms.

I made one other change: when getting the master from sentinel, I cache the master address, so get_master only runs the first time and is retried when a connect error occurs. A crucial premise is that the primary node does not change very often; this saves 2ms-100ms of I/O. It may not be a good idea, but it's the only thing I can do.
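
Roughly, the caching I mean has this shape (a sketch only; lookup_master and connect are stand-ins for the real sentinel query and connection code):

    -- Per-worker cache of the current master address; refreshed only after a
    -- connection failure, on the premise that failover is rare.
    local cached_master

    local function connect_master(lookup_master, connect)
        if not cached_master then
            local master, err = lookup_master()
            if not master then
                return nil, err
            end
            cached_master = master
        end

        local red = connect(cached_master)
        if red then
            return red
        end

        -- The master may have moved: drop the cache, ask sentinel again, retry once.
        cached_master = nil
        local master, err = lookup_master()
        if not master then
            return nil, err
        end
        cached_master = master
        return connect(cached_master)
    end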

Thanks for your help.