Closed steve-lorimer closed 2 years ago
Hmm, actually the retry-loop didn't help... it still can't connect even after trying multiple times in a loop.
What's really weird is that the binary, when running on a different server, is able to connect fine; and redis-cli
is able to connect from the problematic server... something to do with the combination of the server and the binary using redis-plus-plus
is causing issues.
Hi @skebanga,
What is the recommended way for obtaining a Subscriber please?
Please see this post: https://github.com/sewenew/redis-plus-plus/issues/307#issuecomment-1002632214 Also read the chapter on Publish/Subscribe in the top level README.md and refer to the example in that chapter.
something to do with the combination of the server and the binary using
redis-plus-plus
is causing issues.
Try to see if the same libs are being used on all your servers. Run the following command on all your servers and compare the results carefully:
$ ldd your_binary
In order to be absolutely sure that your libs and binaries are all the same on all servers, you could compare their checksums:
$ ldd your_binary |awk -F'(' '{print $1}' |awk -F'=>' '{print $2}' |grep -v '^\s*$' |xargs sha256sum
$ sha256sum your_binary
You probably checked the connection (to make sure you don't have a network issue or a firewall problem) with redis-cli
.
If not, please try the following from all servers as well:
$ redis-cli -h <redis_server> -p <redis_port> ping
Regards
Thanks for the comment!
I am statically linking the redis libs, so that's not an issue
# ldd test_redis
linux-vdso.so.1 (0x00007ffeb956a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2e50cf5000)
libgcc_s.so.1 => /usr/local/gcc-11.2.0/lib64/libgcc_s.so.1 (0x00007f2e50cda000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2e50cb7000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2e50ac5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2e50e4e000)
I have also managed to boil the issue down to the following, which shows that it is failing on issuing the AUTH
command to redis:
#include <sw/redis++/redis++.h>
#include <iostream>
#include <chrono>
int main()
{
sw::redis::ConnectionOptions opts("tcp://10.10.6.106:6379");
opts.user = "default";
opts.password = "XXXXXXXX";
opts.socket_timeout = std::chrono::milliseconds(100);
sw::redis::Redis r{opts};
sw::redis::Subscriber sub{r.subscriber()};
return 0;
}
It is working with redis_cli
, but not with the above test app...
running both through strace
I can see the following:
redis_cli:
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("10.10.6.106")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=3, events=POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR) = 0
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(3, SOL_TCP, TCP_KEEPIDLE, [15], 4) = 0
setsockopt(3, SOL_TCP, TCP_KEEPINTVL, [5], 4) = 0
setsockopt(3, SOL_TCP, TCP_KEEPCNT, [3], 4) = 0
write(3, "*2\r\n$4\r\nAUTH\r\n$50\r\n4XXXXXXXX"..., 71) = 71
read(3, "+OK\r\n", 16384) = 5
test_redis:
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("10.10.6.106")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=3, events=POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
connect(3, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("10.10.6.106")}, 16) = 0
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO_OLD, "\0\0\0\0\0\0\0\0\240\206\1\0\0\0\0\0", 16) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO_OLD, "\0\0\0\0\0\0\0\0\240\206\1\0\0\0\0\0", 16) = 0
sendto(3, "*2\r\n$4\r\nAUTH\r\n$50\r\n4XXXXXXXX"..., 71, 0, NULL, 0) = 71
recvfrom(3, 0x7ffe9ddbbbd0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
So major differences I can see are:
redis-cli
only issues a single connect
call, whereas redis-plus-plus
calls connect
twice (presumably because the first returns EINPROGRESS
?)redis-cli
sets some additional socket options which redis-plus-plus
doesn'tredis-cli
uses write
/read
on the socker, while redis-plus-plus
uses sendto
/recvfrom
I'm a bit lost at this point tbh! :(
I ran my test program on a server which works, and got the following strace output:
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("10.10.6.106")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=3, events=POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
connect(3, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("10.10.6.106")}, 16) = 0
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO_OLD, "\0\0\0\0\0\0\0\0\240\206\1\0\0\0\0\0", 16) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO_OLD, "\0\0\0\0\0\0\0\0\240\206\1\0\0\0\0\0", 16) = 0
sendto(3, "*2\r\n$4\r\nAUTH\r\n$50\r\n4Vg2dTZVyuh2b"..., 71, 0, NULL, 0) = 71
recvfrom(3, "+OK\r\n", 16384, 0, NULL, NULL) = 5
So literally the only difference between the two is that the failing server's recvfrom
is:
recvfrom(3, 0x7ffe9ca7aa40, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
whereas the working server's recvfrom
is:
recvfrom(3, "+OK\r\n", 16384, 0, NULL, NULL) = 5
Well - some progress it seems!
The culprit is opts.socket_timeout
, and increasing it to a higher value means the recvfrom
call succeeds.
I'll admit I'm no redis expert... but would it be fair to say it seems a bit weird that 100ms is too low?
@skebanga Yes, I think the problem is that socket_timeout
is too small for your redis server.
redis-cli by default might not set a timeout, so it does not have the problem. If you set socket_timeout
to 0 when using redis-plus-plus, it should work fine too.
You’d better check if your redis server in a healthy status, or the network between your application and redis is not fast enough.
Regards
@sewenew thanks for the comment! The only issue with setting a socket_timeout
to 0 is that when I enter the consume
loop it will never return, and so then I can never check my std::stop_token
and break the loop when I want the thread to stop.
while (!token.stop_requested())
{
try
{
sub.consume(); // without socket_timeout this call never returns if there are no published messages
}
catch (const sw::redis::TimeoutError&)
{
// by using socket_timeout we can catch the TimeoutError and check the std::stop_token
}
catch (const sw::redis::Error& e)
{
LOG_ERROR("redis error: ", e.what());
break;
}
}
Ideally there would be a thread-safe way to cause sub.consume()
to return, thereby allowing the thread to exit
It looks like same issue I have posted several days ago here (https://github.com/sewenew/redis-plus-plus/issues/323). For now we have a workaround: just publish a fake message to one of the subscribed channels to rotate the loop.
@sewenew it appears some general solution to this will be highly appreciated. 👍
Best regards
just publish a fake message to one of the subscribed channels to rotate the loop.
:sob:
@skebanga
The only issue with setting a socket_timeout to 0 is that when I enter the consume loop it will never return, and so then I can never check my std::stop_token and break the loop when I want the thread to stop.
You don't need to set it to 0. Your problem is that your network latency is too big, and if the socket_timeout is smaller than it, it cannot do authentication and subscription before consuming messages.
In order to make it work you need to set socket_timeout larger than the network latency.
Ideally there would be a thread-safe way to cause sub.consume() to return, thereby allowing the thread to exit
As @Mephalich mentioned you can send a termination message to the channel. Once consume
is called, it blocks until there's a message or timeout reaches.
Regards
@sewenew thanks for the comments! Yes, understand the need to set timeout higher than network latency in this case...
However, I think in an ideal world it would be nice to be able to set zero timeout, as in I don't care how long the authentication and subscription takes, just so long as it completes... but also be able to effect a break out of the consume
loop when I want to shut down.
Forcing the user to have to set a timeout just so they can exit the loop is awkward...
Anyway, I'll close the issue, because my question has been answered (even if I don't like the answer!) :)
Thanks for a great package!
Describe the problem
Attempting to obtain a
Subscriber
from a Redis connection fails with a TimeoutError exception being thrown, with the description being "Resource temporarily unavailable" (ie: errnoEAGAIN
)Stacktrace shows this is from
Connection::recv
as a result ofConnection::_auth
during call toRedis::subscriber
What is the correct way to handle this?
At the moment I'm just attempting to create a subscriber in a loop, and if a
TimeoutError
exception is thrown, I just try again, up to some arbitrarily picked max number of retries... it seems to work... but it also seems inelegant / a bit of a hack.What is the recommended way for obtaining a Subscriber please?
Environment:
ubuntu
gcc-11.2.0
1.0.0
1.3.3