Hronom opened this issue 6 years ago
There is also a discussion on the Lettuce client library: https://github.com/lettuce-io/lettuce-core/issues/625
It is not the client's responsibility to handle this.
Same problem as yours. If you use Sentinel, you can give it a try: https://github.com/antirez/redis/issues/4561
I have created a small workaround shell script that helps wait until Redis is fully started; here is the repo: https://github.com/Hronom/wait-for-redis
This will help until the developers fix this.
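For readers who just want the idea, here is a minimal sketch of such a wait loop (not the actual script from that repo; it assumes `redis-cli` is available and relies on `INFO`, which is served even while Redis is still loading):

```sh
#!/bin/sh
# Poll Redis until INFO reports that dataset loading has finished.
# INFO is allowed while loading, so this works even when regular
# commands are still rejected with -LOADING.
until redis-cli -h "${REDIS_HOST:-localhost}" info persistence | grep -q '^loading:0'; do
  echo "Redis is still loading the dataset, retrying..."
  sleep 1
done
echo "Redis is ready."
```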
Another solution to this could be a blocking command that waits until Redis is ready to serve data.
@Hronom I believe we can close this issue, given this is a by-design feature and not really an issue. Requesting the @redis/core-team's opinion on it.
Redis enables certain commands while loading the database (`ok-loading`), like `INFO`, `SELECT`, `MONITOR`, `DEBUG`, etc.
Clients should be responsible for being able to handle `LOADING` replies (like they are able to handle `MOVED`, `ASK`, etc.).
WDYT?
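To illustrate the distinction (a rough sketch; the exact error text may vary slightly by version):

```sh
# While Redis is loading an RDB/AOF, only commands flagged ok-loading are served:
redis-cli INFO persistence | grep '^loading'   # works, reports loading:1
redis-cli GET somekey                          # rejected with:
# (error) LOADING Redis is loading the dataset in memory
```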
@filipecosta90 this is definitely a feature request; the GitHub issues here are used not only for reporting bugs, but also as a place where users can file feature requests.
So please consider this a feature request.
In the world of Docker this is a much-needed feature, and it is 3 years old? 3 years and still no way to open the port only after loading, or a command that waits?
This is indeed by design and a feature request; let's see how painful it is and what we can do to improve.
Maybe we can add a config flag that tells Redis not to listen on the socket at startup until after it has loaded the persistence file.
This could take a very long time and would mean that no one can monitor it or get any progress checks, so many users would not want to use such a feature, but I suppose for some it can make life simpler.
Note however that it won't solve the `-LOADING` response returned by a replica when loading the data it gets from the master.
Another approach could be to postpone most commands (excluding `INFO`, `PING`, etc.) while loading instead of responding with a `-LOADING` error, similar to what `CLIENT PAUSE WRITE` does today.
This would be a breaking change, and maybe also needs to be controlled by a config flag.
@redis/core-team WDYT?
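For comparison, the existing mechanism referenced above (a sketch; `CLIENT PAUSE` with a `WRITE`/`ALL` mode exists since Redis 6.2):

```sh
# Suspend processing of write commands for 5 seconds; reads keep being served.
redis-cli CLIENT PAUSE 5000 WRITE
```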
@oranagra I think this should be handled by the client; it's fairly easy to catch this and retry if that's what the app wishes to do.
We use the Ruby Redis client, and this error is not handled properly, i.e. it simply raises the exception:
LOADING Redis is loading the dataset in memory
It is a pain.
@yossigo It could be that the retry will take minutes to succeed depending on dataset size. Even 10 seconds can be unacceptable in some applications.
As I commented in a similar thread: https://github.com/redis/redis/issues/4561#issuecomment-889841284
"I've been thinking why can't we have a mode where the replica keeps serving stale data while the Full Sync is happening in background. That could cost twice the memory and maybe disk space usage for a short period, but I think it definitely worth it and would get rid the LOADING status in most situations. In my use case where there's no need for sharding (so no Redis Cluster) and where it's ok to have master down for about a minute and replicas serving stale data, it would be very useful. Sometimes a redis standalone master with a few replicas can be a solid setup, if we solve this kind of issues. That of course makes things more Kubernetes friendly without needing to pay for Redis Operator."
@eduardobr You're referring to a specific case here:
In that case, using `repl-diskless-load swapdb` already yields a very similar behavior, although AFAIR (need to check) it still returns `-LOADING` and refuses commands. But that's probably a relatively small change.
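For reference, this is roughly how that setting looks in a replica's configuration (a sketch; see the comments in the shipped redis.conf for the exact semantics of `disabled`, `on-empty-db` and `swapdb`):

```
# redis.conf (replica): parse the replication RDB directly from the socket,
# keeping the current dataset in a backup that is restored if loading fails.
repl-diskless-load swapdb
```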
@yossigo I was coincidentally testing the behavior of `repl-diskless-load swapdb` today in an attempt to solve the LOADING status.
It returns `LOADING` for about the same amount of time as a disk load in our setup, but taking a look at the code it seems doable to change it so we keep serving read commands. That would be an amazing improvement and make our setup solid here.
What does it take to move on with this change, and how could I help?
Thanks
In theory, if someone is already willing to get stale reads and is using `repl-diskless-load swapdb`, I don't see any reason not to allow reads from that backup database, at least as long as modules aren't involved.
However, supporting that may add some extra complications to the code which I'm not sure are worth it.
Also, truth be told, I don't really like the `swapdb` feature.
I added it following Salvatore's request, in order to have safer diskless loading in case the master dies after the replica has started loading new data it can't complete.
But I think in many cases this mode is more dangerous, since the system may not have enough memory to host both databases side by side, and the kernel's OOM killer may intervene.
@oranagra @yossigo I started a draft of a change to test the concept; it seems to be OK and to solve the problem: https://github.com/redis/redis/compare/unstable...eduardobr:feature/draft-use-tempdb-swapdb
I started by swapping the pointers of the whole db array, but later noticed I could use the already tested swap-db mechanism and run it for each db index (it takes care of things like maintaining the selected db reference).
But there are small things I still don't understand and would need help with if we want to move on with this, for example why calling "touchAllWatchedKeysInDb" works for the SWAPDB command but crashes some tests if I use it after the sync has finished, and whether I actually need to call it at all (we don't do it when restoring the backup in the current implementation).
Also, the failing test may not be relevant anymore, but I would need help with that.
Up until now, touchAllWatchedKeysInDb was only needed once when we begin the loading, since no commands could be executed during loading. But now:
Regarding the crash and the test, I don't know what you're facing; you'll have to debug it. If you need assistance, maybe you can submit a PR and ask other expert contributors for help (I'm a bit too busy these days to dive in myself).
PR created: https://github.com/redis/redis/pull/9323
Revisiting the original issue, I think we can solve it with little code and no breaking changes for scenarios with more than one replica: simply offer an option to start full syncs with no more than n replicas at the same time. Then you ensure you won't put down all replicas in the LOADING state simultaneously.
`repl-diskless-sync-delay 0` doesn't seem to guarantee a single replica at a time.
We could have something like `repl-max-parallel-sync [n]`.
@yossigo @oranagra Makes sense?
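A sketch of how that could look next to the existing option (note: `repl-max-parallel-sync` is the hypothetical setting proposed above, not an existing Redis option):

```
# Existing: how long the master delays a diskless sync so more replicas can join it.
repl-diskless-sync-delay 0
# Hypothetical, as proposed above: never full-sync more than n replicas at once.
# repl-max-parallel-sync 1
```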
@eduardobr I'm not sure how that solves the original request. Instead of getting "-Loading" errors, the clients will get stale data for a bit and then get the loading error later.
@madolson it solves it in the sense that proxies, container orchestrators (and maybe client libraries) can send the client to a replica that is not "-Loading". In the case of Kubernetes, probes can query for this status to quickly take a replica out of service by marking it unready without killing it. Because it's optional, the consumer will know the consequences (having stale data vs. no data at all), kind of the same compromise as `replica-serve-stale-data`, but for a different case.
When configured to 50% of the total replicas, in the worst case the master will finish the syncs in two rounds.
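As a sketch of what such a probe could check (assuming `redis-cli` is available in the container; the wiring into a readiness probe is orchestrator-specific):

```sh
#!/bin/sh
# Replica readiness sketch: not ready while an RDB is being loaded or while a
# full sync from the master is still in progress.
out=$(redis-cli -h "${REDIS_HOST:-localhost}" info) || exit 1
echo "$out" | grep -q '^loading:0' || exit 1
echo "$out" | grep -q '^master_sync_in_progress:1' && exit 1
exit 0
```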
@eduardobr I see, I think that is a minor improvement but I'm not sure it solves the original requester's issue, which is that they weren't able to handle the -Loading error. Presumably they could handle the timeout.
@madolson right, that is not precisely what the author wants, just an alternative to mitigate the problems that the -Loading error brings. The originally requested solution of keeping the port closed while doing a full sync could confuse other systems, because they won't be able to tell whether Redis is even alive.
As a workaround, I'm using a docker-compose healthcheck:
```yaml
# docker-compose.yml
services:
  app:
    depends_on:
      redis:
        condition: service_healthy
  redis:
    healthcheck:
      test: ["CMD-SHELL", "redis-cli ping | grep PONG"]
      interval: 1s
      timeout: 3s
      retries: 5
```
^ This is the way. TIL that the Compose spec now allows you to actually wait on other services with healthchecks. `dockerize` is still great if services open ports only when ready, but this is awesome, and the "Docker way to do it" IMO.
https://docs.docker.com/compose/compose-file/#depends_on
(Also, apparently the top-level `version` key is now unnecessary, if you're wondering which Compose file version this requires.)
This is a related issue from the Spring Jira: https://jira.spring.io/browse/DATAREDIS-757
This problem prevents Redis from integrating gracefully into a microservices environment.
Several servers have the issue that they open a TCP port before they are actually ready. In contrast, Spring Boot opens its port only once all initialization is done, and I think that would be a better option for Redis, too: open the port only once all data is loaded.
This would help other applications not crash with an exception:
Please open the TCP port only when Redis is completely ready to serve.