redis / redis

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
http://redis.io

LOADING Redis is loading the dataset in memory #4624

Open Hronom opened 6 years ago

Hronom commented 6 years ago

This is a reference issue from Spring Jira https://jira.spring.io/browse/DATAREDIS-757

This problem prevents Redis from integrating gracefully into a microservices environment.

Several servers have the issue that they open a TCP port before they are actually ready. In contrast, Spring Boot opens a port only once all initialization is complete, and I think that would be a better option for Redis too: open the port only once all data is loaded.

This would keep other applications from crashing with the exception:

LOADING Redis is loading the dataset in memory

Please open the TCP port only when Redis is completely ready to serve.

Hronom commented 6 years ago

There is also a discussion on the Lettuce client library: https://github.com/lettuce-io/lettuce-core/issues/625. It is not the client's responsibility to handle this.

scenbuffalo commented 6 years ago

Same problem as yours. If you use Sentinel, you can give this a try: https://github.com/antirez/redis/issues/4561

Hronom commented 6 years ago

I have created a small workaround shell script that waits until Redis has fully started; here is the repo: https://github.com/Hronom/wait-for-redis

This will help until the developers fix this.
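The idea behind such a wait script can be sketched in Python as well (a minimal sketch; the linked repo uses shell and redis-cli, and the host, port, and timing values here are assumptions). It relies on the fact that PING is answered with a `-LOADING` error until the dataset is loaded, and with `+PONG` afterwards:

```python
import socket
import time

def wait_for_redis(host="127.0.0.1", port=6379, timeout=30.0, interval=0.5):
    """Poll Redis with PING until it replies +PONG (dataset loaded).

    While Redis is still loading, PING is answered with "-LOADING ...";
    a refused connection means the port is not open yet. Returns True
    once Redis is ready, or False when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0) as s:
                s.sendall(b"PING\r\n")  # inline command form of the protocol
                reply = s.recv(64)
                if reply.startswith(b"+PONG"):
                    return True
        except OSError:
            pass  # port not open yet; keep polling
        time.sleep(interval)
    return False
```

A launcher script would call `wait_for_redis()` and only start the dependent service once it returns True.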

ghik commented 5 years ago

Another solution to this could be some blocking command which waits until Redis is ready to serve data.

filipecosta90 commented 3 years ago

@Hronom I believe we can close this issue, given that this is a by-design feature and not really a bug. Requesting the @redis/core-team's opinion on it.

Redis enables certain commands while loading the database (ok-loading), such as INFO, SELECT, MONITOR, DEBUG, ...

Clients should be responsible for handling LOADING replies (just as they handle MOVED, ASK, etc.). WDYT?
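A client-side retry along these lines can be tiny (a sketch, not a prescribed pattern; it matches the raw "LOADING ..." error text, whereas a concrete client such as redis-py raises `redis.exceptions.BusyLoadingError`, which you would catch directly instead of inspecting the message):

```python
import time

def retry_on_loading(fn, attempts=5, delay=0.5):
    """Call fn(), retrying while the server is still loading its dataset.

    Retries only on errors whose message looks like the Redis -LOADING
    reply; any other exception is re-raised immediately, and the last
    attempt re-raises whatever it got.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            still_loading = "loading the dataset" in str(exc).lower()
            if not still_loading or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

Usage would be e.g. `retry_on_loading(lambda: client.get("key"))`, at the cost of blocking the caller for up to `attempts * delay`.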

Hronom commented 3 years ago

@filipecosta90 this is definitely a feature request; GitHub issues here are used not only for bugs but also as a place where users can file feature requests.

So please consider this a feature request.

In the world of Docker this is a much-needed feature, and it is 3 years old? Three years without managing to implement either an open-port-after-load option or a command that waits?

oranagra commented 3 years ago

This is indeed by design and a feature request, let's see how painful it is and what we can do to improve.

Maybe we can add a config flag that will tell Redis not to listen on the socket at startup until after it has loaded the persistence file. This could take a very long time and would mean that no one can monitor it or get any progress checks, so many users would not want to use such a feature, but I suppose for some it can make life simpler. Note however that it won't solve the -LOADING response returned by a replica when loading the data it gets from the master.

Another approach could be to postpone most commands (excluding INFO, PING, etc.) while loading, instead of responding with a -LOADING error, similar to what CLIENT PAUSE WRITE does today. This would be a breaking change, and maybe it also needs to be controlled by a config flag. @redis/core-team WDYT?
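If either proposal were adopted, the configuration might look something like this (both flag names are hypothetical, invented here purely for illustration; neither exists in any Redis release):

```conf
# redis.conf -- HYPOTHETICAL flags sketching the two proposals above

# Proposal 1: don't listen on the socket until the persistence file is loaded
# (downside: no way to monitor loading progress over the socket)
defer-listen-until-loaded yes

# Proposal 2: block most commands during loading instead of replying -LOADING,
# similar in spirit to CLIENT PAUSE WRITE (a breaking change, hence opt-in)
block-commands-while-loading yes
```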

yossigo commented 3 years ago

@oranagra I think this should be handled by the client, it's fairly easy to catch this and retry if that's what the app wishes to do.

collimarco commented 3 years ago

We use the Ruby Redis client and this error is not handled properly, i.e. it simply raises the exception:

LOADING Redis is loading the dataset in memory

It is a pain.

eduardobr commented 3 years ago

@yossigo It could be that the retry will take minutes to succeed depending on dataset size. Even 10 seconds can be unacceptable in some applications.

As I commented in a similar thread: https://github.com/redis/redis/issues/4561#issuecomment-889841284

"I've been thinking: why can't we have a mode where the replica keeps serving stale data while the full sync happens in the background? That could cost twice the memory, and maybe disk space usage, for a short period, but I think it is definitely worth it and would get rid of the LOADING status in most situations. In my use case, where there's no need for sharding (so no Redis Cluster) and where it's ok to have the master down for about a minute and replicas serving stale data, it would be very useful. Sometimes a Redis standalone master with a few replicas can be a solid setup, if we solve this kind of issue. That of course makes things more Kubernetes-friendly without needing to pay for Redis Operator."

yossigo commented 3 years ago

@eduardobr You're referring to a specific case here:

In that case, using repl-diskless-load swapdb already yields a very similar behavior, although AFAIR (need to check) it still returns -LOADING and refuses commands. But that's probably a relatively small change.
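For reference, the directive being discussed is set on the replica side; `swapdb` is one of its three documented values (alongside `disabled` and `on-empty-db`):

```conf
# redis.conf (replica)
# Parse the replication stream directly from the socket, keeping the old
# dataset in a backup copy while the new one is being loaded:
repl-diskless-load swapdb
```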

eduardobr commented 3 years ago

@yossigo I was coincidentally testing the behavior of repl-diskless-load swapdb today in an attempt to solve the LOADING status. In our setup it returns LOADING for about the same amount of time as a disk load, but looking at the code it seems doable to change it so we keep serving read commands. That would be an amazing improvement and would make our setup solid here.

What does it take to move on with this change, and how could I help?

Thanks

oranagra commented 3 years ago

In theory, if someone is already willing to get stale reads and is using repl-diskless-load swapdb, I don't see any reason not to allow reads from that backup database, at least as long as modules aren't involved. However, supporting that could add some extra complications to the code which I'm not sure are worth it.

Also, truth be told, I don't really like the swapdb feature. I added it following Salvatore's request, in order to have safer diskless loading, in case the master dies after the replica has started loading new data that it can't complete. But I think in many cases this mode is more dangerous, since the system may not have enough memory to host both databases side by side, and the kernel's OOM killer may intervene.

eduardobr commented 3 years ago

@oranagra @yossigo I started a draft of a change to test the concept, it seems to be ok and solve the problem: https://github.com/redis/redis/compare/unstable...eduardobr:feature/draft-use-tempdb-swapdb

I started swapping the pointers of whole db array, but later noticed I could use the already tested mechanism of swap db and run it for each db index (will take care of things like maintaining selected db reference).

But there are small things I still don't understand, and I would need help if we want to move on with this; for example, why calling "touchAllWatchedKeysInDb" works for the SWAPDB command but crashes some tests if I use it after the sync is finished, and whether I actually need to call it at all (we don't when restoring the backup in the current implementation).

Also, the failing test may not be relevant anymore, but I would need help with that.

oranagra commented 3 years ago

Up until now, touchAllWatchedKeysInDb was only needed once, when we begin loading, since no commands could be executed during loading. But now:

  1. we no longer want to invalidate watched keys when loading begins (logically, we haven't yet changed anything in the db from the user's perspective)
  2. we do need to invalidate watched keys when loading succeeds (but not when it fails and we recover from the backup).

Regarding the crash and the test, I don't know what you're facing; you'll have to debug it. If you need assistance, maybe you can submit a PR and ask other expert contributors for help (I'm a bit too busy these days to dive in myself).

eduardobr commented 3 years ago

PR created: https://github.com/redis/redis/pull/9323

eduardobr commented 3 years ago

Revisiting this original issue, I think we can solve it with little code and no breaking changes for scenarios with more than one replica: simply offer an option to start full syncs with no more than n replicas at the same time. Then you ensure you won't put all replicas into the LOADING state simultaneously.

repl-diskless-sync-delay 0 doesn’t seem to guarantee a single replica at a time.

We could have some repl-max-parallel-sync [n]

@yossigo @oranagra Makes sense?

madolson commented 3 years ago

@eduardobr I'm not sure how that solves the original request. Instead of getting "-LOADING" errors, the clients will get stale data for a bit and then get the loading error later.

eduardobr commented 3 years ago

@madolson it solves it in the sense that proxies, container orchestrators (and maybe client libraries) can send the client to a replica that is not "-LOADING". In the case of Kubernetes, probes can query for this status and quickly take a replica out of service by marking it unready, without killing it. Because it's optional, the consumer will know the consequences (having stale data vs. no data at all), kinda the same as the compromise of replica-serve-stale-data but in a different case.

When configured to 50% of the total replicas, in the worst case the master will finish the syncs in 2 rounds.
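The Kubernetes probe mentioned above can be sketched like this (a minimal sketch; the timing values are assumptions, and it presumes redis-cli is available in the container image). Because it is a readiness probe rather than a liveness probe, a loading replica is only taken out of rotation, not restarted:

```yaml
# Pod spec fragment: mark the replica unready (without killing it) while it
# still replies -LOADING; redis-cli ping only prints PONG once loading is done.
readinessProbe:
  exec:
    command: ["sh", "-c", "redis-cli ping | grep -q PONG"]
  initialDelaySeconds: 5
  periodSeconds: 2
```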

madolson commented 3 years ago

@eduardobr I see, I think that is a minor improvement but I'm not sure it solves the original requester's issue, which is that they weren't able to handle the -Loading error. Presumably they could handle the timeout.

eduardobr commented 3 years ago

@madolson right, that is not precisely what the author wants, just an alternative to mitigate the problems that the -LOADING error brings. The original proposal of keeping the port closed during a full sync could confuse other systems, because they wouldn't be able to tell whether Redis is even alive.

fiedl commented 2 years ago

As a workaround, I'm using a docker-compose healthcheck:

```yaml
# docker-compose.yml

services:
  app:
    depends_on:
      redis:
        condition: service_healthy
  redis:
    healthcheck:
      test: ["CMD-SHELL", "redis-cli ping | grep PONG"]
      interval: 1s
      timeout: 3s
      retries: 5
```

bradjones1 commented 2 years ago

^ This is the way. TIL that the compose spec now allows you to actually wait on other services with healthchecks. dockerize is still great if services open ports only when ready, but this is awesome, and the "Docker way to do it" IMO.

https://docs.docker.com/compose/compose-file/#depends_on

(Also, apparently the `version` key is now unnecessary, if you're wondering what version of the spec this was released in.)