richfitz / redux

Redis client for R
https://richfitz.github.io/redux

Failure communicating with the Redis server #58

Open dereckmezquita opened 10 months ago

dereckmezquita commented 10 months ago

Hello, I am using redux/redis and I keep running into this failure:

rlang$inform(str_interp('[${time_stamp}] Saving to redis...'))
tryCatch({
    private$save_redis()
}, error = function(e) {
    cat(str_interp('Caught in private$save_redis:\n${as.character(e)}'))
    traceback()
    private$redis$reconnect()
    private$redis$PING()
})

Caught in private$save_redis:
Error in redis_command(ptr, cmd): Failure communicating with the Redis server
No traceback available 

Notice that I tried to catch the error and reconnect, but this does not work. I have to kill the process and restart it manually every time.

Where my private$save_redis function is:

save_redis = \() {
    monte_carlo <- redux$object_to_bin(private$monte_carlo$results)
    private$redis$SET("monte_carlo", monte_carlo)
    data <- redux$object_to_bin(private$data)
    private$redis$SET("data", data)
}

I am executing the write operation in a loop once per second and reading from Redis about every 300 ms using the standard redis$GET("monte_carlo") function.

I keep getting that error over and over, and it blocks my programme; I have to kill it and start it again.

I thought this might be an I/O problem, so I tested it with a basic loop, but I never get the same error there. I am hoping someone can lend some insight.

Here are the scripts I created to try to replicate the issue, without success:

index.R

box::use(redux)
box::use(data.table)

# Error in redis_command(ptr, cmd): Failure communicating with the Redis server

redis <- redux$hiredis()

iris_dt <- data.table$data.table(iris)
iris_bn <- redux$object_to_bin(iris_dt)

tryCatch({
    for (i in 1:1000000) {
        cat("Iteration: ", i, "\n")
        test <- redis$SET("iris_test", iris_bn)
        print(test)
    }
}, error = function(e) {
    print(e)
    stop(e)
})

read.R (run in parallel in another thread)

box::use(redux)
box::use(data.table)

redis <- redux$hiredis()

tryCatch({
    for (i in 1:1000000) {
        cat("Iteration: ", i, "\n")
        test <- redis$GET("iris_test")
        test <- redis$GET("iris_test")
        test <- redis$GET("iris_test")
        test <- redis$GET("iris_test")
        test <- redis$GET("iris_test")
        test <- redis$GET("iris_test")
        test <- redis$GET("iris_test")
        test <- redis$GET("iris_test")
        # print(redux$bin_to_object(test))
    }
}, error = function(e) {
    print(e)
    stop(e)
})

richfitz commented 10 months ago

Here is a reprex showing how the reconnect method is meant to work. I've started a new redis server with a default configuration by running:

docker run --rm -d  -p 127.0.0.1:6379:6379 redis

and this is used by the script below.

One cause of a connection loss is the client timeout, though this is indefinite by default.

con <- redux::hiredis()
con$CONFIG_GET("timeout") # timeout 0 -- indefinite by default
con$CONFIG_SET("timeout", 2) # set to 2 seconds

With a new connection, interact with the server with ever longer pauses; eventually this triggers the same error you see:

con <- redux::hiredis()
for (i in 1:10) {
  message(sprintf("sleeping for %d seconds", i))
  Sys.sleep(i)
  con$PING()
}

(typically this will happen on the third iteration, but according to the redis docs it is allowed to happen later).

Again, but with a handler like yours:

con <- redux::hiredis()
for (i in 1:10) {
  message(sprintf("sleeping for %d seconds", i))
  Sys.sleep(i)
  tryCatch(
    con$PING(),
    error = function(e) {
      msg <- conditionMessage(e)
      if (grepl("Failure communicating", msg, fixed = TRUE)) {
        ## TODO: I should really improve the error you get here to make
        ## this catchable with a class.
        message(sprintf("Trying to recover from redis failure: %s", msg))
        con$reconnect()
        con$PING()
        message("...recovered")
        return()
      }
      stop(e)
    })
}

You can see the failure, catch and recovery here. The final stop(e) is never triggered.
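Until the error carries its own class (see the TODO in the handler above), one user-side workaround might be to re-signal the error with an extra class, so that handlers can match on the class rather than on the message text. This is only a sketch: `with_classed_redis_errors` and the class name `redis_communication_error` are hypothetical, and nothing here is part of redux itself.

```r
# Hypothetical wrapper (not part of redux): evaluate `expr`; if it fails with
# the "Failure communicating" message, re-signal the same error with an extra
# class so callers can handle it by class.
with_classed_redis_errors <- function(expr) {
  tryCatch(
    expr,
    error = function(e) {
      if (grepl("Failure communicating", conditionMessage(e), fixed = TRUE)) {
        # The class name is an assumption made for this illustration.
        class(e) <- c("redis_communication_error", class(e))
      }
      # Re-signal the (possibly re-classed) condition to the caller.
      stop(e)
    }
  )
}

# Usage with a simulated failure, so no Redis server is needed here.
# In real use, `expr` would be something like con$PING().
result <- tryCatch(
  with_classed_redis_errors(
    stop("Failure communicating with the Redis server")
  ),
  redis_communication_error = function(e) "would reconnect here"
)
```

Errors with any other message pass through unchanged, which keeps the final `stop(e)` behaviour of the handler above.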

Restore the previous default timeout:

redux::hiredis()$CONFIG_SET("timeout", 0)

So something is different in your setup, and there's not much I can do to help unless I know what that is, I'm afraid. If your redis server is on another machine it's possible you have intermittent network failures; that would explain why your recovery fails. Depending on how you're looking after your redis server you might look in the logs and see if there's anything there to indicate what is going on. You can run redis-cli MONITOR and watch for traffic too, though I suspect nothing interesting will come of that.

It might be interesting to try and create an entirely new connection (redux$hiredis()) at the point where you are handling the failure - my suspicion is that that will fail also, indicating that the issue is with the redis server or connection.
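The fresh-connection check suggested above could be sketched roughly as follows. `diagnose_with_fresh_connection` and its `op` and `connect` arguments are hypothetical names, not part of redux; the only redux calls assumed are `redux::hiredis()` and `PING()`, both of which appear earlier in this thread.

```r
# Hypothetical diagnostic helper (not part of redux): run `op(con)`; if it
# fails with the "Failure communicating" error, try to open a brand new
# connection and PING it, then report which of the two outcomes occurred.
diagnose_with_fresh_connection <- function(con, op,
                                           connect = function() redux::hiredis()) {
  tryCatch(
    list(status = "ok", value = op(con), con = con),
    error = function(e) {
      msg <- conditionMessage(e)
      if (!grepl("Failure communicating", msg, fixed = TRUE)) {
        stop(e)  # unrelated error: pass it on unchanged
      }
      # Attempt a completely new connection rather than con$reconnect().
      fresh <- tryCatch({
        new_con <- connect()
        new_con$PING()
        new_con
      }, error = function(e2) e2)
      if (inherits(fresh, "error")) {
        # The new connection fails too, which points at the server or network.
        return(list(status = "fresh_connection_failed", value = NULL,
                    con = NULL, detail = conditionMessage(fresh)))
      }
      # The new connection works, so the old connection object was the problem.
      list(status = "fresh_connection_ok", value = NULL, con = fresh,
           detail = msg)
    }
  )
}
```

In the failing script this might be called as `diagnose_with_fresh_connection(private$redis, function(con) con$SET("data", data))`. A status of `"fresh_connection_failed"` would support the suspicion that the server or network is at fault, while `"fresh_connection_ok"` would suggest looking more closely at the original connection.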

If you can create a reprex that I can run to recreate the issue, though, please do. Ideally without box etc., so that it's as easy to run as possible.