twitter / finagle

A fault tolerant, protocol-agnostic RPC system
https://twitter.github.io/finagle
Apache License 2.0
8.78k stars 1.45k forks source link

finagle-redis: QUEUED responses fail with ClassCastException #611

Open chrisbenincasa opened 7 years ago

chrisbenincasa commented 7 years ago

When Redis returns a QUEUED response, the returned value is opaquely cast which causes runtime errors.

Expected behavior

I'm not sure what the right call here is. On one hand, the call to Redis was successful, so failing the Future doesn't seem right. On the other, returning any instance of T also seems wrong and changing everything to Option[T] would be an even larger change. One option is to wrap all results in some "Response" class to provide additional information about the result.

Actual behavior

On QUEUED from Redis, the result is cast to a Future[Nothing] (presumably to get the types working) from Future.done, which ends up being a ConstFuture(Return(Unit)). Callers see the Future as successful, however do not know the underlying value is a BoxedUnit, which leads to ClassCastExceptions when the value is passed to functions.

Example error:

java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to scala.collection.Seq
        at com.curalate.redis.client.RedisClientBase$$anonfun$mGet$1.apply(RedisClientBase.scala:73)

Steps to reproduce the behavior

  1. Issue a Redis transaction but do not block on the resultant Future nor attach callbacks
  2. Immediately make another Redis call and attach a callback that operates on the value
  3. More often than not, a ClassCastException or NoSuchMethodError will be thrown. One may have to retry multiple times, as this is a race condition (i.e. one request will reach Redis before the other)

Pseudo-code

val keys = Seq(/* keys */)
val cmds = Seq(/* cmds */)
redisClient.transaction(cmds)
redisClient.mGet(keys).map(keys.zip) // .zip takes an Iterable, so a BoxedUnit will throw
nepthar commented 7 years ago

This definitely is a bug, but I think it's a bit more nuanced than handling the QUEUED response. Are you willing to do a bit of debugging? It would be really helpful to see what's going over the wire when this race condition hits.

Questions:

chrisbenincasa commented 7 years ago

@nepthar definitely willing to debug. I was not able to reproduce the NoSuchMethodError (I think that was a bug on my end), but I can reproduce the AbstractMethodError consistently. I setup a quick test in the form I described above. Here are some details:

The test:

it should "fail" in {
  val underlyingClient = client.redisClient
  val keyVals = List.fill(10)(Buf.Utf8(randomString) -> Buf.Utf8(randomString))
  val cmds = Seq(
    MSet(keyVals.toMap)
  ) ++ keyVals.map { case (k, _) => PExpire(k, 100)  }
  underlyingClient.transaction(cmds)
  val fut = underlyingClient.mGet(keyVals.map { case (k, _) => k  }).map(res => {
    res.foreach(println)
  })
  com.twitter.util.Await.result(fut, com.twitter.util.Duration.Top)
}

The breakpoint on QUEUED getting hit: screen shot 2017-05-04 at 3 12 11 pm

The redis monitor output:

) redis-cli -p 30144 monitor
OK
1493925332.472761 [0 172.17.0.1:41410] "MULTI"
1493925332.495212 [0 172.17.0.1:41410] "MSET" "g8iEo" "0qHv1" "9WpFi" "ybjwz" "4ydI3" "8uRXO" "oO29P" "rJWKH" "Chget" "CzTXa" "19J6T" "JhJAl" "jln65" "wW5pg" "o3w9L" "kumUR" "OrOfM" "36eE2" "PLz74" "EyYIj"
1493925332.495270 [0 172.17.0.1:41410] "PEXPIRE" "o3w9L" "100"
1493925332.495296 [0 172.17.0.1:41410] "PEXPIRE" "jln65" "100"
1493925332.495318 [0 172.17.0.1:41410] "PEXPIRE" "g8iEo" "100"
1493925332.495340 [0 172.17.0.1:41410] "PEXPIRE" "4ydI3" "100"
1493925332.495362 [0 172.17.0.1:41410] "PEXPIRE" "Chget" "100"
1493925332.495384 [0 172.17.0.1:41410] "PEXPIRE" "PLz74" "100"
1493925332.495405 [0 172.17.0.1:41410] "PEXPIRE" "OrOfM" "100"
1493925332.495428 [0 172.17.0.1:41410] "PEXPIRE" "oO29P" "100"
1493925332.495450 [0 172.17.0.1:41410] "PEXPIRE" "19J6T" "100"
1493925332.495471 [0 172.17.0.1:41410] "PEXPIRE" "9WpFi" "100"
1493925332.495493 [0 172.17.0.1:41410] "MGET" "o3w9L" "jln65" "g8iEo" "4ydI3" "Chget" "PLz74" "OrOfM" "oO29P" "19J6T" "9WpFi"
1493925332.495557 [0 172.17.0.1:41410] "EXEC"

Test output:

[info] - should fail *** FAILED ***
[info]   java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to scala.collection.Seq
[info]   at com.curalate.redis.test.RedisTests$$anonfun$1$$anonfun$17.apply(RedisTests.scala:151)
[info]   at com.twitter.util.Future$$anonfun$map$1$$anonfun$apply$3.apply(Future.scala:1145)
[info]   at com.twitter.util.Try$.apply(Try.scala:15)
[info]   at com.twitter.util.Future$.apply(Future.scala:163)
[info]   at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:1145)
[info]   at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:1144)
[info]   at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:107)
[info]   at com.twitter.util.Promise$Transformer.k(Promise.scala:107)
[info]   at com.twitter.util.Promise$Transformer.apply(Promise.scala:117)
[info]   at com.twitter.util.Promise$Transformer.apply(Promise.scala:98)

The Redis monitoring output tells the tale, I think. You can see that the MGET executes within the MULTI block, thus returning the QUEUED response. My callback on the MGET function in the code then interprets that response as successful, but the underlying type is not a Seq.

Let me know what else I can do to help!

nepthar commented 7 years ago

Nice detective work!

It does indeed look pretty clear cut. I'm not super familiar with redis, but is it also possible to view the server's response to each of those?

A fix for this is now on our roadmap internally, although I'm uncertain of when it will be finished. The immediate workaround is to allow the transaction to complete before issuing another command:

client.transaction(cmds).flatMap { _ => client.mGet(....) }.map { results => ... } 

If you'd like/have time, we're always accepting pull requests! My starting point would be examining RedisPool and seeing what's going on with transactions there.

jcrossley commented 7 years ago

@chrisbenincasa all of our finagle clients are set up to handle one response per one request, and it seems that for these transaction requests, that one-one mapping no longer holds, hence the mismatched requests/reponse pairs, and the ClassCastException. There currently aren't any plans to fix this behavior, but please feel free to submit a PR!

chrisbenincasa commented 7 years ago

@jcrossley I'm not sure I follow -- the single QUEUED response comes from a single MGET request. It's not that the response are mismatched (unless you mean that the eventual result for an MGET issues during a transaction ends up as a part of the response for an EXEC request), but rather that the single QUEUED response is cast opaquely to Unit. In other words, consumers have no way of knowing if the single response is actually the type they expect, if the "raw" response from Redis is a QUEUED type.

chrisbenincasa commented 7 years ago

As a quick follow-up, I'm certainly interested in taking on the change here. Though, I'm curious as to what your opinions are on the correct solution. To some degree, one could argue this is user error for using Redis transactions in a multi-threaded application environment; so you could consider this a sort of error state (maybe a specialized Exception to let the caller handle the QUEUED case?). On the other hand, the response isn't technically an error, so an exception may not fit the bill.