jkern888 opened this issue 2 years ago (status: Open)
Depending on how you use the library this might happen when using e.g. a single connection with transactions from multiple threads without proper synchronization.
@mp911de Thanks for the comment. I don't think that's the case here, as our usage is pretty uncomplicated: we just use the synchronous API to do GET, SET, EXPIRE, and DEL from a single connection. We do also use the compression codec to zip the data, along with Jackson for de/serialization, but all of those claim to be thread-safe.
The fact this could happen from incorrect usage like you said is interesting though. Is there a particular part of the library you could point me to? May help in debugging since possibly there's some other path into the same state?
Hmm, when using Jackson as a codec, encoding errors may bring the protocol out of sync. Please check whether you can see any exceptions prior to getting out of sync.
@mp911de Interesting. Here's the code for the codec we're using. Should we be swallowing the exception and returning null?
public class JacksonRedisCodec<T> implements RedisCodec<String, T> {

    private final ObjectWriter writer;
    private final ObjectReader reader;

    // Encoding used for the key string.
    private static final Charset UTF_8 = Charsets.UTF_8;

    public JacksonRedisCodec(ObjectWriter writer, ObjectReader reader) {
        this.writer = writer;
        this.reader = reader;
    }

    @Override
    public String decodeKey(ByteBuffer byteBuffer) {
        ByteBufferBackedInputStream inputStream = new ByteBufferBackedInputStream(byteBuffer);
        StringWriter writer = new StringWriter();
        try {
            IOUtils.copy(inputStream, writer, UTF_8.name());
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
        return writer.toString();
    }

    @Override
    public T decodeValue(ByteBuffer byteBuffer) {
        ByteBufferBackedInputStream inputStream = new ByteBufferBackedInputStream(byteBuffer);
        try {
            return reader.readValue(inputStream);
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public ByteBuffer encodeKey(String string) {
        return ByteBuffer.wrap(string.getBytes(UTF_8));
    }

    @Override
    public ByteBuffer encodeValue(T t) {
        try {
            return ByteBuffer.wrap(writer.writeValueAsBytes(t));
        }
        catch (JsonProcessingException e) {
            throw new RuntimeException(e);
        }
    }
}
Any exceptions that happen on the codec level directly affect what is sent to the Redis server. If you have, say, a command sequence of SET -> SET -> GET and the second SET throws an exception (SET -> 💥 -> GET), then the client is no longer synchronized to the protocol, because the second command was never sent to the Redis server.
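To make that failure mode concrete, here is a toy, stdlib-only model of the bookkeeping (this is not Lettuce's actual internal code; the class and method names are invented): every attempted command queues a reply handler first, but only successfully encoded commands reach the wire, so later replies pair up with the wrong handlers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of protocol desync after a failed encode. Not Lettuce internals.
public class DesyncModel {

    // Pair each reply the server sends with the oldest queued expectation.
    static List<String> pairReplies(List<String> attempted, String failing) {
        Deque<String> expectedReplies = new ArrayDeque<>();
        List<String> sent = new ArrayList<>();
        for (String cmd : attempted) {
            expectedReplies.add(cmd);          // reply handler is queued first
            if (!cmd.equals(failing)) {
                sent.add(cmd);                 // only successfully encoded commands hit the wire
            }                                  // otherwise: 💥 during encoding, nothing written
        }
        List<String> pairs = new ArrayList<>();
        for (String answered : sent) {
            pairs.add("reply to '" + answered + "' consumed by handler for '"
                    + expectedReplies.poll() + "'");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The second SET fails to encode; the reply to GET is then consumed
        // by the handler that was waiting for the failed SET.
        pairReplies(List.of("SET a", "SET b", "GET a"), "SET b")
                .forEach(System.out::println);
    }
}
```

From this point on, every response is off by one, which matches the symptom of commands receiving responses to different requests.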
We try to recover as much as possible upon encoding/decoding but there's always a fair chance that exceptions during encoding/decoding may impact the protocol synchronization.
I strongly suggest moving the encoding/decoding part into a higher level component so that you can handle exceptions properly.
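That suggestion can be sketched roughly like this, assuming the Lettuce connection is configured with a pass-through codec (Lettuce ships `ByteArrayCodec` for this) and serialization happens in the application layer. `SafeCache` and `RawStore` are hypothetical names for illustration, not Lettuce API:

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch: serialization happens in the application layer, before
// the raw client is touched. A failure here is an ordinary exception and the
// Redis protocol never sees a half-encoded command.
class SafeCache<T> {

    // Stand-in for a connection created with a pass-through codec
    // (e.g. Lettuce's ByteArrayCodec); not a real Lettuce interface.
    interface RawStore {
        void set(byte[] key, byte[] value);
    }

    private final RawStore store;
    private final Function<T, byte[]> serializer;

    SafeCache(RawStore store, Function<T, byte[]> serializer) {
        this.store = store;
        this.serializer = serializer;
    }

    /** Returns false instead of letting a serialization failure reach the wire. */
    boolean trySet(String key, T value) {
        final byte[] payload;
        try {
            payload = serializer.apply(value); // e.g. objectMapper::writeValueAsBytes
        } catch (RuntimeException e) {
            return false; // handled here; nothing was queued or written
        }
        store.set(key.getBytes(StandardCharsets.UTF_8), payload);
        return true;
    }

    public static void main(String[] args) {
        java.util.Map<String, byte[]> backing = new java.util.HashMap<>();
        SafeCache<String> cache = new SafeCache<>(
                (k, v) -> backing.put(new String(k, StandardCharsets.UTF_8), v),
                s -> s.getBytes(StandardCharsets.UTF_8));
        System.out.println("stored? " + cache.trySet("user:1", "{\"name\":\"jo\"}"));
    }
}
```

Because the codec handed to Lettuce then only wraps byte arrays, it can no longer throw during encoding, and a Jackson failure surfaces as an ordinary application exception before any command is written.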
The same issue has just re-occurred. Digging into our logging I'm seeing no exceptions around the codecs. The only thing I do see is a command timeout:
io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 second(s)
at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:53)
at io.lettuce.core.protocol.CommandExpiryWriter.lambda$potentiallyExpire$0(CommandExpiryWriter.java:172)
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:832)
Immediately after that we start getting many errors due to responses being de-synced. We see that same timeout exception at other times with no ill effect, but maybe it's occasionally triggering this bad state?
Possible result of being out of sync https://github.com/lettuce-io/lettuce-core/issues/2175
@jkern888 I have a question. I'm hitting the same problem, but it's not related to a codec. You said the problem occurred after upgrading to 6.1.1, so I'm wondering: which version was the last one that worked for you?
Previously I used Lettuce 5.x, and that version did not have this problem.
We have hit the same problem, and I was able to reproduce it with the code below:
@SpringBootTest
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
public class RedisTest {

    @Autowired
    private ReactiveRedisTemplate redisTemplate;

    private ExecutorService executorService = Executors.newFixedThreadPool(20);

    @Test
    public void testRedis() throws Exception {
        Random random = new Random();
        AtomicInteger counter = new AtomicInteger(0);
        for (int i = 0; i < 100000; i++) {
            int finalI = i;
            executorService.submit(() -> {
                redisTemplate.opsForValue().set("testkey", "testvalue" + finalI)
                        //.publishOn(Schedulers.boundedElastic())
                        .flatMap(r1 -> {
                            System.out.println(Thread.currentThread().getName() + " : result1 = " + r1);
                            return redisTemplate.opsForHash().get("test_hash", "hash_field") //, "test_value" + finalI)
                                    // .publishOn(Schedulers.boundedElastic())
                                    .defaultIfEmpty("")
                                    .flatMap(r2 -> {
                                        if (random.nextInt() % 2 == 0 && counter.get() < 100) {
                                            counter.incrementAndGet();
                                            throw new OutOfMemoryError("test out of memory error");
                                        }
                                        System.out.println(Thread.currentThread().getName() + " : result2 = " + r2);
                                        return redisTemplate.opsForHash().put("test_hash", "hash_field", "hash_value" + finalI)
                                                //.publishOn(Schedulers.boundedElastic())
                                                .flatMap(r3 -> {
                                                    System.out.println(Thread.currentThread().getName() + " : result3 = " + r3);
                                                    return redisTemplate.opsForValue()
                                                            .get("testkey")
                                                            //.publishOn(Schedulers.boundedElastic())
                                                            .map(r4 -> {
                                                                System.out.println(Thread.currentThread().getName() + " : result4 = " + r4);
                                                                return r4;
                                                            });
                                                });
                                    });
                        })
                        // .subscribeOn(Schedulers.boundedElastic())
                        .subscribe();
            });
            TimeUnit.MILLISECONDS.sleep(10);
        }
        System.in.read();
    }
}
My environment is:
I found that if my business code throws an error, it gets caught by a Lettuce pool thread, which eventually leads to a java.lang.UnsupportedOperationException: io.lettuce.core.output.ValueOutput does not support set(long) exception.
If I move my business code onto a dedicated thread pool with publishOn, the error is no longer caught on a Lettuce thread, which avoids the exception.
@qinchunabng Thanks for sharing. I have one question. Your example code has two commented-out operators, 'publishOn' and 'subscribeOn'. To avoid the exception, do we need to uncomment both?
Just uncomment 'publishOn'; 'subscribeOn' is just test code.
Hi, how do you resolve this?
Using publishOn makes the business code in flatMap execute on another thread pool instead of Lettuce's. In my testing, if your code runs on a Lettuce thread and throws an exception or Error, it can leave the connection out of sync.
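The thread handoff that publishOn performs can be illustrated with a stdlib-only sketch (no Reactor or Lettuce involved): the follow-up business step runs on a separate worker pool, so an exception thrown there cannot hit the thread that handles I/O. The executor and thread names here are made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the handoff idea behind publishOn, using plain CompletableFuture.
public class HandoffDemo {

    static String whereDidStepsRun() throws Exception {
        ExecutorService ioLoop = Executors.newSingleThreadExecutor(r -> new Thread(r, "io-loop"));
        ExecutorService workers = Executors.newFixedThreadPool(2, r -> new Thread(r, "worker"));
        try {
            return CompletableFuture
                    // simulated Redis reply, produced on the I/O thread
                    .supplyAsync(() -> Thread.currentThread().getName(), ioLoop)
                    // analogous to publishOn(...): business logic (which might
                    // throw) now runs on a worker, not on the I/O thread
                    .thenApplyAsync(replyThread ->
                            replyThread + " -> " + Thread.currentThread().getName(), workers)
                    .get();
        } finally {
            ioLoop.shutdown();
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(whereDidStepsRun()); // prints "io-loop -> worker"
    }
}
```

With Reactor the same shape is `redisResult.publishOn(Schedulers.boundedElastic()).map(businessLogic)`: everything downstream of publishOn is scheduled away from the event loop.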
Our team also encountered this sporadic issue, which occurs approximately once every 1-2 weeks. Through our testing, we found that this problem only occurs on aarch64 servers and is not observed on X86-64 servers. We were able to resolve this issue by adjusting the version of the JDK.
Here is the environment information:
Redis Server: 6.2.5
spring-data-redis: 2.1.20.RELEASE
Spring Boot: 2.1.17.RELEASE
Lettuce: 6.0.7.RELEASE

Test results:
Oracle JDK 8u131 -> Exception
Oracle JDK 8u201 -> Exception
Oracle JDK 8u251 -> OK
OpenJDK 8u352 -> OK
OpenJDK 8u292 -> OK
@wqq0712 Thanks for the good information.
I have hit this problem on an x86-64 server with OpenJDK 8u222, so I don't think aarch64 vs. x86-64 is the important condition. I will try testing with various OpenJDK versions.
Thank you.
I do not think that the JDK version is involved. Rather, out of sync happens either if a codec fails encoding/decoding or an output fails (or any other exotic condition happens).
Happy to provide support if you can provide a reliable reproducer.
@rndmania I think there may be a bug in the 8u222 version, which I also used on the aarch64 platform. The problem stopped when I upgraded to 8u341.
Current Behavior
We recently updated Lettuce to 6.1.1.RELEASE, and after running it in our production environment for a week we hit a troubling issue. At some point Lettuce worked itself into a state where commands were receiving responses to different requests. This manifested as lots of deserialization errors (e.g. java.lang.UnsupportedOperationException: io.lettuce.core.output.ValueOutput does not support set(long)), but also, more seriously, as incorrect data being returned to the caller. Restarting the service brought things back to a working state. Prior to this happening, the only other thing we observed was some RedisCommandTimeoutException being thrown, but not at an anomalous level. For the meantime we've added more aggressive checking for this state, so if it does reoccur I'd be happy to try to gather more information, but I'm not sure what would be helpful.
Environment