twitter / finagle

A fault tolerant, protocol-agnostic RPC system
https://twitter.github.io/finagle
Apache License 2.0

LostSyncException using finagle-mysql upon reading paged in data #926

Closed (dispalt closed 2 years ago)

dispalt commented 2 years ago

Describe the bug

I can't quite tell what is causing this, but I am getting the following stack trace (MYSQLMon is a custom monitor):

2022-03-14 02:25:01 ERROR [finagle/netty4-6-5] [undefined] MYSQLMon - mysql query failed
com.twitter.finagle.mysql.LostSyncException: com.twitter.io.ByteReader$UnderflowException: tried to read 8 byte(s) when remaining bytes was 4
        at com.twitter.io.ByteReaderImpl.checkRemaining(ByteReader.scala:308)
        at com.twitter.io.ByteReaderImpl.readLongLE(ByteReader.scala:449)
        at com.twitter.io.ProxyByteReader.readLongLE(ByteReader.scala:243)
        at com.twitter.io.ProxyByteReader.readLongLE$(ByteReader.scala:243)
        at com.twitter.finagle.mysql.transport.MysqlBufReader.readLongLE(MysqlBuf.scala:42)
        at com.twitter.finagle.mysql.transport.MysqlBufReader.readVariableLong(MysqlBuf.scala:127)
        at com.twitter.finagle.mysql.ClientDispatcher.$anonfun$decodePacket$15(ClientDispatcher.scala:222)
        at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17)
        at com.twitter.util.Try$.apply(Try.scala:26)
        at com.twitter.finagle.mysql.ClientDispatcher.decodePacket(ClientDispatcher.scala:220)
Caused by: com.twitter.io.ByteReader$UnderflowException: tried to read 8 byte(s) when remaining bytes was 4
        at com.twitter.io.ByteReaderImpl.checkRemaining(ByteReader.scala:308)
        at com.twitter.io.ByteReaderImpl.readLongLE(ByteReader.scala:449)
        at com.twitter.io.ProxyByteReader.readLongLE(ByteReader.scala:243)
        at com.twitter.io.ProxyByteReader.readLongLE$(ByteReader.scala:243)
        at com.twitter.finagle.mysql.transport.MysqlBufReader.readLongLE(MysqlBuf.scala:42)
        at com.twitter.finagle.mysql.transport.MysqlBufReader.readVariableLong(MysqlBuf.scala:127)
        at com.twitter.finagle.mysql.ClientDispatcher.$anonfun$decodePacket$15(ClientDispatcher.scala:222)
        at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17)
        at com.twitter.util.Try$.apply(Try.scala:26)
        at com.twitter.finagle.mysql.ClientDispatcher.decodePacket(ClientDispatcher.scala:220)

To Reproduce

I am not sure yet, but this client is connecting to Amazon Aurora 2. I've been using it in production, with minor upgrades, for at least 3 years now, and I haven't seen this before. I also checked the logs, and there's nothing about throttling or max capacity like what is mentioned in this issue: https://github.com/twitter/finagle/issues/573.

Environment

Additional context

yufangong commented 2 years ago

Hi @dispalt, thank you for reaching out. Which MySQL version are you running? Do you also have a list of the recent upgrades, either on your side or to Aurora 2? It smells like a corrupted payload. Can you also provide the type of query and more info about the payload, if possible? Thank you!
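For readers following the trace: the failing frame is `readVariableLong`, which from its name and the `readLongLE` call beneath it appears to decode a MySQL length-encoded integer. In that encoding, a first byte of `0xfe` announces an 8-byte little-endian value. The sketch below is a standalone Python illustration of the wire format, not finagle's code; `read_lenenc_int` and `Underflow` are hypothetical names. It shows how a 5-byte payload starting with `0xfe` would produce the same "8 needed, 4 remaining" numbers.

```python
class Underflow(Exception):
    """Raised when fewer bytes remain than the decoder needs (hypothetical name)."""


def read_lenenc_int(payload: bytes) -> int:
    """Decode a MySQL length-encoded integer from the start of payload."""
    if not payload:
        raise Underflow("tried to read 1 byte(s) when remaining bytes was 0")
    first = payload[0]
    if first < 0xFB:
        # Values below 0xfb are stored directly in the first byte.
        return first
    # Prefix byte -> number of little-endian bytes that follow.
    widths = {0xFC: 2, 0xFD: 3, 0xFE: 8}
    if first not in widths:
        # 0xfb and 0xff are not integer prefixes in this encoding.
        raise ValueError(f"0x{first:02x} is not a length-encoded integer prefix")
    need = widths[first]
    remaining = len(payload) - 1
    if remaining < need:
        raise Underflow(
            f"tried to read {need} byte(s) when remaining bytes was {remaining}"
        )
    return int.from_bytes(payload[1 : 1 + need], "little")


# A 5-byte payload that starts with 0xfe: the decoder wants 8 more bytes, only 4 exist.
short_payload = bytes([0xFE, 0x00, 0x00, 0x02, 0x00])
try:
    read_lenenc_int(short_payload)
except Underflow as err:
    print(err)  # tried to read 8 byte(s) when remaining bytes was 4
```

A 5-byte payload beginning with `0xfe` is also the shape of a MySQL EOF packet, so a packet arriving where the decoder expected something else is one possible reading alongside a truncated or corrupted payload. Nothing in this thread establishes which, if either, occurred.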

dispalt commented 2 years ago

Sorry, I've been trying to get more information, but it happens about once every other week on a service that's processing 10-20 qps, so it's obviously rare and may just be some sort of network hiccup. This is on AWS, so that seems possible...

mosesn commented 2 years ago

@dispalt got it. Can you at least figure out which MySQL version it is? It would be good to confirm that it's a version of MySQL that we support / test (5.7 and 8 right now).

dispalt commented 2 years ago

Here you go. Still hasn't happened again.

aurora_version,2.10.2
innodb_version,5.7.12
protocol_version,10
version,5.7.12
version_comment,MySQL Community Server (GPL)
version_compile_machine,x86_64
version_compile_os,Linux

dispalt commented 2 years ago

I haven't run into this again. I have no idea what caused it, but I guess it was probably a network anomaly. Sorry for the noise.