planetarium / libplanet

Blockchain in C#/.NET for on-chain, decentralized gaming
https://docs.libplanet.io/
GNU Lesser General Public License v2.1

Swarm<T> crashes #453

Closed: longfin closed this issue 4 years ago

longfin commented 5 years ago
08/22/2019 07:35:52 +00:00: Could not parse NetMQMessage properly; ignore: Libplanet.Net.InvalidMessageException: the message signature is invalid
  at Libplanet.Net.Messages.Message.Parse (NetMQ.NetMQMessage raw, System.Boolean reply) [0x00074] in <c12a8bf88662424e9873f09c24ce28fe>:0 
  at Libplanet.Net.Swarm`1[T].ReceiveMessage (System.Object sender, NetMQ.NetMQSocketEventArgs e) [0x00030] in <c12a8bf88662424e9873f09c24ce28fe>:0 
Libplanet.Net.InvalidMessageException: the message signature is invalid
  at Libplanet.Net.Messages.Message.Parse (NetMQ.NetMQMessage raw, System.Boolean reply) [0x00074] in <c12a8bf88662424e9873f09c24ce28fe>:0 
  at Libplanet.Net.Swarm`1[T].ReceiveMessage (System.Object sender, NetMQ.NetMQSocketEventArgs e) [0x00030] in <c12a8bf88662424e9873f09c24ce28fe>:0 
08/22/2019 07:35:52 +00:00: A raw message [frame count: 4] has received.
08/22/2019 07:35:52 +00:00: An unexpected exception occured during ReceiveMessage(): System.OverflowException: Arithmetic operation resulted in an overflow.
  at Libplanet.Net.Messages.Message.Parse (NetMQ.NetMQMessage raw, System.Boolean reply) [0x00029] in <c12a8bf88662424e9873f09c24ce28fe>:0 
  at Libplanet.Net.Swarm`1[T].ReceiveMessage (System.Object sender, NetMQ.NetMQSocketEventArgs e) [0x00030] in <c12a8bf88662424e9873f09c24ce28fe>:0 
System.OverflowException: Arithmetic operation resulted in an overflow.
  at Libplanet.Net.Messages.Message.Parse (NetMQ.NetMQMessage raw, System.Boolean reply) [0x00029] in <c12a8bf88662424e9873f09c24ce28fe>:0 
  at Libplanet.Net.Swarm`1[T].ReceiveMessage (System.Object sender, NetMQ.NetMQSocketEventArgs e) [0x00030] in <c12a8bf88662424e9873f09c24ce28fe>:0 

After that:

FaultException: Cannot close an uninitialised Msg.
  at NetMQ.Msg.Close () [0x00012] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Transports.V2Encoder.MessageReady () [0x00000] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Transports.V2Encoder.Next () [0x00017] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Transports.EncoderBase.GetData (NetMQ.Core.Transports.ByteArraySegment& data, System.Int32& size, System.Int32& offset) [0x00040] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Transports.EncoderBase.GetData (NetMQ.Core.Transports.ByteArraySegment& data, System.Int32& size) [0x00002] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Transports.StreamEngine.BeginSending () [0x0000f] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Transports.StreamEngine.Handle (NetMQ.Core.Transports.StreamEngine+Action action, System.Net.Sockets.SocketError socketError, System.Int32 bytesTransferred) [0x00132] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Transports.StreamEngine.FeedAction (NetMQ.Core.Transports.StreamEngine+Action action, System.Net.Sockets.SocketError socketError, System.Int32 bytesTransferred) [0x00000] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Transports.StreamEngine.ActivateOut () [0x00000] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.SessionBase.ReadActivated (NetMQ.Core.Pipe pipe) [0x00012] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Pipe.ProcessActivateRead () [0x00021] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.ZObject.ProcessCommand (NetMQ.Core.Command cmd) [0x00059] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.IOThread.Ready () [0x00016] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.IOThreadMailbox.RaiseEvent () [0x00008] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at NetMQ.Core.Utils.Proactor.Loop () [0x00050] in <21696b85a92a4a0eb8332ff57aebfd69>:0 
  at System.Threading.ThreadHelper.ThreadStart_Context (System.Object state) [0x00014] in <1f0c1ef1ad524c38bbc5536809c46b48>:0 
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00071] in <1f0c1ef1ad524c38bbc5536809c46b48>:0 
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <1f0c1ef1ad524c38bbc5536809c46b48>:0 
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state) [0x0002b] in <1f0c1ef1ad524c38bbc5536809c46b48>:0 
  at System.Threading.ThreadHelper.ThreadStart () [0x00008] in <1f0c1ef1ad524c38bbc5536809c46b48>:0 
UnityEngine.DebugLogHandler:Internal_LogException(Exception, Object)
UnityEngine.DebugLogHandler:LogException(Exception, Object)
UnityEngine.Logger:LogException(Exception, Object)
UnityEngine.Debug:LogException(Exception)
UnityEngine.UnhandledExceptionHandler:<RegisterUECatcher>m__0(Object, UnhandledExceptionEventArgs) (at /home/builduser/buildslave/unity/build/Runtime/Export/Scripting/UnhandledExceptionHandler.bindings.cs:46)
longfin commented 5 years ago

It seems that the NetMQ I/O thread crashes in some situations.

longfin commented 5 years ago

According to https://github.com/zeromq/netmq/issues/572, this exception can be thrown if a socket is shared between multiple threads at the same time, but I think we already solved that problem by using NetMQQueues in Swarm<T>.

Anyway, I'll check it again.
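
A minimal sketch of the queue-based pattern mentioned above, assuming NetMQ's `NetMQQueue<T>` and `NetMQPoller`. This is not the actual Swarm<T> code: the class name, endpoint, and message contents are hypothetical, chosen only for illustration. Other threads enqueue messages, and only the poller thread sends on the socket.

```csharp
using System;
using System.Threading.Tasks;
using NetMQ;
using NetMQ.Sockets;

public static class SingleThreadSocketSketch
{
    public static void Main()
    {
        // Hypothetical in-process endpoint, for illustration only.
        const string endpoint = "inproc://sketch";

        using (var router = new RouterSocket())
        using (var dealer = new DealerSocket())
        using (var outbox = new NetMQQueue<NetMQMessage>())
        using (var poller = new NetMQPoller { outbox })
        {
            router.Bind(endpoint);
            dealer.Connect(endpoint);

            // This handler runs on the poller thread, which is the only
            // thread that sends on `dealer` once the poller is running.
            outbox.ReceiveReady += (sender, e) =>
            {
                while (e.Queue.TryDequeue(out NetMQMessage msg, TimeSpan.Zero))
                {
                    dealer.SendMultipartMessage(msg);
                }
            };

            poller.RunAsync();

            // Worker threads only enqueue; they never call into the socket.
            Parallel.For(0, 4, i =>
            {
                var msg = new NetMQMessage();
                msg.Append($"hello {i}");
                outbox.Enqueue(msg);
            });

            // The router sees an identity frame followed by the payload frame.
            for (int i = 0; i < 4; i++)
            {
                NetMQMessage received = router.ReceiveMultipartMessage();
                Console.WriteLine(received.Last.ConvertToString());
            }

            poller.Stop();
        }
    }
}
```

If any code path still touches the socket from a thread other than the poller's, for example during shutdown, this kind of fault could in principle still occur; the sketch does not establish that this is the cause here.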

dahlia commented 5 years ago

@longfin Does it happen only in the preloading phase, or in any phase?

longfin commented 5 years ago

I think it can happen in any phase, because I captured this log from a miner node.

longfin commented 4 years ago

I've added linger to the dealers to avoid it, but it's still occurring in the test environment. 😕

longfin commented 4 years ago

> I've added linger to the dealers to avoid it, but it's still occurring in the test environment. 😕

I was mistaken. That is a different issue (#404).