planetarium / libplanet

Blockchain in C#/.NET for on-chain, decentralized gaming
GNU Lesser General Public License v2.1

System.Net.Sockets.SocketException (22) on macOS #2740

Open longfin opened 1 year ago

longfin commented 1 year ago
Is there any reason why `macos-netcore-test` failed with this error?
Passed!  - Failed:     0, Passed:   167, Skipped:     0, Total:   167, Duration: 19 s - /Users/distiller/project/Libplanet.Tests/bin/Release/net6.0/Libplanet.Tests.dll (net6.0)
The active test run was aborted. Reason: Test host process crashed : Unhandled exception. System.Net.Sockets.SocketException (22): Invalid argument
   at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
   at NetMQ.Core.Transports.Tcp.TcpListener.InCompleted(SocketError socketError, Int32 bytesTransferred)
   at NetMQ.Core.Utils.Proactor.Loop()
   at System.Threading.Thread.StartCallback()

Results File: /tmp/junit/Libplanet.Net.Tests.xml

Passed!  - Failed:     0, Passed:    26, Skipped:     0, Total:    26, Duration: 26 s - /Users/distiller/project/Libplanet.Net.Tests/bin/Release/net6.0/Libplanet.Net.Tests.dll (net6.0)
Test Run Aborted with error System.Exception: One or more errors occurred.
 ---> System.Exception: Unable to read beyond the end of the stream.
   at System.IO.BinaryReader.Read7BitEncodedInt()
   at System.IO.BinaryReader.ReadString()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.NotifyDataAvailable()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync(TcpClient client, ICommunicationChannel channel, Action`1 errorHandler, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---.

Exited with code exit status 1

The test run is terminated with these errors in macos-netcore-test for this PR.

Originally posted by @riemannulus in

longfin commented 1 year ago

SocketException (22) can be thrown from the .NET runtime / macOS for the reasons below.

At first, I assumed this was a Linger-related issue, but in that case the error would occur on .Accept()...

Of course, it is possible that the function has been inlined, so the current estimate may not be accurate.

longfin commented 1 year ago

Setting noDelay on the accepted socket may be the problem. (NetMQ made a similar fix 4 years ago.)

I'm not yet confident that it's a bug on the NetMQ side, but this seems helpful for debugging the current situation.
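
To make the suspicion above concrete, here is a minimal sketch. It assumes the crash comes from setting NoDelay on an accepted socket whose peer has already gone away; as far as I know, macOS can report EINVAL from setsockopt on a socket that has been shut down, which .NET surfaces as SocketError.InvalidArgument. `AcceptedSocketOptions` and `TrySetNoDelay` are hypothetical names for illustration, not part of NetMQ or Libplanet, and this is not the actual NetMQ fix.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper (not NetMQ or Libplanet API).
public static class AcceptedSocketOptions
{
    // Tries to disable Nagle's algorithm on a freshly accepted socket.
    // Returns false instead of throwing when the OS rejects the option with
    // "invalid argument", so a listener loop can drop that one connection
    // rather than crash the whole process.
    public static bool TrySetNoDelay(Socket accepted)
    {
        try
        {
            accepted.NoDelay = true;
            return true;
        }
        catch (SocketException e)
            when (e.SocketErrorCode == SocketError.InvalidArgument)
        {
            // Assumption: the peer disconnected between accept and setsockopt.
            return false;
        }
    }
}
```

A listener would call `TrySetNoDelay(accepted)` right after accepting and close the socket when it returns false. Any other SocketException still propagates, so unrelated failures are not hidden.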

longfin commented 1 year ago

Maybe related? 🤔

echo "xnet_skip_checks/W1" | mdb -kw

If this workaround does the trick, it seems to be a timing issue with _router in NetMQTransport.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.