zeromq / netmq

A 100% native C# implementation of ZeroMQ for .NET

[Advice?] NetMQRuntime + socket Async methods never complete #904

Closed by TimWilde 4 years ago

TimWilde commented 4 years ago

This is probably not so much a bug report as a request for advice or pointers.

I'm attempting to build a service auto-discovery mechanism using NetMQBeacons and sockets to find peers and exchange capability details. I have a version working with the Try... methods with timeouts, but this introduces a lot of latency, so I decided to rewrite it with the new Async additions to NetMQ, using NetMQRuntime and NetMQQueue.

When running, the beacons are received (so service-to-service comms are working) and those details are passed via the NetMQQueue successfully, but when I use the Async methods on sockets to send and receive messages via TCP, nothing arrives at the destination.

// This never returns and the code is effectively blocked indefinitely
NetMQMessage message = await presenceSocket.ReceiveMultipartMessageAsync( 3, token );

I'm attempting to use the RouterSocket and DealerSocket and I explicitly set the identity for each. I am also packaging messages into frames [ Identity, Empty, Data, Data, Data... ]

I've put together a simple repro codebase, linked further down.

I wouldn't be surprised to find that I have misunderstood or misconfigured the new Async features - could anyone have a look at my code and check for anything obvious, please?

Environment

NetMQ Version:    4.0.1.2-pre (nuget package, for Linux UDP broadcast fixes)
Operating System: Windows 10/Linux (Docker containers)
.NET Version:     .NET Core 3.1

Expected behaviour

The new ...Async methods, when run in a NetMQRuntime context, should not block indefinitely. Messages should be delivered between services.

Actual behaviour

Calls to ...Async methods block indefinitely - I've left this running for over half an hour with no response.

Steps to reproduce the behaviour

Full code spike repo which demonstrates the problem here: https://github.com/TimWilde/netmq_discovery

TimWilde commented 4 years ago

I've found a solution. I had misunderstood a few things.

My code needed to be intrinsically multithreaded, as I am doing several things which block each other when run on the same thread.

This then meant I ran into the problem that NetMQRuntime.Current is ThreadLocal, so I needed a separate NetMQRuntime instance in each thread that uses the async methods.
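
For readers hitting the same thing, here is a minimal sketch of the "one runtime per thread" arrangement described above. It is not the code from the linked repo. It assumes the NetMQRuntime and ReceiveMultipartMessageAsync API of the NetMQ 4.0 pre-release line, and the class name, helper names, and port are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using NetMQ;
using NetMQ.Sockets;

public static class PerThreadRuntimeSketch
{
    // Hypothetical endpoint, for illustration only.
    const string Address = "tcp://127.0.0.1:5556";

    // Starts a dedicated thread that owns its own NetMQRuntime.
    // The factory is invoked inside the using block, so the async work
    // starts on this thread with this thread's runtime already in place.
    public static Thread StartWorker(Func<Task> workFactory)
    {
        var thread = new Thread(() =>
        {
            using (var runtime = new NetMQRuntime())
            {
                runtime.Run(workFactory());
            }
        });
        thread.Start();
        return thread;
    }

    // Hypothetical responder: answers a single request, then exits.
    static async Task ServeOneAsync()
    {
        using (var router = new RouterSocket())
        {
            // Assumption: give the reply time to flush when the socket is disposed.
            router.Options.Linger = TimeSpan.FromSeconds(1);
            router.Bind(Address);

            // Frames as seen by the router: [ identity, empty, data ]
            NetMQMessage request = await router.ReceiveMultipartMessageAsync(3);

            var reply = new NetMQMessage();
            reply.Append(request[0]);   // identity frame routes the reply back
            reply.AppendEmptyFrame();
            reply.Append("pong");
            router.SendMultipartMessage(reply);
        }
    }

    // Hypothetical requester: sends one request and prints the reply payload.
    static async Task AskOnceAsync()
    {
        using (var dealer = new DealerSocket())
        {
            dealer.Connect(Address);
            dealer.SendMoreFrameEmpty().SendFrame("ping");

            // Frames as seen by the dealer: [ empty, data ]
            NetMQMessage reply = await dealer.ReceiveMultipartMessageAsync(2);
            Console.WriteLine(reply[1].ConvertToString());
        }
    }

    public static void Main()
    {
        Thread server = StartWorker(ServeOneAsync);
        Thread client = StartWorker(AskOnceAsync);
        client.Join();
        server.Join();
    }
}
```

Each socket is created and awaited on the thread whose runtime polls it, which is the point of the sketch. The two workers stand in for whatever blocking activities need to be kept apart.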

I then ran into another problem: I was prepending the service identity to outbound requests, which the RouterSocket also does upon receipt. The received messages therefore carried the identity twice, and my validation code (expecting [Id, Empty, Data]) was rejecting the requests.
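
The envelope problem can be illustrated with a small synchronous sketch. It is not the code from the linked repo. The class name, method name, identity, and payload are hypothetical, and it uses only blocking NetMQ calls so no NetMQRuntime is needed.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using NetMQ;
using NetMQ.Sockets;

public static class EnvelopeSketch
{
    // Sends one message from a DealerSocket with the given identity and
    // returns the frames, as strings, in the order a RouterSocket receives them.
    public static List<string> FramesSeenByRouter(string identity, bool prependIdentityManually)
    {
        using (var router = new RouterSocket())
        using (var dealer = new DealerSocket())
        {
            // Bind to a random free port so the sketch does not rely on a fixed one.
            int port = router.BindRandomPort("tcp://127.0.0.1");

            // The identity has to be set before connecting.
            dealer.Options.Identity = Encoding.UTF8.GetBytes(identity);
            dealer.Connect($"tcp://127.0.0.1:{port}");

            var outbound = new NetMQMessage();
            if (prependIdentityManually)
            {
                outbound.Append(identity);   // the mistake described above
            }
            outbound.AppendEmptyFrame();
            outbound.Append("capabilities");
            dealer.SendMultipartMessage(outbound);

            // The router adds the sender's identity as the first frame on receipt.
            NetMQMessage inbound = router.ReceiveMultipartMessage();
            return inbound.Select(frame => frame.ConvertToString()).ToList();
        }
    }
}
```

Calling FramesSeenByRouter("alpha", false) should give the [Id, Empty, Data] layout that the validation expects, while passing true should give [Id, Id, Empty, Data], which is the shape that was being rejected.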

With that and a few other small changes done, the code is running and very fast. I exhaust CPU and IO before it starts failing, which I can accept - eventually this will be scattered across a Kubernetes cluster with many more resources than my laptop. 😄

The changes are in the codebase linked above if anyone is interested - it's about 170 lines of code in the Alpha/Services/AlphaService.cs file, the rest is just ancillary.

somdoron commented 4 years ago

Nice :)

On Sat, Jun 13, 2020, 20:16 Tim Wilde notifications@github.com wrote:

Closed #904 https://github.com/zeromq/netmq/issues/904.