real-logic / aeron

Efficient reliable UDP unicast, UDP multicast, and IPC message transport
Apache License 2.0

How to debug multicast issues #670

Closed: puiuvlad closed this issue 5 years ago

puiuvlad commented 5 years ago

Hi,

I have a couple of applications that run on two machines, one Linux and one Windows. All of them publish on and subscribe to the same multicast address. The network has three switches, but the two machines talking to each other are connected to the same switch. The network also has two Wi-Fi hotspots connected to two of the switches. The messages are basically sent from one process on the Linux machine to another process on the same machine. The Windows machine just subscribes to all messages published by the Linux machine. One media driver runs on each of the two machines. The message sizes are under 1 KB, and the message frequency is not too high (one process publishes a message on the multicast channel and another process replies on the same multicast channel, say 500 ms apart).

The issue I am seeing is that, after a while, one process sends a message and the subscribing process does not receive it. The network is left in a 'suspicious' state, where Wi-Fi communication becomes spotty, etc. However, if I start a process on the Windows machine, its messages reach the Linux machine, which suggests the wired network is in good shape.

I have no idea what could be wrong. Do you have any suggestions on how to fix this issue, or on what the cause could be? Would monitoring the media driver offer any clues?

Thanks, Vladimir

mjpt777 commented 5 years ago

Aeron comes with debugging support, which you could begin with.

https://github.com/real-logic/aeron/wiki/Monitoring-and-Debugging
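
Separately from Aeron's own tooling, a plain UDP multicast probe can help show whether datagrams for the group reach a given host at all, which narrows the problem to either the network or the application. The following is only a minimal sketch using Python's standard `socket` module. It is not part of Aeron, and the group `224.0.1.1`, port `40456`, and the helper names are assumptions chosen for illustration; substitute the values from your own channel configuration, or pick an unused port so the probe does not compete with a running media driver.

```python
import ipaddress
import socket
import struct

# Hypothetical values for illustration: replace with your own group and port.
GROUP = "224.0.1.1"
PORT = 40456


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq payload: group address followed by local interface."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str = GROUP, port: int = PORT,
                  interface: str = "0.0.0.0") -> socket.socket:
    """Bind to the port, join the group, and return a socket ready for recvfrom()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, interface))
    return sock


def send_probe(payload: bytes, group: str = GROUP, port: int = PORT,
               ttl: int = 1) -> int:
    """Send one datagram to the group; ttl=1 keeps it on the local subnet."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        return sock.sendto(payload, (group, port))
```

One possible way to use it: call `open_receiver()` on the machine that stops receiving and loop on `recvfrom(2048)`, then call `send_probe(b"ping")` periodically from the other machine. If the probe datagrams also stop arriving after a while, the cause is more likely in the network (for example, how the switches handle the multicast group) than in the applications. Passing the address of the wired interface as `interface` makes the join explicit on a multi-homed host.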