Closed. moyeanl closed this 1 month ago.
@moyeanl On the sender side, a round-robin approach is used to select the next NetworkPublication, so the starting point will eventually move past the publication that throws an exception. For example, if there are 5 NetworkPublications and pub3 throws an exception, then the doWork cycles will look something like this:
pub1 -> pub2 -> pub3 -> IOException
pub2 -> pub3 -> IOException
pub3 -> IOException
pub4 -> pub5 -> pub1 -> pub2 -> pub3 -> IOException
pub5 -> pub1 -> pub2 -> pub3 -> IOException
Eventually pub3 will time out and be removed. The same approach is used for dealing with the Destinations in the MDC case, i.e. a failing destination will not prevent data from being sent to other destinations.
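The cycle listed above can be reproduced with a small simulation. This is a simplified model, not the driver's actual Sender code: the class RoundRobinSketch, its fields, and the runCycle helper are hypothetical names chosen for illustration, and the failing publication is simulated by throwing an IOException for one index.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a sender that round-robins over publications.
final class RoundRobinSketch
{
    private final int publicationCount;
    private final int failingIndex;
    private final List<String> visited = new ArrayList<>();
    private int roundRobinIndex;

    RoundRobinSketch(final int publicationCount, final int failingIndex)
    {
        this.publicationCount = publicationCount;
        this.failingIndex = failingIndex;
    }

    // Simulated send: records the visit, then fails for the chosen publication.
    private void send(final int index) throws IOException
    {
        visited.add("pub" + (index + 1));
        if (index == failingIndex)
        {
            throw new IOException("send failed for pub" + (index + 1));
        }
    }

    // One duty cycle: the starting index advances before any send is attempted,
    // so a failure does not pin the next cycle to the same starting point.
    void doWork() throws IOException
    {
        final int startingIndex = roundRobinIndex;
        roundRobinIndex = (roundRobinIndex + 1) % publicationCount;

        for (int i = 0; i < publicationCount; i++)
        {
            send((startingIndex + i) % publicationCount);
        }
    }

    // Runs one cycle and returns the visit order, marking where it was cut short.
    String runCycle()
    {
        visited.clear();
        try
        {
            doWork();
            return String.join(" -> ", visited);
        }
        catch (final IOException ex)
        {
            return String.join(" -> ", visited) + " -> IOException";
        }
    }

    public static void main(final String[] args)
    {
        final RoundRobinSketch sender = new RoundRobinSketch(5, 2); // pub3 fails
        for (int cycle = 0; cycle < 5; cycle++)
        {
            System.out.println(sender.runCycle());
        }
    }
}
```

In this model, pub4 and pub5 are serviced at the start of the fourth and fifth cycles, before the failing publication is reached again.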
I am somewhat confused and need some clarification.
- What is the detailed implementation of the UdpChannel, specifically how it handles IOExceptions during message send and receive operations?
- Can you provide the exact structure and handling mechanism of the doWork cycle in both the sender and receiver, and how it interacts with the UdpChannel?
- How does the AgentRunner's ErrorHandler process exceptions, and is there a mechanism that allows other active publications or subscriptions to continue functioning despite an IOException?
Once I get your answers I will proceed to address the problem. Thank you!
The round-robin approach is indeed effective in the multi-publication case, but the multi-destination case is different. Because the round-robin send throws an IOException every time, it never returns normally, so the senderPosition of the NetworkPublication does not advance. The publication will keep retrying the same sendBuffer segment.
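A toy model may make the concern concrete. It assumes, as described above, that the position only advances when the send to every destination returns normally. The class MdcRetrySketch and all of its members are hypothetical and are not taken from the Aeron source.

```java
import java.io.IOException;

// Hypothetical model of one publication sending a segment to several destinations.
final class MdcRetrySketch
{
    private final int[] datagramsReceived; // copies of the segment seen per destination
    private final int failingDestination;
    private int roundRobinIndex;
    private long senderPosition;

    MdcRetrySketch(final int destinationCount, final int failingDestination)
    {
        this.datagramsReceived = new int[destinationCount];
        this.failingDestination = failingDestination;
    }

    // Sends the same segment to each destination, starting from a rotating index.
    void sendSegment(final int length) throws IOException
    {
        final int count = datagramsReceived.length;
        final int startingIndex = roundRobinIndex;
        roundRobinIndex = (roundRobinIndex + 1) % count;

        for (int i = 0; i < count; i++)
        {
            final int destination = (startingIndex + i) % count;
            if (destination == failingDestination)
            {
                throw new IOException("send failed for destination " + destination);
            }
            datagramsReceived[destination]++;
        }

        // Assumption: only reached when no destination threw.
        senderPosition += length;
    }

    // Swallows the failure the way a log-only error handler would.
    boolean trySend(final int length)
    {
        try
        {
            sendSegment(length);
            return true;
        }
        catch (final IOException ex)
        {
            return false;
        }
    }

    long senderPosition()
    {
        return senderPosition;
    }

    int datagramsReceived(final int destination)
    {
        return datagramsReceived[destination];
    }

    public static void main(final String[] args)
    {
        final MdcRetrySketch publication = new MdcRetrySketch(3, 1);
        for (int attempt = 0; attempt < 3; attempt++)
        {
            publication.trySend(1024);
        }
        System.out.println("senderPosition=" + publication.senderPosition());
        for (int d = 0; d < 3; d++)
        {
            System.out.println("destination " + d + " copies=" + publication.datagramsReceived(d));
        }
    }
}
```

With three destinations and the second one failing, three attempts in this model leave senderPosition at 0 while the healthy destinations receive repeated copies of the same segment.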
The fault occurred in the log channel of the Aeron cluster, and the configuration of the log channel is as follows: aeron:udp?control-mode=manual|rcv-wnd=64m|so-rcvbuf=128m|term-length=128m|alias=log|fc=max
The ErrorHandler only logs the error; it does not interrupt or stop anything.
I noticed that an IOException may be thrown when sending a message to a UdpChannel or receiving a message from a UdpChannel. Once an exception is thrown, the doWork of the sender or receiver is interrupted until the exception is caught by the top-level AgentRunner's ErrorHandler. Even if there are other active publications or subscriptions in the MediaDriver at that time, they will not work properly as a result. Is this in line with the intended design?
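The control flow in question can be sketched roughly as follows. The Agent and ErrorHandler interfaces here are local stand-ins that only mirror the general shape of a duty-cycle runner; they are assumptions for illustration, not the Agrona types or the actual AgentRunner loop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical runner loop: an exception aborts the current doWork call,
// is handed to the error handler, and the next cycle starts from scratch.
final class RunnerLoopSketch
{
    interface Agent
    {
        int doWork() throws Exception;
    }

    interface ErrorHandler
    {
        void onError(Throwable throwable);
    }

    static int runCycles(final Agent agent, final ErrorHandler errorHandler, final int cycles)
    {
        int workCount = 0;
        for (int i = 0; i < cycles; i++)
        {
            try
            {
                workCount += agent.doWork();
            }
            catch (final Throwable t)
            {
                // Log-only handling: whatever doWork had left to do in this cycle is skipped.
                errorHandler.onError(t);
            }
        }
        return workCount;
    }

    public static void main(final String[] args)
    {
        final List<String> log = new ArrayList<>();
        final int[] calls = {0};

        // Simulated agent that fails on every other duty cycle.
        final Agent agent = () ->
        {
            if (calls[0]++ % 2 == 0)
            {
                throw new IOException("send failed");
            }
            return 1;
        };

        final int work = runCycles(agent, t -> log.add("logged: " + t.getMessage()), 4);
        System.out.println("work=" + work + " errors=" + log.size());
    }
}
```

The point of the sketch is that a log-only handler keeps the loop alive, but any work queued after the throwing call within the same doWork invocation is not performed in that cycle.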