real-logic / aeron

Efficient reliable UDP unicast, UDP multicast, and IPC message transport
https://aeron.io
Apache License 2.0

How reliability is ensured #657

Closed by kapralius 5 years ago

kapralius commented 5 years ago

I read the wiki and other Aeron documents for a day, and it was only in the issue "Finally, is Aeron reliable or not? #180" that I found a clear answer to this fundamental question.

However, with that settled, I have another question: how is reliability achieved here? Is it ACK-based or NACK-based?

If it is ACK-based, the solution does not scale to more than 100 receivers (or are some sophisticated measures taken?).

If it is NACK-based (similar to PGM), how are NACK storms prevented, and how is the missed data provided?

Thanks for any answers.

juddgaddie commented 5 years ago

It is NAK-based. Flow control strategies are used to avoid overwhelming subscribers. In my experience, NAK storms occur when slow subscribers are unable to keep up with the publisher.

see: https://github.com/real-logic/aeron/wiki/Flow-and-Congestion-Control
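For readers new to NAK-based reliability, here is a minimal sketch of the receiver side: the receiver tracks the next sequence number it expects and, when a packet arrives ahead of that, reports the missing range so it can be requested again. This is a generic illustration, not Aeron's code. `GapDetector`, `Gap`, and the use of simple packet sequence numbers are all hypothetical names and simplifications chosen for the example.

```java
/**
 * Hypothetical receiver-side gap detector for a NAK-based protocol.
 * Illustration only; this is not how Aeron is implemented.
 */
final class GapDetector
{
    /** A missing range [fromSeq, toSeq) that the receiver would ask for in a NAK. */
    static final class Gap
    {
        final long fromSeq;
        final long toSeq;

        Gap(final long fromSeq, final long toSeq)
        {
            this.fromSeq = fromSeq;
            this.toSeq = toSeq;
        }
    }

    // Assumption: sequence numbers start at 0 and increase by 1 per packet.
    private long nextExpectedSeq = 0;

    /**
     * Process one arriving packet.
     *
     * @return the newly detected gap, or null if nothing new is missing.
     */
    Gap onPacket(final long seq)
    {
        Gap gap = null;

        if (seq > nextExpectedSeq)
        {
            // Packets nextExpectedSeq .. seq-1 have not been seen yet.
            gap = new Gap(nextExpectedSeq, seq);
        }

        if (seq >= nextExpectedSeq)
        {
            nextExpectedSeq = seq + 1;
        }

        // seq < nextExpectedSeq means a late or retransmitted packet: no new gap.
        return gap;
    }
}
```

In a real protocol the receiver would not send the NAK immediately. It would wait a short, usually randomized, time in case the data arrives out of order or another receiver asks first. The sketch also does not track whether a gap has since been filled.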

tmontgomery commented 5 years ago

To see fully what is done, look at how NAK processing is implemented. You will also want to familiarize yourself with previous work. Some links are in the Javadoc for:

https://github.com/real-logic/aeron/blob/master/aeron-driver/src/main/java/io/aeron/driver/OptimalMulticastDelayGenerator.java

The spec has much more:

https://github.com/real-logic/aeron/wiki/Protocol-Specification
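As a rough illustration of how randomized delays can damp NAK storms, here is a minimal sketch of a feedback timer drawn from a truncated exponential distribution, in the spirit of the multicast feedback-suppression literature. It is not a copy of the linked class. `NakBackoff`, its parameters, and the choice `lambda = ln(groupSize) + 1` are assumptions made for this example. The idea is that most receivers pick a delay near the maximum while a few pick a short one, so only a handful of NAKs go out early and the rest can be cancelled if the retransmission arrives first.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical NAK delay generator using a truncated exponential distribution.
 * Illustration only; names and parameter choices are assumptions.
 */
final class NakBackoff
{
    private final double maxBackoff; // upper bound on the delay, in any time unit
    private final double lambda;     // shape parameter; larger pushes delays toward maxBackoff

    NakBackoff(final double maxBackoff, final int groupSize)
    {
        this.maxBackoff = maxBackoff;
        // Assumption: scale the shape parameter with the estimated number of receivers.
        this.lambda = Math.log(groupSize) + 1.0;
    }

    /**
     * Inverse CDF of the truncated exponential: maps u in [0, 1) to a delay in [0, maxBackoff).
     * Math.expm1(lambda) is e^lambda - 1.
     */
    double delayFor(final double u)
    {
        return (maxBackoff / lambda) * Math.log(1.0 + u * Math.expm1(lambda));
    }

    /** Draw a random delay to wait before sending a NAK. */
    double nextDelay()
    {
        return delayFor(ThreadLocalRandom.current().nextDouble());
    }
}
```

A receiver that detects loss would call `nextDelay()`, start a timer, and send its NAK only if the gap is still open when the timer fires.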

BTW, you can make ACK-based schemes scale. I've scaled them to hundreds of nodes quite easily. You can see some of the basics here:

https://www.researchgate.net/publication/2399998_RMTP_A_Reliable_Multicast_Transport_Protocol