serhatarslan-hub / HomaL4Protocol-ns-3

NS3 implementation of Homa Transport Protocol
GNU General Public License v2.0

Enabled retransmission logic causes simulation to not finish #9

Open joft-mle opened 2 weeks ago

joft-mle commented 2 weeks ago

Hi @serhatarslan-hub,

Just for the sake of completeness, and as a follow-up to the solution by @marvin71 for issue #7, we also ran the default test case (effective duration of 0.5, assuming 0.1 seconds of saturation, 4 independent runs in parallel, e75c4de489eb) without the --disableRtx option.

To our surprise, and this is the reason for opening this issue, both parts (load 0.8 and load 0.5) did not finish after the usual amount of run time this time, in contrast to previous, older runs without --disableRtx. Instead, the 4 ns-3 instances sat at 100% CPU each, writing nothing further to stdout or to the trace files.

For example, the end of the output for the run with load 0.5 looks like the following lines (2 excerpts; the output of all 4 instances may be interleaved). The run with load 0.8 shows similar effects.

3551821404 HomaSendScheduler (0x55b8abb40a60) received a RESEND packet for an unknown txMsgId (110).
3552820574 Rtx Limit has been reached for the inbound Msg (0x55b8b7fd2940).
- 3555140600 27253820 10.0.13.1:1013 10.0.103.1:1103 108
3555142883 HomaSendScheduler (0x55b8abafd560) received a GRANT packet for an unknown txMsgId (108).
3556142883 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3556142916 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3556142949 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3556142982 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3556143015 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3556143048 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3557142916 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3557142949 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3557143015 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3557143048 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3557143081 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3557143181 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
3557143214 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).
- 3557763083 28227640 10.0.132.1:1132 10.0.140.1:1140 103
- 3557935390 26281460 10.0.123.1:1123 10.0.20.1:1020 110
3557937673 HomaSendScheduler (0x55b8abc3d110) received a GRANT packet for an unknown txMsgId (92).

[...]

3560945011 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3560945044 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3560945077 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3560945110 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3560945143 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3560945176 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3561141801 Rtx Limit has been reached for the inbound Msg (0x55b8ba2a97c0).
- 3561190522 27741460 10.0.110.1:1110 10.0.57.1:1057 107
3561192805 HomaSendScheduler (0x55b8abc956a0) received a GRANT packet for an unknown txMsgId (107).
- 3561230997 29200000 10.0.88.1:1088 10.0.119.1:1119 100
3561233280 HomaSendScheduler (0x55b8abc38dc0) received a GRANT packet for an unknown txMsgId (100).
3561944879 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3561944912 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3561944945 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3561944978 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3561945011 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).
3561945044 HomaSendScheduler (0x55b8abc8d000) received a RESEND packet for an unknown txMsgId (101).

[...]

3566192937 HomaSendScheduler (0x55b8abc956a0) received a RESEND packet for an unknown txMsgId (107).
3566192970 HomaSendScheduler (0x55b8abc956a0) received a RESEND packet for an unknown txMsgId (107).
3566193003 HomaSendScheduler (0x55b8abc956a0) received a RESEND packet for an unknown txMsgId (107).
3566233280 HomaSendScheduler (0x55b8abc38dc0) received a RESEND packet for an unknown txMsgId (100).
3566233313 HomaSendScheduler (0x55b8abc38dc0) received a RESEND packet for an unknown txMsgId (100).
3566233346 HomaSendScheduler (0x55b8abc38dc0) received a RESEND packet for an unknown txMsgId (100).
3566233379 HomaSendScheduler (0x55b8abc38dc0) received a RESEND packet for an unknown txMsgId (100).
3566233412 HomaSendScheduler (0x55b8abc38dc0) received a RESEND packet for an unknown txMsgId (100).
3566233445 HomaSendScheduler (0x55b8abc38dc0) received a RESEND packet for an unknown txMsgId (100).
3566233478 HomaSendScheduler (0x55b8abc38dc0) received a RESEND packet for an unknown txMsgId (100).
3567191723 Rtx Limit has been reached for the inbound Msg (0x55b8b95d6f80).
3567232198 Rtx Limit has been reached for the inbound Msg (0x55b8ba121b10).
- 3569183771 23847640 10.0.143.1:1143 10.0.3.1:1003 78
- 3578365209 28713820 10.0.132.1:1132 10.0.30.1:1030 114
- 3599766756 26281460 10.0.132.1:1132 10.0.93.1:1093 104

So it looks like the retransmission logic causes some kind of desynchronization between sender(s) and receivers. For the run above, MsgTraces-SlowdownAnalysis.ipynb reports 6 incomplete messages, so not much is "missing" to complete the simulation. According to the timestamps, simulation time did advance to ~3.6 s, which is more or less what is expected for the test parameters.
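
To see which senders and message IDs are affected, the warnings can be tallied from the captured stdout. The following is only a minimal sketch: it assumes the exact line format quoted above, and the names `WARNING_RE`, `tally_unknown_msg_warnings` and `sample_log` are made up for this illustration and are not part of the repository.

```python
import re
from collections import Counter

# Matches the warning lines quoted in this issue, e.g.
# "3556142883 HomaSendScheduler (0x55b8abafd560) received a RESEND packet
#  for an unknown txMsgId (108)."
# Assumption: the format is exactly as it appears in the excerpts above.
WARNING_RE = re.compile(
    r"^(\d+) HomaSendScheduler \((0x[0-9a-f]+)\) received a (\w+) packet "
    r"for an unknown txMsgId \((\d+)\)\.$"
)


def tally_unknown_msg_warnings(lines):
    """Count warnings per (scheduler address, txMsgId, packet type)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            _timestamp, scheduler, pkt_type, tx_msg_id = match.groups()
            counts[(scheduler, int(tx_msg_id), pkt_type)] += 1
    return counts


# A few lines copied from the excerpt above; non-matching lines are skipped.
sample_log = [
    "3552820574 Rtx Limit has been reached for the inbound Msg (0x55b8b7fd2940).",
    "- 3555140600 27253820 10.0.13.1:1013 10.0.103.1:1103 108",
    "3555142883 HomaSendScheduler (0x55b8abafd560) received a GRANT packet for an unknown txMsgId (108).",
    "3556142883 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).",
    "3556142916 HomaSendScheduler (0x55b8abafd560) received a RESEND packet for an unknown txMsgId (108).",
]

counts = tally_unknown_msg_warnings(sample_log)
for (scheduler, tx_msg_id, pkt_type), n in sorted(counts.items()):
    print(f"{scheduler} txMsgId={tx_msg_id} {pkt_type}: {n}")
```

Run against the full stdout of one instance, this should make it easier to check whether the repeated RESEND warnings all belong to the few messages reported as incomplete.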

serhatarslan-hub commented 1 week ago

Thanks for the catch! The original Homa paper and the OMNeT++ simulator don't dive deep into the retransmission logic of the protocol. Hence, the implementation of the retransmission logic in this ns-3 version is not thoroughly tested. You can refer to the Linux kernel module implementation of the protocol for a complete view of how the protocol behaves in such situations.