serhatarslan-hub / HomaL4Protocol-ns-3

NS3 implementation of Homa Transport Protocol
GNU General Public License v2.0

Included slowdown graphs vs. reproduction #7

Closed by joft-mle 1 month ago

joft-mle commented 4 months ago

Hi @serhatarslan-hub,

I would like to ask which version of the code in this repository was used to produce the results shown in MsgComletionSlowdown_W5_load-80p.png and similar figures.

Following the instructions to build and run scratch/HomaL4Protocol-paper-reproduction.cc results in substantially different graphs for us - especially regarding the "slowdown". The graphs in this repo show a slowdown over message size that more or less closely "follows" the included OMNeT++ results. When we try to reproduce this (80% load, P99), we instead get a roughly constant slowdown of about 7-8 for all message sizes until shortly past ~4.866M. For larger message sizes, it then "follows" the OMNeT++ data:

MsgComletionSlowdown_W5_load-80p

However, the characteristic "jitter" around message size 150k does show up in our results, for example.

Any idea what we are missing?

serhatarslan-hub commented 4 months ago

Hi,

Thank you for your interest in using this repository. I am happy to help you as much as possible.

The original figure (MsgComletionSlowdown_W5_load-80p.png) was created using an earlier version of the Homa implementation, tagged as homa-v1.0.

There have been bug fixes since then, and the current version is homa-v1.1. The changes between the two versions are mainly related to the retransmission logic of the protocol.

I don't expect the retransmission logic to affect performance in the reproduction simulation, because the original Homa paper was evaluated without retransmission logic in the OMNeT++ simulator. To create the same effect, one would need to disable Homa's retransmission logic in NS3 or use very large timeout limits. Could you share the command you used to reproduce the figure?
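For illustration only, here is a minimal sketch (not the actual scratch file) of how a --disableRtx style option can translate into an effectively infinite timeout. The attribute name in the commented-out Config line is a placeholder, not necessarily one that HomaL4Protocol actually exposes:

#include "ns3/core-module.h"

#include <iostream>

using namespace ns3;

int
main (int argc, char *argv[])
{
  bool disableRtx = false;

  CommandLine cmd;
  cmd.AddValue ("disableRtx", "Effectively disable retransmissions via a huge timeout", disableRtx);
  cmd.Parse (argc, argv);

  // Push the timeout far beyond the simulated duration so it can never fire,
  // mimicking the OMNeT++ evaluation that ran without retransmission logic.
  Time rtxTimeout = disableRtx ? Seconds (1e6) : MilliSeconds (10);
  std::cout << "Retransmission timeout: " << rtxTimeout << std::endl;

  // Hypothetical attribute name, shown only to illustrate the idea:
  // Config::SetDefault ("ns3::HomaL4Protocol::RtxTimeout", TimeValue (rtxTimeout));

  Simulator::Run ();
  Simulator::Destroy ();
  return 0;
}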

Regards,

joft-mle commented 4 months ago

Hi @serhatarslan-hub ,

thank you for your reply.

Regarding the command(s) to produce the figure I posted:

  1. $ outputs/homa-paper-reproduction/run-parallel-sim.sh 4 0.3
    1. By default run-parallel-sim.sh uses --disableRtx, so I assume that retransmissions are actually disabled by effectively using those huge timeout values. Other than that, just the executable name needed adjustment (scratch/HomaL4Protocol-paper-reproduction).
    2. The above produced 2x 4 .tr files; one set for load 50% and one set for load 80%.
    3. One difference is that my command line used a duration (second argument) of 0.3 seconds, whereas you must have used 0.5 seconds. I don't know whether that can explain such a substantial difference.
  2. Then I ran MsgTraces-SlowdownAnalysis.ipynb to generate the posted figure MsgComletionSlowdown_W5_load-80p.png. I just commented out the "pfabric" plotting, since the initial goal is to compare with your results. Also, I changed the captions to say "Pxx" instead of "xx%".

I also repeated the above with a modified run-parallel-sim.sh that does not include the --disableRtx argument. That should mean that the retransmission timeouts are NOT set to large values, and retransmissions can actually be carried out. This time I noticed my previous difference in duration and used 0.5 seconds. Not surprisingly (?), the slowdown plot for 80% load with retransmissions enabled looks rather similar to the one I posted initially (with retransmissions disabled):

MsgComletionSlowdown_W5_load-80p

The fact that there is no big difference between my two plots would kind of match your expectation that actually doing retransmissions should not affect performance, right? I don't think I ever saw the .ipynb code report a "Number of uncompleted messages" greater than 0, so there would be no reason for retransmissions anyway? Hmm, no, that statement/question makes no sense: if retransmissions do occur, successfully, then an affected request would have been completed - successfully.

I guess, to be really sure, I'll re-run a simulation w/ --disableRtx (unmodified run-parallel-sim.sh) and 0.5 seconds of duration.

However, at least regarding the beginning of these simulations, I got the impression that there is no difference when comparing the .tr files of two runs of the same type of simulation - that is, there is no real random component involved that would cause a different sequence of events - at least given the same seed values (via --simIdx=), of course. I'm not saying that there has to be one, but I'm aware that this is possible even when using the same seed; it depends on the model.

serhatarslan-hub commented 4 months ago

Yes, it looks like the only difference between your simulations and the presented results in the repo is the simulation duration. Unfortunately, I don't remember the durations I used, but 0.5 seconds per simulation sounds about right. Note that MsgTraces-SlowdownAnalysis.ipynb has a variable called saturationTime, and messages that started before this time are ignored. This is used to measure the performance of the messages that were active after Homa stabilized in the network. The default value is 0.1 seconds (the simulation starts at t=3, and all messages started before t=3.1 are ignored for performance measurements). Since your simulations are 0.3 seconds long, you are only considering 0.2 seconds' worth of traffic. This might be a factor in the difference.
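Conceptually, the filtering that the notebook applies boils down to something like the following sketch (C++ here just for illustration; the struct fields are hypothetical and do not reflect the actual .tr columns):

#include <cstdint>
#include <vector>

// Hypothetical per-message record; the real .tr trace format is not reproduced here.
struct MsgRecord
{
  double startTime;      // simulation time (s) at which the message was started
  double completionTime; // simulation time (s) at which it completed
  uint32_t sizeBytes;    // message size in bytes
};

// Keep only messages that started after the network is assumed to have
// saturated, i.e. after simStart + saturationTime (3.0 s + 0.1 s by default).
std::vector<MsgRecord>
FilterSaturated (const std::vector<MsgRecord> &all,
                 double simStart = 3.0, double saturationTime = 0.1)
{
  std::vector<MsgRecord> kept;
  for (const MsgRecord &m : all)
    {
      if (m.startTime >= simStart + saturationTime)
        {
          kept.push_back (m);
        }
    }
  return kept;
}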

Please let me know if things change when you run homa-v1.0 with 4 parallel simulations that are 0.5 seconds each (RTX disabled).

joft-mle commented 4 months ago

Hi @serhatarslan-hub,

the 4 simulations in parallel, using tag homa-v1.0 (commit 5308948) for network load 80% with --disableRtx for 0.5 seconds resulted in the following slowdown plot:

MsgComletionSlowdown_W5_load-80p

To me, the results "optically" look practically identical to the slowdown plot resulting from the 4 simulations in parallel using commit 3cac711 (essentially homa-v1.1) for network load 80% with --disableRtx for (now also) 0.5 seconds:

MsgComletionSlowdown_W5_load-80p

And running md5sum on the .tr files indeed confirms that the sequences of events are completely identical:

df97b493e816292853261aa3d587df1d  my-4-parallel-sims-noRtx_g3cac7119bfdc/MsgTraces_W5_load-80p_0.tr
318cb9eed7aec00826fa863d4d004910  my-4-parallel-sims-noRtx_g3cac7119bfdc/MsgTraces_W5_load-80p_1.tr
e6500a28570fecb403d0fe74902cd152  my-4-parallel-sims-noRtx_g3cac7119bfdc/MsgTraces_W5_load-80p_2.tr
204c8ed94255cb912fbcbd2578fe10d5  my-4-parallel-sims-noRtx_g3cac7119bfdc/MsgTraces_W5_load-80p_3.tr
df97b493e816292853261aa3d587df1d  my-4-parallel-sims-noRtx_homa-v1.0/MsgTraces_W5_load-80p_0.tr
318cb9eed7aec00826fa863d4d004910  my-4-parallel-sims-noRtx_homa-v1.0/MsgTraces_W5_load-80p_1.tr
e6500a28570fecb403d0fe74902cd152  my-4-parallel-sims-noRtx_homa-v1.0/MsgTraces_W5_load-80p_2.tr
204c8ed94255cb912fbcbd2578fe10d5  my-4-parallel-sims-noRtx_homa-v1.0/MsgTraces_W5_load-80p_3.tr

I also checked the .tr files and plots for 50% load ... same story there - homa-v1.1 and homa-v1.0 results are identical - for a simulated duration of 0.5 seconds.

I don't know what I am missing.

serhatarslan-hub commented 4 months ago

I think I figured out the issue here.

Take a look at the figure of the total number of active messages throughout the simulation I ran to obtain the results above.

The x-axis is time. Note that the experiment continues until 3.5 seconds (it starts at t=3 seconds); the messages/flows complete after that. The same figure reveals that the number of active messages does not completely saturate within 3.5 seconds, so I would recommend simulating longer, i.e., ~3-5 seconds, to measure the saturated performance.
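For instance, assuming the second argument of run-parallel-sim.sh is the simulated duration (as in the command used earlier in this thread), a longer run would look something like:

  $ outputs/homa-paper-reproduction/run-parallel-sim.sh 4 3.0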

Let me know if this helps.

joft-mle commented 4 months ago

Hi @serhatarslan-hub ,

for reference and completeness, here are the two graphs generated by MsgTraces-ActiveMsgCntAnalysis.ipynb from my 4 simulations in parallel using commit 5308948 (homa-v1.0) for network load 80% with --disableRtx and a --duration of 0.5 seconds (title modified to not include the network load in percent):

TotNActiveMsgs_W5_load-80p

RecvPendingMsgStats_W5_load-80p

As can be seen, the saturation behavior, and the behavior in general, is also different, but not "extremely" different compared with the graph you mentioned.

I agree that saturation is not reached after 0.1 seconds of MsgGeneratorApp activity - neither in the graph you mentioned nor in mine. And I agree that it thus makes sense to try and run the simulation for a longer period of time.

However, I still do not understand how all this can explain the clear difference between your slowdown graph and mine, given the similarity of the TotNActiveMsgs graphs, the same --duration (0.5 seconds), and the assumption that the graphs in the repository come from the same raw data, produced by executing the same code.

Do you think there is a chance that the included MsgComletionSlowdown graph resulted from different raw data than the TotNActiveMsgs graph? The .png metadata only suggests that they were generated using the same matplotlib version.

serhatarslan-hub commented 4 months ago

I understand your concern. Unfortunately, I cannot quite remember the exact simulation duration I used to generate the slowdown figures. Yes, the commit names and default scripts suggest 0.5 seconds, but we cannot find another reason for the difference. I vaguely remember a discussion I had with my teammates about how the duration can change the simulation results. In fact, that was why I created the "number of active messages" figure in the first place.

marvin71 commented 1 month ago

Hi @joft-mle and @serhatarslan-hub,

I also ran into the same problem about a month ago, but I saw this issue only now. I was actually able to figure out the problem and reproduce the graphs included in the repository.

The problem is how the priorities of incoming packets are retrieved in the homa queue-disc.

https://github.com/serhatarslan-hub/HomaL4Protocol-ns-3/blob/3cac7119bfdca8313a337582e5d9a293dd6b4e1a/src/traffic-control/model/pfifo-homa-queue-disc.cc#L85-L92

The current implementation looks for the SocketIpTosTag, which should contain the priority of the message as set by the Homa protocol. The problem is that no packet carries this tag by the time it arrives at the queue disc: when the packet is sent, it goes through the IPv4 stack, and the Send function of Ipv4L3Protocol removes the SocketIpTosTag (if there is one) and sets the TOS field in the IPv4 header accordingly.

https://github.com/serhatarslan-hub/HomaL4Protocol-ns-3/blob/3cac7119bfdca8313a337582e5d9a293dd6b4e1a/src/internet/model/ipv4-l3-protocol.cc#L774-L780

To fix this, we have to use the TOS field of the IPv4 header instead of the SocketIpTosTag in the Homa queue disc:

// Read the priority from the TOS field of the IPv4 header, where
// Ipv4L3Protocol::Send() has placed it, instead of looking for a
// SocketIpTosTag that is no longer attached to the packet.
uint8_t priority = 0;
auto ipv4_item = DynamicCast<Ipv4QueueDiscItem>(item);
if (ipv4_item)
{
    priority = ipv4_item->GetHeader().GetTos();
}
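For contrast, and going by the description above rather than a verbatim copy of the linked lines, the original lookup amounts to something like the fragment below; because Ipv4L3Protocol::Send() has already consumed the tag on the send path, the PeekPacketTag call never succeeds and every packet silently falls back to the default priority:

SocketIpTosTag tosTag;
uint8_t priority = 0;
// The tag was stripped on the send path, so this lookup always fails.
if (item->GetPacket ()->PeekPacketTag (tosTag))
{
    priority = tosTag.GetTos ();
}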

I hope this helps.

serhatarslan-hub commented 1 month ago

Thank you @marvin71 for sharing your fix. It is certainly great to see a community around this project. Would you be willing to send a Pull Request for your fix?

marvin71 commented 1 month ago

Yes, I just opened a Pull Request (#8) containing the fix.

Maybe for context: I am currently trying to explore the Homa transport protocol using SimBricks. In particular, I am looking into comparing the behavior of the ns-3 Homa implementation to that of the Linux implementation in SimBricks.

serhatarslan-hub commented 1 month ago

Thank you @marvin71 once again. I have merged the PR. Also, it is exciting that you are comparing the behaviors of ns-3 and Linux implementations. Please feel free to share your findings with us.

@joft-mle would you be able to re-run your experiments and see if the issue has been solved? I believe we can close this GitHub issue when you confirm.

joft-mle commented 1 month ago

> @joft-mle would you be able to re-run your experiments and see if the issue has been solved? I believe we can close this GitHub issue when you confirm.

Indeed, after applying the change @marvin71 provided, a first re-run of the default experiment (effective duration of 0.5 seconds, assumed saturation time of 0.1 seconds, with --disableRtx, 4 independent runs in parallel) resulted in graphs almost identical to what's in the repository.

For reference, here are my usual, slightly modified graphs (no "pFabric" numbers) for 80% load:

MsgComletionSlowdown_W5_load-80p

TotNActiveMsgs_W5_load-80p

RecvPendingMsgStats_W5_load-80p

The same is true for the other case, 50% load.

Thank you very much, @marvin71 for debugging and @serhatarslan-hub for merging!