What causes the orange lines to be missing from some or all of the plots?

higher-order commented 6 months ago

I followed the instructions in the README and used tornettools to generate a 1% network. Over the past few months, I ran simulations under the following separate conditions:

The original, unmodified condition.
A loss of 0.011 is added to all the edges in the network graph.
The v3bw file (the bandwidth file used by the directory authorities for consensus) is changed to assign the same value to the bw field of each relay line.

Both the simulation with the additional loss and the simulation with the changed v3bw should cause a significant decrease in the network's performance. I confirmed the degradation by manually adding iperf processes to some of the Shadow hosts in the case of loss, and by manually diff'ing the tgen stdout of clients in the case of the changed v3bw.
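For reference, the v3bw change described above could be sketched roughly as follows. This is only an illustration, not the script actually used: the function name `flatten_bw` is hypothetical, and it assumes that relay lines are the ones carrying a `node_id=` key and that the field to change is the standalone `bw=` key.

```python
import re

# Matches the standalone "bw=<int>" key on a relay line, but not keys such as
# "bw_mean=" or "desc_bw_avg=" (the \b and the "=" pin down the exact key name).
BW_FIELD = re.compile(r"\bbw=\d+")


def flatten_bw(lines, value):
    """Return the lines with every relay line's bw field set to `value`."""
    out = []
    for line in lines:
        # Assumption: relay lines are the ones that carry a node_id= key;
        # header lines are passed through unchanged.
        if "node_id=" in line:
            line = BW_FIELD.sub(f"bw={value}", line)
        out.append(line)
    return out
```

Applied to the lines of a bandwidth file, this leaves the header untouched and assigns the same value to each relay's bw field.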

However, many times after I ran tornettools parse ... and then tornettools plot ... to generate plots for the different simulations, I saw no difference in tornet.plot.pages.pdf between the original simulation and the loss simulation, or between the original simulation and the v3bw simulation. Visually, the plots were completely identical when I switched between the browser tabs of those PDFs.

Only in some recent runs did I notice that orange lines were added when I plotted. I then realized that the plots I saw previously only had blue lines, which are labeled "Public Tor" and might just be reference plots. The orange lines that started to appear recently are labeled tornet-0.01, and they confirm the performance degradation.

My workflow has always been:

Having run stage and generate once in the very beginning to generate the 1% network.
Then simulate, parse, plot.

Followed by a custom cleaning script:

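#!/bin/bash
# Remove the output of the previous run so the same network can be simulated again.
if [ -z "$1" ]; then
    echo "Please provide the tornet directory to clean."
    exit 1
fi
# Abort if the directory cannot be entered, so rm never runs somewhere else.
pushd "$1" || exit 1
rm dstat.log free.log shadow.log tornettools.simulate.*.log
rm reports.txt new.v3bw
rm -rf shadow.data/
popd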

Then simulate, parse, plot for the next run, but not stage or generate again.

My questions are:

What causes the orange lines to appear or not appear?
- I don't remember what I did differently recently. But I noticed some tgentools.parse.<time>.log files, in addition to the tornettools.parse.<time>.log files. They seem to have appeared recently, but I can't remember for sure.
Right now the orange lines only appear in the client goodput plots, and they are missing from all other plots, e.g. relay goodput. What is the reason and how to fix it?

A PDF of a simulation under the original condition is attached. Note: In many previous runs, there are no orange lines at all.

Thank you. tornet.plot.pages.pdf

jtracey commented 6 months ago

Each line is drawn only if

a folder with that line's name, containing the data, was provided as an argument, and
the already parsed data being plotted is present somewhere in that folder.

Assuming you want scientifically valid results, you'll want to also have multiple samples to get error bars on your results (though 1% networks will likely be too noisy to draw conclusions, see the paper for details on why that is, especially the parts discussing Fig. 7a and 8a). So you should set up with a directory structure like this, with all your experiment results:

├-lossless networks
|  ├-tornet-0.01-0
|  ├-tornet-0.01-1
|  ├-tornet-0.01-2
|  └-tornet-0.01-...
└-lossy networks
   ├-tornet-0.01-0
   ├-tornet-0.01-1
   ├-tornet-0.01-2
   └-tornet-0.01-...

You then run the parse script on each of the tornet directories. After all the directories are parsed, you then plot both configurations with tornettools plot 'lossless networks' 'lossy networks' from the dir containing those two dirs. (You can of course also compare more than two configurations, e.g., various packet loss rates, generating lines of various colors.)
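The parse-then-plot order described above can be sketched as a small helper that only builds the command lines. The function name `build_commands` and the mapping it takes are hypothetical; the `tornettools parse` and `tornettools plot` invocations are the ones mentioned in this thread, with no extra flags assumed.

```python
def build_commands(runs_by_config):
    """Return one parse command per run directory, then a single plot command.

    `runs_by_config` maps a configuration directory (e.g. 'lossy networks')
    to the names of the tornet run directories inside it.
    """
    commands = []
    for config, runs in runs_by_config.items():
        for run in runs:
            # Every run directory has to be parsed before anything is plotted.
            commands.append(["tornettools", "parse", f"{config}/{run}"])
    # One plot invocation compares all configurations, one line per directory.
    commands.append(["tornettools", "plot", *runs_by_config])
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` from the directory that contains the configuration directories.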

If that's what you're doing, and you're still only seeing one line in some of the plots, that means some of the parsed data is missing. Check the output of the parse tool to see which files it's failing to find or parse, then use that information to figure out why it failed (e.g., did some of the logs get deleted? did the perf clients fail to download a consensus, or start at all?).
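As a rough way to spot missing parsed data before plotting, something like the following could be used. It is a sketch under assumptions: the helper name `missing_outputs` is hypothetical, only `oniontrace.analysis.json.xz` is a file name that appears in this thread, and `tgen.analysis.json.xz` is assumed by analogy.

```python
from pathlib import Path

# Only oniontrace.analysis.json.xz is named in this thread; the tgen file name
# is an assumption made by analogy and may need adjusting.
EXPECTED = ("tgen.analysis.json.xz", "oniontrace.analysis.json.xz")


def missing_outputs(run_dirs, expected=EXPECTED):
    """Map each run directory to the expected parse outputs it lacks."""
    report = {}
    for run in run_dirs:
        missing = [name for name in expected if not (Path(run) / name).is_file()]
        if missing:
            report[str(run)] = missing
    return report
```

An empty result means every listed file was found; it does not replace reading the parse logs, which explain why a file was not produced.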

jtracey commented 6 months ago

@cohosh also recently posted how she ran experiments, if you want an example of start to finish. Her setup will be a little different from yours because she's testing Snowflake in particular (e.g., I doubt you need the --model-unblocked-syscall-latency=true argument added to your experiments), but it should give a good idea of all the steps involved.

higher-order commented 6 months ago

@jtracey Okay, I restarted from scratch and created a directory called tornet-original/ for unmodified networks. I created a sub-dir 0.05/ for 5% networks. I then generated 3 such networks into 3 sub-dirs 1/, 2/, 3/. The only thing I modified is changing stop_time from 3600 to 1200. I then simulated and parsed the 3 networks individually, and ran tornettools plot on the tornet-original/0.05/ directory.

I saw that this time there are error bars on the plots. However, still only the two plots titled "exit Client Transfer Goodput (Mbit/s)" have the orange lines for my simulations. The other plots only have blue lines for "Public Tor". The "exit Circuit Round Trip Time" plot and the "exit Transfer Error Rate (%)" plot have the orange label, but no orange line.

In tornettools.parse.<time>.log, I saw a warning:

[INFO] Parsing oniontrace logs.
[INFO] Parsing oniontrace log data with oniontracetools now...
[INFO] oniontracetools returned code 0
[INFO] Extracting oniontrace plot data.
[WARNING] Unable to find oniontrace analysis data at /home/user/projects/tor-crowd-sourced-measurement-sims/tornet-original/0.05/3/oniontrace.analysis.json.xz.

I then read oniontracetools.parse.<time>.log:

2024-05-28 02:12:25 1716862345.445774 [oniontracetools] [WARNING] No valid oniontrace files found at path /home/user/projects/tor-crowd-sourced-measurement-sims/tornet-original/0.05/3/shadow.data/hosts, nothing will be analyzed

After reading some source code and running oniontracetools manually, I modified this line from r'/oniontrace... to r'oniontrace.... If I run oniontracetools manually with that pattern, or use that pattern in tornettools, oniontracetools is able to work and generate the missing oniontrace.analysis.json.xz. However, tornettools seems to have a problem afterwards:

2024-05-28 03:19:58 1716866398.511445 [tornettools] [INFO] Parsing oniontrace logs.
2024-05-28 03:19:58 1716866398.511810 [tornettools] [INFO] Parsing oniontrace log data with oniontracetools now...
2024-05-28 03:20:53 1716866453.686002 [tornettools] [INFO] oniontracetools returned code 0
2024-05-28 03:20:53 1716866453.686278 [tornettools] [INFO] Extracting oniontrace plot data.
Traceback (most recent call last):
  File "/home/user/venv/bin/tornettools", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/user/tornettools/tornettools/tornettools", line 733, in <module>
    sys.exit(main())
  File "/home/user/tornettools/tornettools/tornettools", line 606, in main
    rv = args.func(args)
  File "/home/user/tornettools/tornettools/tornettools", line 637, in parse
    return parse.run(args)
  File "/home/user/tornettools/tornettools/parse.py", line 23, in run
    extract_oniontrace_plot_data(args)
  File "/home/user/tornettools/tornettools/parse_oniontrace.py", line 54, in extract_oniontrace_plot_data
    __extract_circuit_build_times(args, circuittype, data, startts, stopts)
  File "/home/user/tornettools/tornettools/parse_oniontrace.py", line 59, in __extract_circuit_build_times
    cbt = __get_perfclient_cbt(data, circuittype, startts, stopts)
  File "/home/user/tornettools/tornettools/parse_oniontrace.py", line 82, in __get_perfclient_cbt
    circ = data['data'][name]['oniontrace']['circuit']
KeyError: 'circuit'

Do you have any idea why this is the case? I've tried building tornettools and oniontracetools from source at origin/main and at the various tags too.
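To narrow down a KeyError like the one above, the analysis file can be inspected directly. The sketch below relies only on the nesting visible in the traceback (`data['data'][name]['oniontrace']['circuit']`); the helper names are hypothetical, and reading the file as xz-compressed JSON is an assumption based on its extension.

```python
import json
import lzma


def load_analysis(path):
    """Load an xz-compressed JSON analysis file (assumed format)."""
    with lzma.open(path, "rt") as f:
        return json.load(f)


def hosts_missing_key(data, key="circuit"):
    """Return the host names whose 'oniontrace' section lacks `key`."""
    missing = []
    for name, host in data.get("data", {}).items():
        # Mirrors the lookup that fails in the traceback, without raising.
        if key not in host.get("oniontrace", {}):
            missing.append(name)
    return sorted(missing)
```

For example, `hosts_missing_key(load_analysis("oniontrace.analysis.json.xz"))` would list the hosts for which no circuit data was recorded.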

stevenengler commented 6 months ago

The only thing I modified is changing stop_time from 3600 to 1200.

I'm not super familiar with tornettools and I haven't been following along in this discussion, but just wanted to mention to be careful to also adjust the --converge-time argument to tornettools parse when adjusting the simulation stop_time.

--converge-time: The number of seconds after the beginning of the simulation that we should ignore in the tgen and oniontrace log files, i.e., so we don't track network performance before the network has reached steady-state. Log messages during the interval [0, converge_time) will be ignored.

The default for this argument is 1200 seconds, so if you're also setting the simulation end time to 1200 seconds, there won't be any data left to parse.
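The arithmetic can be made explicit with a tiny sketch. The function name `usable_seconds` is hypothetical; the 1200-second default and the ignored interval [0, converge_time) are taken from the description above.

```python
def usable_seconds(stop_time, converge_time=1200):
    """Seconds of simulated time left to parse after the converge period.

    Log messages in [0, converge_time) are ignored, so only the interval
    [converge_time, stop_time) contributes data.
    """
    return max(0, stop_time - converge_time)
```

With the default converge time, a stop_time of 3600 leaves 2400 seconds of data and a stop_time of 1200 leaves none; passing a converge time of 300 with a stop_time of 1200 leaves 900 seconds.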

higher-order commented 6 months ago

That is a great catch @stevenengler. I started to run tornettools parse --converge-time 300 tornet-original/0.05/1 (and 0.05/2, 0.05/3) instead. If I remove the / in the code as described above, the KeyError is still thrown.

But if I revert the change and keep tornettools unmodified, it still logs the same two warnings in the tornettools and oniontracetools parse logs. However, after then running tornettools plot tornet-original/0.05/ --tor_metrics_path tor_metrics_2023-04-01--2023-04-30.json --prefix pdfs, the same way as in the README of the tornettools repo, a lot more plots have orange lines now.

The only plots that miss orange lines are:

Sum of Relays' Goodput
exit Circuit Build Time

higher-order commented 1 month ago

It seems that pip-installing tgentools and oniontracetools helps tornettools plot most, if not all, of the lines. More emphasis on this part in the docs would be helpful though.
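A quick way to confirm that both tools are reachable from the environment running tornettools is sketched below. The helper name `find_tools` is hypothetical, and it assumes the installed executables are named `tgentools` and `oniontracetools`, as they are referred to in this thread.

```python
import shutil


def find_tools(names=("tgentools", "oniontracetools")):
    """Map each tool name to its path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}
```

Any entry that comes back as None points to a tool that is not installed in the active environment.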

shadow / tornettools

What causes the orange lines to be missing from some or all of the plots? #107