Open · higher-order opened 3 days ago
The main thing you should realize is that the `w Bandwidth=` line in the consensus files is NOT a bandwidth value, despite the confusing name. It is a relay selection weight that may or may not be correlated with the relay's actual raw network bandwidth capacity. Yes, some sbws measurements go into the process of computing the relays' `w` consensus weights, but sbws does not do a very good job of discovering the relays' true bandwidth capacities. I don't think it was ever designed for that purpose. It is designed to compute relative performance across different relays, and it uses that relative performance to compute the weights.
So, the bandwidth capacity of a relay is best measured with a speed test, but alas, Tor never adopted that approach. So tornettools relies on the next best thing currently available: the bandwidth history, specifically the maximum sustained throughput over a 10-second period that a relay has observed. This is a rough metric for what the relay can handle. Of course, if the relay is never pushed to its limit, this number will be an underestimate. Also, the relay could lie about it. But it's the best we have until a more secure speed test measurement system is deployed, which does not seem to be a priority for Tor atm.
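To make the "maximum sustained throughput over a 10-second period" metric concrete, here is a minimal sketch over hypothetical per-second byte counts (this is illustrative only, not actual tor or tornettools code):

```python
def max_sustained_throughput(bytes_per_second, window=10):
    """Return the highest average bytes/sec over any `window`-second span,
    given a list of hypothetical per-second byte counts."""
    if not bytes_per_second:
        return 0
    if len(bytes_per_second) < window:
        # Not enough history for a full window; average what we have.
        return sum(bytes_per_second) / len(bytes_per_second)
    # Sliding-window maximum of the window sum.
    running = sum(bytes_per_second[:window])
    best = running
    for i in range(window, len(bytes_per_second)):
        running += bytes_per_second[i] - bytes_per_second[i - window]
        best = max(best, running)
    return best / window

# A relay idle for 10s, then pushing 100 bytes/s for 10s:
# the busiest 10-second window averages 100 bytes/s.
print(max_sustained_throughput([0] * 10 + [100] * 10))  # -> 100.0
```

Note the underestimation problem described above: if the input never reflects the relay running at capacity, the returned value is below the true capacity no matter how the window is chosen.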
If you want to learn more about the bandwidth numbers, I highly recommend you read this paper, or at least listen to the recording of my talk where I presented the main results.
On the Accuracy of Tor Bandwidth Estimation https://www.robgjansen.com/publications/torbwest-pam2021.html
There are also several new approaches to actually measuring relay bandwidth capacities. In my biased opinion, the active measurement approach of FlashFlow is one of the best, which you can learn more about here: https://www.robgjansen.com/publications/flashflow-icdcs2021.html
From my reading of the tornettools code, it seems like tornettools reads through a set of real historical consensuses from the directory authorities, gets the value of the `w Bandwidth=` line for each relay from each consensus, takes the per-relay median across all of those consensuses, and sets that as the `weight` of each relay in the relay staging file `relayinfo_staging_2023-04-01--2023-04-30.json`. It then normalizes the `weight` of the relays in the staging file and writes the result into `shadow.data.template/hosts/bwauthority/v3bw.init.consensus`.

It also reads through a set of relay descriptors and uses data from them for the `bandwidth_capacity`, `bandwidth_rate`, and `bandwidth_burst` of the relays in the staging file. It then sets the `bandwidth_down` and `bandwidth_up` of the relays in `shadow.config.yaml` to the `bandwidth_capacity` from the staging file.

This means that the actual bandwidths of relays in the simulation are the self-reported bandwidths from real-world relay descriptors, while the values in the v3bw file are normalized from the weights in the real historical consensuses, which are measured by the bandwidth scanner sbws.
I then ran an experiment with one simulation using the original config from `tornettools generate`, and another simulation where I read the `bandwidth_down` of each relay from the generated `shadow.config.yaml` (instead of from the relay descriptors), normalized those values the same way tornettools does, and created a new `v3bw.init.consensus` based on them. I thought this would decrease the gap between the relays' actual bandwidths in the simulation and the v3bw values in the simulation. However, the experiment showed that performance dropped slightly as a result.

I'd like to ask: why is there such a gap between `shadow.config.yaml` and `v3bw.init.consensus`, as described above? And why would my experiment show worse performance after I tried to lessen the gap?

The experiment plot is attached: tornet.plot.pages.pdf