kbala444 opened 9 years ago
@heems could we get an 'observed bandwidth' per file transfer field in the output?
Other than that, this is all looking really nice! Are the results reproducible? If so, what is the margin of error between runs? (if not, uh oh :( )
> could we get an 'observed bandwidth' per file transfer field in the output?
Sure, outgoing bandwidth should be easy, but it would require changes to mocknet as well (maybe adding an outgoing bandwidth field to `conn` in `mock_conn.go`). Would taking the total bytes sent by a peer over the file transfer time be an accurate outgoing bandwidth? And would this be an outgoing bandwidth estimate for every active peer on each file transfer?
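For concreteness, here's a minimal sketch of the measurement I'm describing; the names and the per-transfer byte counter are hypothetical, not anything that exists in mocknet or bssim yet:

```go
// Sketch only: a hypothetical per-transfer bandwidth estimate, computed as
// total bytes a peer sent divided by the wall-clock duration of the transfer.
// None of these names exist in mocknet/bssim yet.
package main

import (
	"fmt"
	"time"
)

// transferStats would be filled in by the (hypothetical) outgoing-byte
// counter on each mock conn while a file transfer is running.
type transferStats struct {
	bytesOut int64         // total bytes this peer sent during the transfer
	duration time.Duration // wall-clock time of the file transfer
}

// observedBandwidth returns the average outgoing bandwidth in bytes/sec.
func observedBandwidth(s transferStats) float64 {
	if s.duration <= 0 {
		return 0
	}
	return float64(s.bytesOut) / s.duration.Seconds()
}

func main() {
	s := transferStats{bytesOut: 25 << 20, duration: 2 * time.Second}
	fmt.Printf("observed bandwidth: %.0f bytes/sec\n", observedBandwidth(s))
}
```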
> Other than that, this is all looking really nice! Are the results reproducible? If so, what is the margin of error between runs? (if not, uh oh :( )
Thanks! I'm not completely sure, but I think that if latency is 0 and bandwidth is uncapped, the block times would depend on the speed of your computer. However, if the CPU isn't the bottleneck, the results are reproducible, and even if the CPU is the bottleneck, running the same workloads on the same CPU will give similar results.
I recorded some statistics about bssim (xzibit.jpg), where each workload was run about 10 times. The samples/star workload was run with no latency or bandwidth cap, and samples/starwithconns was run with its existing link settings.
I'm not sure what the acceptable variance for duplicate blocks is in a less trivial workload like samples/viral, but here are the stats for that (run with latency: 3 and bandwidth: 100).
Sorry for the late reply btw, been sick the last couple of days.
@heems this is good stuff! The high variance on duplicate blocks is pretty much what I expected; the lowish variance on times is nice though!
How are you measuring the variance on block duplicates? Is this across all nodes on all runs, or across the same node over a series of runs? The number of duplicate blocks received is going to depend on how many other nodes in the network receive blocks first, so I'm not sure how best to measure that.
At any rate, I like what I see, we're getting really close to the point where we can just start tweaking bitswap and seeing how it reacts :)
next steps:

- make sure numbers being reported look right (and that they really mean what we think they mean)
- automate bssim stuff, maybe in CI?
> How are you measuring the variance on block duplicates? Is this across all nodes on all runs, or across the same node over a series of runs?
It's across all nodes on all runs, but it should be easy to get it across the same node too if you want that statistic.
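To make the distinction concrete, here's a rough sketch of the two groupings (plain Go; the `dups[run][node]` layout is just for illustration, not how bssim stores its stats):

```go
// Sketch only: two ways of summarizing duplicate-block variance.
// dups[run][node] holds the duplicate-block count for one node in one run;
// this layout is hypothetical, not bssim's actual storage format.
package main

import "fmt"

func variance(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	var mean float64
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	var v float64
	for _, x := range xs {
		v += (x - mean) * (x - mean)
	}
	return v / float64(len(xs))
}

func main() {
	dups := [][]float64{ // 3 runs x 4 nodes of made-up counts
		{2, 5, 1, 7},
		{3, 4, 0, 9},
		{2, 6, 1, 8},
	}

	// (1) Across all nodes on all runs: flatten everything into one sample.
	var all []float64
	for _, run := range dups {
		all = append(all, run...)
	}
	fmt.Printf("variance across all nodes on all runs: %.2f\n", variance(all))

	// (2) Per node across runs: one variance per node position.
	for node := 0; node < len(dups[0]); node++ {
		var perNode []float64
		for _, run := range dups {
			perNode = append(perNode, run[node])
		}
		fmt.Printf("variance for node %d across runs: %.2f\n", node, variance(perNode))
	}
}
```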
> make sure numbers being reported look right (and that they really mean what we think they mean)
I'll add more bssim unit tests for sure, but how else would you suggest verifying the numbers?
> automate bssim stuff, maybe in CI?
CI == continuous integration? Could you elaborate on this? Sorry, I'm not really sure what you mean.
Yay can't wait to work on bitswap
@whyrusleeping, @jbenet
As requested, here is a guide to using bssim and some questions I still have about it.
Example Use
Let's say I want to see how the `samples/star` workload is affected by different latencies.

The first line of a workload file is the config line. In the star workload, there are 10 nodes (`node_count:10`). By default, all the nodes are linked to every other node, and the `latency: 5` option sets every link in the network to have a latency of 5ms. You can also set the options of each link yourself.
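Roughly, that config latency just becomes the default latency on every mocknet link. Here's a minimal sketch of what that looks like with mocknet's link options; this is not bssim's actual code, and the import path and constructor have changed across go-libp2p/go-ipfs versions, so treat it as illustrative:

```go
// Sketch only: applying a default 5ms latency to every mocknet link, which is
// roughly what a `latency: 5` config line translates to. Import path and
// constructor vary across go-libp2p versions.
package main

import (
	"fmt"
	"time"

	mocknet "github.com/libp2p/go-libp2p/p2p/net/mock"
)

func main() {
	mn := mocknet.New()
	mn.SetLinkDefaults(mocknet.LinkOptions{
		Latency:   5 * time.Millisecond,
		Bandwidth: 0, // not setting a cap here; this sketch only cares about latency
	})

	// Generate a few peers and link them all to each other, like the
	// fully-meshed default described above.
	for i := 0; i < 10; i++ {
		if _, err := mn.GenPeer(); err != nil {
			panic(err)
		}
	}
	if err := mn.LinkAll(); err != nil {
		panic(err)
	}
	fmt.Println("peers:", len(mn.Peers()))
}
```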
The first step is to edit `data/config.ini`. I want to run the workload with latencies of 0, 10, 40, and 70ms and the bandwidth at 1000mbps, so I edit the config file to look like this:

The latency and bandwidth lists will override the latency and bandwidth fields I specified in the config line of the star workload. I also configured it to show graphs after the script finishes and save them to graphs.pdf.
I then run `./scripts/latbw.sh samples/star`. This will run the star workload with the latencies and bandwidths from `data/config.ini`. At the end of each run, some basic information about the run is printed:
(sorry, I forgot to disable f.lux when taking these screenshots)
These were the graphs generated at the end of the script. Here is the one I wanted: it takes the mean block time for each recorded `samples/star` run and graphs it against the latency of that run:

This one shows the block times over time for the latest run:
And one for file completions over time for the latest run:
A full list of commands and options is in the readme.
The script that makes the graphs is `data/grapher.py`, so if you want to review the graphs from a workload you ran earlier, you can just run `python grapher.py` by itself.

Manual links
It's not very realistic for all nodes in the network to have the same latencies and bandwidths, so you can also manually set the link options between two nodes with the `[connecting_nodes]->dest_node` syntax. This gets super tedious if you have more than about 5 nodes, however, so there's also a tool in the repo called `latgen.py`. `latgen.py` generates nodes with random locations in the U.S. + England and outputs these `node->node` lines with realistic latencies and (not yet very realistic) bandwidths.

If I wanted to make a star workload using latgen, I could:
This will create a new workload, `samples/starwithconns`, that does the same stuff as star, except with a bunch of realistic connections.

- The `-i` flag means insert into workload, which keeps your existing workload but uses the manual connections from latgen.
- The `-b` flag sets the average outgoing bandwidth for a node in the network.
- The `-t` flag sets the topology of the network (I only have fully meshed and star right now though).
- The `-l` flag labels the edges in the graph (which represent links between nodes) with "l, bw", where l and bw are the latency and bandwidth of that link.

Here is the graph generated from latgen:
And the new starwithconns file:
If you only want to run the workload once with the latency and bandwidth specified in its config line, you can run `./scripts/singlerun.sh samples/starwithconns`. If you don't care about graphs and just want the end-of-run stats, you can run `./bssim -wl samples/starwithconns` and then later run `python data/grapher.py` if you change your mind.

Questions/limitations
Right now, bandwidth is specified per link in mocknet. Each link has its own bandwidth cap, and a node has a link to each node it's connected to. Shouldn't the total outgoing bandwidth from a node be capped, rather than the bandwidth per link? (There's a rough sketch of what I mean below.)

Also, a general mocknet question: if I want two nodes to be able to send/receive messages to each other, do I need to call both `LinkPeers(n1, n2)` and `LinkPeers(n2, n1)`, or just one of them?

Any requests/simplifications/questions?
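Here's the per-node cap idea as a minimal sketch, assuming a shared token bucket per node; this is plain illustrative Go using golang.org/x/time/rate, and none of these types exist in mocknet or bssim:

```go
// Sketch only: a per-node outgoing bandwidth cap shared by all of that node's
// links, as opposed to mocknet's current per-link cap. Illustrative code using
// golang.org/x/time/rate; none of these types exist in mocknet.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// nodeSender is a hypothetical wrapper: every link owned by a node would
// route its outgoing writes through the node's single limiter.
type nodeSender struct {
	limiter *rate.Limiter // shared token bucket, in bytes/sec
}

func newNodeSender(bytesPerSec int) *nodeSender {
	return &nodeSender{limiter: rate.NewLimiter(rate.Limit(bytesPerSec), bytesPerSec)}
}

// send blocks until the node-wide budget allows these bytes out, no matter
// which link they are leaving on.
func (n *nodeSender) send(ctx context.Context, link string, data []byte) error {
	if err := n.limiter.WaitN(ctx, len(data)); err != nil {
		return err
	}
	fmt.Printf("sent %d bytes on link %s\n", len(data), link)
	return nil
}

func main() {
	node := newNodeSender(1 << 20) // 1 MiB/s total outgoing for this node
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Two different links still compete for the same outgoing budget.
	_ = node.send(ctx, "n1->n2", make([]byte, 512<<10))
	_ = node.send(ctx, "n1->n3", make([]byte, 512<<10))
}
```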