shadow / tornettools

A tool to generate realistic private Tor network models, run them in Shadow, and analyze the results.

Multi-dataset CDF+CI lines are drawn differently from single-dataset CDF lines #76

Closed stevenengler closed 2 years ago

stevenengler commented 2 years ago

Tornettools uses the functions draw_cdf() and draw_cdf_ci() to draw CDF plots, depending on whether the data contains a single dataset or multiple datasets (draw_cdf_ci() draws confidence intervals; draw_cdf() does not). These two functions plot identical data differently. For example, in the following graph the orange and blue lines should have identical data points, but the blue line has multiple datasets while the orange line has only one. The orange line's data points are connected by straight lines, whereas the blue line's data points are binned by quantile, which makes it look like a step function.

[Figure: client_goodput_5MiB exit]

Ideally, lines showing multiple datasets should be drawn the same way as a single dataset; otherwise it looks like the results are different even when the data is the same.
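To make the difference concrete, here is a minimal sketch of the two ways of turning the same samples into CDF points. This is not the tornettools code: the helper names `ecdf_points` and `quantile_binned_points`, the sample data, and the bucket count are all hypothetical and chosen only for illustration.

```python
import numpy as np


def ecdf_points(data):
    """One (x, y) point per sample: sorted values vs. cumulative fraction."""
    xs = np.sort(np.asarray(data, dtype=float))
    ys = np.arange(1, len(xs) + 1) / len(xs)
    return xs, ys


def quantile_binned_points(data, num_bins=10):
    """One (x, y) point per quantile bucket (bucket count is an assumption)."""
    qs = np.linspace(0.0, 1.0, num_bins + 1)
    xs = np.quantile(np.asarray(data, dtype=float), qs)
    return xs, qs


# Hypothetical sample data, the same for both representations.
data = [1.0, 2.0, 2.5, 4.0, 7.0, 7.5, 9.0, 12.0]

ecdf_x, ecdf_y = ecdf_points(data)                              # 8 points
binned_x, binned_y = quantile_binned_points(data, num_bins=4)   # 5 points
```

Both arrays describe the same distribution, but they contain different points, so the curves only look alike if they are also drawn with the same style (for example, both connected by straight lines or both drawn as steps).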

robgjansen commented 2 years ago

In draw_cdf(), I added the quantile function without much testing. I think it's what we want, but it seems to be giving us a linear interpolation of the data.

Maybe we want to change the method argument of numpy.quantile to use a different interpolation method? https://numpy.org/doc/stable/reference/generated/numpy.quantile.html

It's also possible that there is a bug before arriving at the draw_cdf() function that is causing unexpected results.
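As a rough illustration of what the `method` argument mentioned above changes, the sketch below computes the same quantile with three of the methods numpy.quantile accepts. The sample values are made up, and the `method` keyword assumes NumPy 1.22 or newer (older releases called it `interpolation`).

```python
import numpy as np

# Hypothetical samples, chosen so the median falls between two values.
samples = np.array([10.0, 20.0, 30.0, 40.0])

# Default: interpolate linearly between the two nearest samples.
median_linear = np.quantile(samples, 0.5, method="linear")

# Alternatives that always return one of the actual sample values.
median_lower = np.quantile(samples, 0.5, method="lower")
median_higher = np.quantile(samples, 0.5, method="higher")

print(median_linear, median_lower, median_higher)  # 25.0 20.0 30.0
```

With "linear" the result can be a value that never occurs in the data, while "lower" and "higher" snap to real samples, which is one possible source of a smooth versus stepped appearance.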