tbarbette / npf

Network Performance Framework: easy-to-use experiment manager with automated testing, result collection, and graphing
GNU General Public License v3.0

Go through an intermediate CSV before doing graphs #26

Open · tbarbette opened this issue 2 years ago

tbarbette commented 2 years ago

So the idea would be to keep the cache idea as hidden as possible, and always export a CSV that will be used to create the graphs. The npf commands will continue to build graphs automatically, but a new npf-graph command would allow rebuilding the very same graph from the CSV.

The remaining question, therefore, is what the appropriate CSV format would be, knowing we have multiple output variables, multiple runs per parameter combination, and also multiple series when using npf-compare.

Imagine we compare netperf and iperf, have one variable "ZEROCOPY" that can take the values 0 and 1, have two output results THROUGHPUT and LATENCY, and do 2 runs:

```
series,run_number,ZEROCOPY,THROUGHPUT,LATENCY
iperf,1,0,...
iperf,1,1,...
iperf,2,0,...
iperf,2,1,...
netperf,1,0,...
netperf,1,1,...
netperf,2,0,...
netperf,2,1,...
```

The problem remains that some "outputs" (results) can have multiple values in the same run. We could use another (somewhat non-standard) separator to pack multiple results into a single column, e.g. the "+" sign (using ";" might lead to bad interpretation of the CSV).
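As a concrete illustration, here is a minimal sketch of how such a file could be read back, expanding the "+"-joined cells into lists. The column names match the example above, but the numeric values and the `split_multi` helper are made up for illustration:

```python
import csv
from io import StringIO

# Hypothetical sample in the format sketched above; "+" joins the
# multiple values an output produced within a single run.
SAMPLE = """\
series,run_number,ZEROCOPY,THROUGHPUT,LATENCY
iperf,1,0,9.2+9.4,120
iperf,1,1,11.1,95+97
netperf,1,0,8.7,130
"""

def split_multi(cell, sep="+"):
    """Expand a '+'-joined cell into a list of floats."""
    return [float(v) for v in cell.split(sep)]

for row in csv.DictReader(StringIO(SAMPLE)):
    print(row["series"], row["run_number"], row["ZEROCOPY"],
          split_multi(row["THROUGHPUT"]), split_multi(row["LATENCY"]))
```

One caveat with "+": it also appears in scientific-notation floats such as `1.2e+07`, so values would either have to be written without an exponent or the splitter made aware of that.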

Any input on this?

MassimoGirondi commented 2 years ago

What about storing all the intermediate data in a binary format? Pickle is the first that comes to mind.

Not the most elegant solution, but it would abstract away having to save each individual combination of parameters for each particular run, and having to invent a custom CSV syntax. Then it's a matter of separating the testing and graphing parts, invoking each before or after the (de)serializer depending on whether you want to do the graphing or only export the results.

You can see it as a sort of "snapshot" of the results in that particular run.
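A minimal sketch of that snapshot idea, using a made-up result structure (these names are illustrative, not npf's actual internals): pickle does not care about the shape of what it stores, which is exactly the appeal here.

```python
import pickle
from pathlib import Path

# Hypothetical in-memory results: series -> list of runs, each run
# holding its variable values and (possibly multi-valued) outputs.
results = {
    "iperf": [
        {"ZEROCOPY": 0, "THROUGHPUT": [9.2, 9.4], "LATENCY": [120.0]},
        {"ZEROCOPY": 1, "THROUGHPUT": [11.1], "LATENCY": [95.0, 97.0]},
    ],
}

snapshot = Path("results.pkl")

# Testing phase: dump everything in one shot, whatever its shape.
with snapshot.open("wb") as f:
    pickle.dump(results, f)

# Graphing phase (possibly a separate invocation): load it back.
with snapshot.open("rb") as f:
    restored = pickle.load(f)

assert restored == results
```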

tbarbette commented 2 years ago

In the first versions I actually used pickle. But as the format evolved, I suffered from backward-incompatible loading and had to re-execute tests. The advantage of some kind of CSV is that it's human-readable. But yes, it starts to get complex with multiple results.

And I did not mention the problem of time series... How to store a dozen results over the duration of the experiment, at time intervals that differ from one experiment to the next...

tbarbette commented 2 years ago

Maybe the CSV is still the best way to handle this, with just a weird format for the weird use cases (multiple results per run). And maybe one CSV file per time series (again, time series are not needed in all experiments).
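A sketch of what one-file-per-time-series could look like, under the assumption that each experiment samples at its own instants (file names and values are hypothetical): an explicit TIME column sidesteps the problem of experiments not sharing a time grid.

```python
import csv
from pathlib import Path

# Hypothetical per-experiment time series: sampling instants differ
# between experiments, so each gets its own file with an explicit
# TIME column instead of forcing a shared time grid.
series = {
    "iperf_ZEROCOPY0_run1": [(0.0, 9.1), (1.0, 9.3), (2.5, 9.2)],
    "iperf_ZEROCOPY1_run1": [(0.0, 11.0), (0.7, 11.2)],
}

for name, points in series.items():
    with Path(f"{name}.csv").open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["TIME", "THROUGHPUT"])
        writer.writerows(points)
```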

MassimoGirondi commented 2 years ago

JSON? I'm not a huge fan of it for cases like this, but it could be a good tradeoff, keeping basic human readability while allowing nesting and object-like syntax...
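For comparison, the same hypothetical results from the earlier sketches expressed as nested JSON (again, the structure is illustrative, not a proposed npf schema): multi-valued outputs and irregular time series both fit naturally, at the cost of being less spreadsheet-friendly than a flat CSV.

```python
import json

# Multi-valued outputs become plain lists; each time series is a list
# of [time, value] pairs, so intervals can differ per experiment.
results = {
    "iperf": [
        {
            "variables": {"ZEROCOPY": 0},
            "outputs": {"THROUGHPUT": [9.2, 9.4], "LATENCY": [120.0]},
            "time_series": {
                "THROUGHPUT": [[0.0, 9.1], [1.0, 9.3], [2.5, 9.2]],
            },
        },
    ],
}

print(json.dumps(results, indent=2))
```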