How do we define an evaluation setup?

simonduq commented 6 years ago

In the current data schema, runs refer to a setup, which in turns consists in a profile, testbed, protocol and configuration. Needs discussing more.

romain-jacob commented 6 years ago

I think we should convert 'testbed' to 'test environment'. In my opinion, we should open up the benchmark to be run also on simulators, or even ad-hoc networks ('on-the-desk' experiments, or non-open testbeds).

simonduq commented 6 years ago

Agree. Although I'd like to come up with a better name, because "environment/setup/configuration" might be a bit unclear.

romain-jacob commented 6 years ago

Definitely. I'll try to think about this and make a proposal

romain-jacob commented 6 years ago

Okay, here is a proposal to feed the discussion. I suggest to adapt the data scheme we currently have in the repo as follows.

There are 4 categories of things:

Profiles
- Inputs
- Output metrics
- Observed metrics
Protocols, containing only optional fields like textual description, links to source code, papers, documentation, etc.
Experimental setup
- Platform: TmoteSky, Firefly, etc.
- Radio chip: cc2420, cc2838, SX1261, etc.
- Environment: Testbed, Simulator, or Ad-hoc
  - if Environment == Testbed, testbed_id
  - if Environment == Simulator, simulator_id
Results
- profile_id
- protocol_id
- setup_id
- set of runs, with for each
  - timestamp
  - all output and observed metrics for the profile
  - Optional: any additional metric you want to report for your runs

Plus optional information relevant to report with your results, for example

Testbed configuration file (e.g., Flocklab XML template)
Complete list of node IDs (only relevant for testbeds)
Simulation file, if you report results from Cooja or Renode or ...
Source code repo URL with the commit ID from your test
Protocol configuration. Can be textual, with a free label (e.g., high bandwidth) or by reporting the values of key parameters (e.g., LWB_ROUND_PERIOD = 1)
... anything else relevant

From there, the main open challenge remaining (in my opinion) is the definition of the profiles, and in particular how to define the output metrics: should we specify the mesurand/metric (e.g., average energy consumption) or the physical phenomenon (e.g., power draw per node).
See the discussion there: https://github.com/iot-benchmark/iot-benchmark.github.io/issues/4

simonduq commented 6 years ago

Looks good, only thing I'm unsure about is if we want so many different config fields for each run, as opposed to adding one more structure that describes a configuration (node IDs used, protocol config etc.). And maybe even this goes into "experimental setup"?

If runs could simply point to a config, we'll be able to easily look at all iterations of any particular config.

romain-jacob commented 6 years ago

I think I see what you mean: nodeIDs used, protocol config etc. are field of a "result", not of a specific "run". Correct?

If so, that's also what I had in mind, but I realize the layout was a bit confusing. I would not put it in the 'experimental setup' though, because a setup here is rather general (aka 'not made by the user') while the 'results' contains everything which is 'produced' by the user.

If not.. then I did not get your point.

simonduq commented 6 years ago

That's what I meant yes, and I realize only now that's also what you proposed initially. I was confused, thinking your last bullet list was a per-run thing, but it's part of "result".

simonduq commented 6 years ago

So in terms of modification of the current data schema, it's mostly:

"testbed" becomes "experimental setup". Now covers the case of simulation or tabletop xp.
"setup" renamed to "results" (but a side-effect is that all results become a set, as a field of "result", might be a lot less flexible than having one separate element per run, each pointing to something that describes the full configuration)

romain-jacob commented 6 years ago

That would be my proposal.

a side-effect is that all results become a set, as a field of "result", might be a lot less flexible than having one separate element per run, each pointing to something that describes the full configuration

Both are not exclusive I think. I would be in favor of keeping a 'run' structure, similar to what you proposed (containing essentially the metrics, a timestamp and the 'result_id'). Then, the 'result' structure would contain a list of 'runs': run1, run2, runx... all related to the same profile_id, protocol_id and setup_id.

simonduq commented 6 years ago

OK, that works for me. Let's update the data schema and jekyll implem. Feel free to start with the schema on the wiki :)

chanmc commented 6 years ago

We can have fields for both per-node and end-to-end fields and let the users choose to fill in the fields depending on the environment they run the experiment in.

romain-jacob commented 6 years ago

We have converged on something that seems to make sense for now. Let's give it a try and start contributing results. We will see if we need to adapt the structure.

romain-jacob commented 6 years ago

So, based on the last call, we should try to find something else than 'result'. I don't remember everything that was said... What about this:

Rename: 'Experimental setup' -> 'Setup' 'Result'-> 'Experiment'

That would mean an 'Experiment' becomes the association of

a profile
a protocol (generic, eg LWB, not 'LWB with 2s period')
a setup (platform, testbed ID, etc)
the results, which is essentially a set of runs, i.e. the output and observed metric values

With this, we still have the problem that was mentioning Om: where do I put the experiment specific things ?

key parameters (e.g., LWB_ROUND_PERIOD = 1), testbed configuration file (e.g., Flocklab XML template), complete list of node IDs (only relevant for testbeds) etc.

I would be against having it in the 'Setup', as it would lead to have essentially one 'setup' per 'experiment'. I would prefer keeping a limited amount of setups. Ideally, I think it would be good that uploading results to the repo requires as few files as possible. If we rename 'Result'-> 'Experiment', it would make sense to have the protocol parameters etc. in there. or?

simonduq commented 6 years ago

My proposal was:

Rename "experimental setup" -> "environment"
- Describes testbed/simulation
Rename "result" -> "setup"
- Links profile, protocol, environment, and adds configuration for each (LWB round, Flocklab XML template etc.)
- Does not explicitly contain "runs"
Each individual run points to one setup, i.e., to a complete description of what was used in the run

romain-jacob commented 6 years ago

Okay, that makes sense. Let's go with your proposal. I will update the data schema accordingly asap (hopefully today).

simonduq commented 6 years ago

Thanks! Please give a heads up when the schema is updated :)

chanmc commented 6 years ago

A clarification, so a specification or setup includes {profile, protocols, environment}. For each setup, we can many runs, each run has additional values for output and observed metrics. Make sense to me.

romain-jacob commented 6 years ago

Done

iot-benchmark / iot-benchmark.github.io

How do we define an evaluation setup? #2