tcstewar / 2015-Embodied_Benchmarks

Paper on Embodied Neuromorphic Benchmarks
GNU General Public License v2.0

Abstract of the paper #24

celiasmith opened this issue 8 years ago

celiasmith commented 8 years ago

Hi, getting the abstract agreed on will really help me in editing the rest of the paper, so I just want to see if this rewrite has made correct assumptions:

ABSTRACT: Evaluating the effectiveness and performance of neuromorphic hardware is difficult. It is even more difficult when the task of interest is an embodied task; that is, a task where the output from the neuromorphic hardware affects some environment that then determines the hardware's future input. However, embodied situations are one of the primary potential uses of neuromorphic hardware. To address this, we present a method for generating embodied benchmarks that uses a hybrid of real physical embodiment and a type of “minimal” simulation. Minimal simulation has been shown to lead to robust real-world performance, while still maintaining the practical advantages of simulation, such as making it easy for the same benchmark to be used by many researchers. The method is flexible because it allows researchers to explicitly modify the benchmarks to identify particular task domains where specific hardware excels. To demonstrate the method, we present a set of novel benchmarks that focus on motor control for an arbitrary unknown system.

Seanny123 commented 8 years ago

That looks right to me, with the only difference being a small tweak to the last sentence due to something @studywolf suggested:


Evaluating the effectiveness and performance of neuromorphic hardware is
difficult.  It is even more difficult when the task of interest is an
embodied task; that is, a task where the output from the neuromorphic
hardware affects its future input through some environment.  However, embodied situations
are one of the primary potential uses of neuromorphic hardware.  To address
this, we present a methodology for embodied benchmarking that makes use
of a hybrid of real physical embodiment and a type of ``minimal'' simulation.
Minimal simulation has been shown to lead to robust real-world performance,
while still maintaining the practical advantages of simulation, such as
making it easy for the same benchmark to be used by many researchers.
These benchmarks are flexible, in that they allow researchers to explicitly
modify the benchmark to identify particular task domains where particular
hardware excels.  To demonstrate the method, we present a novel benchmark
where the task is to perform motor control on an arbitrary system with
unknown external forces.

Seanny123 commented 8 years ago

Oops, that should be from @tcstewar, not @Seanny123. I'm using a different computer and didn't notice he'd left himself logged in.

celiasmith commented 8 years ago

so i think you copied from the original, not the version with my suggested changes... but presumably this would be ok (new last line incorporated)?:

ABSTRACT: Evaluating the effectiveness and performance of neuromorphic hardware is difficult. It is even more difficult when the task of interest is an embodied task; that is, a task where the output from the neuromorphic hardware affects some environment that then determines the hardware's future input. However, embodied situations are one of the primary potential uses of neuromorphic hardware. To address this, we present a method for generating embodied benchmarks that uses a hybrid of real physical embodiment and a type of “minimal” simulation. Minimal simulation has been shown to lead to robust real-world performance, while still maintaining the practical advantages of simulation, such as making it easy for the same benchmark to be used by many researchers. The method is flexible because it allows researchers to explicitly modify the benchmarks to identify particular task domains where specific hardware excels. To demonstrate the method, we present a set of novel benchmarks that focus on motor control for an arbitrary system with unknown external forces.

tcstewar commented 8 years ago

Ah, sorry... I couldn't find the particular wording changes you were making. (it's hard to do a visual diff manually!)

All the changes I can see I'm happy with, although I find "a task where the output from the neuromorphic hardware affects some environment that then determines the hardware's future input" to be a bit more awkward than "a task where the output from the neuromorphic hardware affects its future input through some environment". But if you feel the first version does a better job of highlighting the environment aspect, I'm good with it.

celiasmith commented 8 years ago

fair enough. the main changes were at the end talking about the method (instead of benchmarks) being flexible, and presenting a 'set of' benchmarks rather than a single one.

The rewrite on the definition is to get rid of potentially ambiguous indexicals: 'its' could be the hardware or the task... i tried some other rewrites, and it was hard to get rid of that problem... so while this is a bit lengthy, i thought it clearest.

tcstewar commented 8 years ago

fair enough. the main changes were at the end talking about the method (instead of benchmarks) being flexible,

Ah, I like that.

and presenting a 'set of' benchmarks rather than a single one.

Hmm, that I don't get. What are you thinking of for this set of benchmarks? In my head there's only one benchmark: performance on the minimal simulation. I suppose one could think of that as an amalgamation of an infinite set of benchmarks... is that what you were thinking of?

tcstewar commented 8 years ago

The rewrite on the definition is to get rid of potentially ambiguous indexicals: 'its' could be the hardware or the task... i tried some other rewrites, and it was hard to get rid of that problem... so while this is a bit lengthy, i thought it clearest.

Good point. Yup, I'm happy with your rewrite of that. :)

celiasmith commented 8 years ago

Hmm, that I don't get. What are you thinking of for this set of benchmarks? In my head there's only one benchmark: performance on the minimal simulation. I suppose one could think of that as an amalgamation of an infinite set of benchmarks... is that what you were thinking of?

So my 'prototype' for benchmarking is what goes on in ML. There they have a specific task (e.g. categorization of this data set) and a measure (error). If you change the dataset, it's considered another benchmark (even though you're still doing categorization). So the mapping I'm thinking of is that the 'control of arbitrary robots' is like 'categorization', and each specific parameter variation (e.g. delay, sensor noise, etc.) is like different data sets (rmse is always the measure). So when you're talking about people showing how their hardware excels, they would be showing the set of benchmarks that they do best on.

tcstewar commented 8 years ago

Hmm... I think what you're saying makes sense. My worry with that terminology, though, is that this is now an infinite set of benchmarks, which is a bit unexpected. And you only ever run your hardware once on a particular benchmark in that set. And when I'm comparing two different models, they've never been run on the same benchmarks -- they've been run on a random sampling of benchmarks inside that set.

So all of that makes me uncomfortable with calling this a set of benchmarks, as I don't think it's what people would imagine. They're more used to thinking of maybe there's 10 benchmarks, and I've run my hardware against all 10. So I think I'd prefer more to call the whole set one benchmark.

celiasmith commented 8 years ago

Actually this 'infinity' problem already exists in ML... when they use benchmarks with noise. The way you deal with it is to specify the distribution you pick from. That seems pretty natural in this case too.

tcstewar commented 8 years ago

Actually this 'infinity' problem already exists in ML... when they use benchmarks with noise. The way you deal with it is to specify the distribution you pick from. That seems pretty natural in this case too.

But do they call each draw from that infinite sample a different benchmark? Or do they call it one benchmark with noise?

celiasmith commented 8 years ago

So a benchmark would be a data set, with a distribution for how to distort the data. Just like a benchmark here would be a parameter (e.g. delay) with a distribution.

celiasmith commented 8 years ago

All the other distributions would remain constant and be averaged over.

tcstewar commented 8 years ago

Hmm.. I think there's two different things being talked about here.

Yes, I think I'd consider the use of different distributions as giving you different benchmarks. For example, the basic benchmark might look like this:

delay: uniform(0, 0.01)
filter: uniform(0, 0.01)
noise: uniform(0, 0.1)
greek letters: N(0, 1)

That's one fixed distribution and I could measure rmse on that and get an answer. Although I probably would want to report the distribution of rmse values, rather than an average.

Then, if people want to characterize their hardware more, they can do things like change these ranges and see what happens. That's what I did in the vary_delay plot: I varied the delay in (0, 0.04) and plotted rmse against delay. I could see calling that a separate benchmark, I think.

That's the sort of thing that I was meaning by that "flexibility" comment in the abstract. You can find out what things your hardware is good at by exploring the distribution space a bit.
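As a rough sketch, a benchmark in this sense might be written down as a set of parameter distributions plus a sampling-and-scoring procedure. The following is only an illustration of that idea; the names (`sample_task`, `run_trial`, the distribution ranges) are hypothetical placeholders, not code from this repository:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_task(delay_range=(0.0, 0.01)):
    """Draw one task instance from the benchmark's parameter distributions."""
    return {
        'delay': rng.uniform(*delay_range),     # feedback delay (s)
        'filter': rng.uniform(0.0, 0.01),       # sensor filter time constant (s)
        'noise': rng.uniform(0.0, 0.1),         # sensor noise magnitude
        'plant': rng.normal(0.0, 1.0, size=4),  # "greek letters": random plant parameters
    }

def run_trial(task):
    """Placeholder: run the controller on the minimal simulation for this
    task instance and return the rmse between desired and actual state."""
    raise NotImplementedError

def run_benchmark(n_trials=50, **dist_kwargs):
    """The benchmark score is the distribution of rmse values over many
    sampled task instances, not the score on any single random draw."""
    return np.array([run_trial(sample_task(**dist_kwargs))
                     for _ in range(n_trials)])

# Auxiliary benchmark, in the spirit of the vary_delay plot: hold the other
# distributions fixed and sweep the delay to see where the hardware excels.
# delays = np.linspace(0.0, 0.04, 9)
# rmse_vs_delay = [run_benchmark(delay_range=(d, d)).mean() for d in delays]
```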

tcstewar commented 8 years ago

So I'd be happy saying there's one main benchmark (the standard distributions) and a few auxiliary benchmarks (each of the fun plots, like "varying delay", or "varying n_neurons and D").

tcstewar commented 8 years ago

(re-reading your initial question, it looks like I was misinterpreting you for most of the above conversation. I take it you were never proposing that each particular random sample is a different benchmark? Rather, you were saying that you can define a benchmark by selecting different distributions of parameter values? So we might have a set of 5 or so different benchmarks in this paper? If so, I'm completely onboard with that definition.)

celiasmith commented 8 years ago

yeah, that's what i was after... so the set would be the 5 benchmarks... cool! so i think the changes i sent a while back are consistent with this characterization (at least that's what i was trying for). i'll send the last sections in a bit... just back to them now.

tcstewar commented 8 years ago

Excellent! :)

I'm off for dinner now, but I posted my plan for the analysis section, given this discussion:

https://github.com/tcstewar/2015-Embodied_Benchmarks/issues/26