tcstewar / 2015-Embodied_Benchmarks

Paper on Embodied Neuromorphic Benchmarks
GNU General Public License v2.0

Relationship between the physical part of the benchmark and the minimal simulation #23

Open tcstewar opened 9 years ago

tcstewar commented 9 years ago

So @celiasmith and @studywolf , I woke up and had a sudden flash of insight that might help clarify this connection.

The point of the physical portion is to calibrate the minimal simulation. If I measure the delay in the EV3, I get something around 0.005 seconds of total delay. So I set all the delays in the minimal simulation to random values between 0 and 0.01. The sensor noise for the robot is around 0.02 radians (1 degree, due to quantization), so I set all the noises to random values between 0 and 0.1 (that's probably a bit aggressive, but whatever). The more interesting one is K_f, the magnitude of the external force. With the configuration of the robot arm as it is, it takes about 30% of peak motor power to compensate for gravity in the worst case (when the arm is straight out to the side). So I need to set T and K_f such that the magnitude of the externally applied force (K_f times wacky greek letters) is less than 0.3T. I can figure that out by randomly sampling the external force function.
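The "measure the physical value, then make the simulation's range wider" rule can be sketched as a quick sanity check. This is just an illustration using the numbers quoted above; the variable names are made up:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Values measured on the physical EV3 (quoted above).
measured_delay = 0.005    # seconds of total loop delay
measured_noise = 0.0175   # radians (1-degree encoder quantization)

# Minimal-simulation parameters: sampled uniformly over ranges chosen
# to be wider than the measurements, so the simulation can be worse
# than the real robot.
sim_delays = rng.uniform(0.0, 0.01, size=1000)
sim_noises = rng.uniform(0.0, 0.1, size=1000)

# Sanity check: the physical measurements fall inside the sampled ranges,
# i.e. the real robot is one point in the simulation's uncertainty space.
assert sim_delays.min() < measured_delay < sim_delays.max()
assert sim_noises.min() < measured_noise < sim_noises.max()
print("physical robot lies inside the simulated parameter space")
```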

I'm going to write that up more cleanly, but I kinda like this idea -- I'd sort of done it implicitly and it's rather nice to be explicit about it.

studywolf commented 9 years ago

Ahaha this is a paradigm shift for sure. You want a benchmark based on the real world that is simple and variable enough that if the hardware can adapt to it, then it will work on your real-world case. So you use the physical system to create your minimal simulation, which you then test everything on, and you use that to choose the most appropriate hardware for your problem. And you don't have to worry about then going and testing each one on the physical system after training, because we're assuming that we've captured all the relevant variability of the system, and that the hardware that works the best on the minimal simulation will work best on the actual problem.

Is that right? That would then lead to not needing the physical confirmation except from possibly one system to confirm that you've captured the major relevant features of the hardware?

tcstewar commented 9 years ago

Is that right? That would then lead to not needing the physical confirmation except from possibly one system to confirm that you've captured the major relevant features of the hardware?

Yup, I think that's the claim I want to make. If this all works well, then we don't need that physical confirmation. But the cautious part of me thinks that it's still good to do, and it's useful for rhetorical purposes anyway. But I think the core point I want to make is that if you've got a good minimal simulation, then you don't need that real-world benchmarking. I don't think I want to make the extremely strong version of that claim until after doing a lot more confirmation (i.e. after this paper), though.

tcstewar commented 9 years ago

that the hardware that works the best on the minimal simulation will work best on the actual problem.

I'd also slightly disagree with this. I'm not sure that I need to assume that the hardware that works best on the minimal simulation will also work best on the actual problem. If I am in a situation where I have one actual problem that I want the system to be good for (say, controlling the lego robot in one particular configuration), then I should probably just use that robot as the benchmark to be on the safe side. The minimal simulation might have way more noise or other features that mean I really could do much better if I tweaked the hell out of the algorithms for that one particular case.

However, if I don't know what my actual problem is and I just want a system that's reliably good across a wide range of tasks, then I'm much better off going with what works best in the minimal simulation. And that, I'd argue, is what you want in a benchmark, since the whole point of the benchmark is to help you decide what hardware to use in whatever new situation you're in.

studywolf commented 9 years ago

ah, i was assuming that the benchmark you were using was created based on the physical system.

aaaand if it's not, then i don't understand the importance of using the physical system to calibrate the minimal simulation, really. maybe it's to go from 'hey here's a bunch of loose approximations of robot arms, if the hardware does well on these then you're probably good' vs 'hey here's a bunch of simulations of wheeled robots...', i.e. to generally get a sense of the different kinds of variability involved in the different tasks you might want the hardware for. but if it's just for a general 'this is good in uncertain environments', why constrain the benchmarks physically or use plausible forms of external forces at all?

tcstewar commented 9 years ago

ah, i was assuming that the benchmark you were using was created based on the physical system.

It is, but I want the physical system to be the sort of thing that is covered by the minimal simulation, not that one exact particular thing. So for the physical system, I happened to use an arm that is exactly 11 lego studs long. Any time I'm testing using the physical system, that's the particular physical system I'm testing on. But I don't really care about "controlling lego arms that are exactly 11 lego studs long". I want a benchmark that's more general than that.

but if it's for just a general 'this is good in uncertain environments' why constrain the benchmarks physically or use plausible forms of external forces at all?

Because you need some sort of context for what "uncertain environments" means. And that's what the calibration to some physical system provides. It's not "uncertain" in the sense of "the laws of physics might suddenly change". It's not uncertain in the sense that the environment just ignores your output and feeds you random input. So the minimal simulation is inspired by some physical situation, and calibrated such that a particular instance of that physical situation is somewhere in that space of uncertainty.

tcstewar commented 9 years ago

Here's my attempt at describing this calibration:

Now that we have this physical example of the task our minimal simulation
benchmark is meant to cover, we can use it to calibrate the parameters of
the simulation.  For example, communication with the EV3 happens at around
200Hz, meaning that there must be a delay on the order of 0.005 seconds.  Given
this, we set the delays in the simulation to be uniformly chosen between
0 and 0.01.  Importantly, we do not need to exactly measure the delay in the
EV3 robot --- we just make sure that the minimal simulation is worse.

For sensor noise, we note that the EV3 rotation encoders for the motors (the
devices that measure $q$) have a resolution of 0.0175 radians (1 degree).  This
is a very different sort of noise than the Gaussian noise used in the
simulation, so we set the simulation noise to be much larger (uniformly
distributed between 0 and 0.1).  Similarly, the motor resolution is 0.01, as
it accepts integer values up to 100, so we set the motor noise to be uniform
between 0 and 0.1.

Finally, we can use the physical system to calibrate the relationship between
$T$ (the maximum torque applied by the motor) and $K_f$ (the scaling factor
of the external force).  After all, we do not want external forces that are
so strong that the system does not have enough strength to counteract them.
On the physical robot, in the worst-case scenario ($q=\pi/2$ or $-\pi/2$),
the motors must be driven at around 0.3 times their maximum strength to
balance the force of gravity.  (This can, of course, be adjusted by changing
the weight and its position on the end of the arm).  If we arbitrarily fix $K_f$ to 1 and randomly
generate external forces given the process described above, then 95\% of
the time we get values between -3.75 and +3.75.  Since we want the motors to
be strong enough to compensate for forces in that range, we set $T$ to 10.
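In code, the sampling step might look something like the sketch below. The force process here is a made-up stand-in (random-amplitude, random-phase sinusoids), since the real one is defined earlier in the paper, and treating the remaining $0.7T$ after the gravity load as the budget for external forces is my reading of the worst-case numbers above:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Placeholder for the external-force process (the real one is defined
# elsewhere): a sum of random-amplitude, random-phase sinusoids, K_f = 1.
K_f = 1.0
t = np.linspace(0.0, 10.0, 1000)

def sample_force():
    amps = rng.normal(0.0, 1.0, size=5)
    freqs = rng.uniform(0.1, 2.0, size=5)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=5)
    return K_f * np.sum(
        amps[:, None] * np.sin(2.0 * np.pi * freqs[:, None] * t + phases[:, None]),
        axis=0,
    )

# Pool force values from many sampled functions and find the magnitude
# that 95% of them stay below.
samples = np.concatenate([sample_force() for _ in range(200)])
f95 = np.percentile(np.abs(samples), 95)

# Gravity already costs about 0.3*T in the worst case, so the remaining
# 0.7*T has to cover the external force; pick T with at least that margin.
T_min = f95 / 0.7
print(f"95% of |force| is below {f95:.2f}, so T should be at least {T_min:.2f}")
```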
studywolf commented 9 years ago

looks good! just a clarification, in these sentences

Given this, we set the delays in the simulation to be uniformly chosen between 0 and 0.01. Importantly, we do not need to exactly measure the delay in the EV3 robot --- we just make sure that the minimal simulation is worse.

should that last line be something like "we just make sure that the minimal simulation can be worse" or "tests against worse"? or should the range be between .005 and .01?

tcstewar commented 9 years ago

Ah, good point... "can be worse", I think. :)