openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

[Question] Documentation for lunar lander rewards incomplete #3014

Closed: ToonTalk closed this issue 2 years ago

ToonTalk commented 2 years ago

https://www.gymlibrary.ml/environments/box2d/lunar_lander/#rewards states

Reward for moving from the top of the screen to the landing pad and coming to rest is about 100-140 points.

This is very vague. Is the reward incrementally awarded or only after landing? What determines whether it is 100, 140, or in between?

pseudo-rnd-thoughts commented 2 years ago

The full documentation is

Reward for moving from the top of the screen to the landing pad and coming to rest is about 100-140 points. If the lander moves away from the landing pad, it loses reward. If the lander crashes, it receives an additional -100 points. If it comes to rest, it receives an additional +100 points. Each leg with ground contact is +10 points. Firing the main engine is -0.3 points each frame. Firing the side engine is -0.03 points each frame. Solved is 200 points.

I agree that the documentation is not particularly clear, but my understanding is as follows: the robot has (I believe) four legs, it receives 10 points per leg with ground contact, and 100 points for coming to rest. Therefore, the total reward for landing is between 100 and 140.
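Taken literally, that reading derives the range entirely from the leg bonus, roughly like this (a minimal arithmetic sketch of the interpretation above; the four-leg count is the assumption stated there, and the environment's observation actually tracks only two leg contacts):

```python
# Sketch of the "100..140" reading: 100 for coming to rest plus 10 per leg
# in ground contact (the four-leg count is the assumption stated above).
rest_bonus = 100
points_per_leg = 10
for legs_in_contact in range(5):  # 0 through 4 legs
    print(legs_in_contact, rest_bonus + points_per_leg * legs_in_contact)
# -> 0 100, 1 110, 2 120, 3 130, 4 140
```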

RedTachyon commented 2 years ago

The documentation is somewhat incomplete, but the reward function itself is also a bit complex to put into words. I recommend checking out the source code to see how the reward is computed (it depends on the position, velocity, angle, the contact of the two legs, the energy usage, and the completion of the objective). If you or someone else can convert it into a nice natural-language description, that would be great.
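For reference, the per-step logic in gym/envs/box2d/lunar_lander.py looks roughly like the sketch below (paraphrased and simplified; the function wrapper and variable names are mine, not part of the environment's API):

```python
import numpy as np

def lunar_lander_step_reward(state, prev_shaping, m_power, s_power,
                             crashed, at_rest):
    """Sketch of the per-step reward, paraphrased from lunar_lander.py.

    state = [x, y, vx, vy, angle, angular_velocity, leg1_contact, leg2_contact]
    m_power / s_power are the main / side engine throttles used this frame.
    """
    # Potential-style "shaping" score: being close to the pad, slow, upright,
    # and having legs in contact with the ground is rewarded.
    shaping = (
        -100 * np.sqrt(state[0] ** 2 + state[1] ** 2)   # distance from the pad
        - 100 * np.sqrt(state[2] ** 2 + state[3] ** 2)  # speed
        - 100 * abs(state[4])                            # tilt angle
        + 10 * state[6]                                  # left leg contact
        + 10 * state[7]                                  # right leg contact
    )

    # The step reward is the change in shaping since the previous step...
    reward = 0.0 if prev_shaping is None else shaping - prev_shaping

    # ...minus the fuel cost of whatever engines fired this frame.
    reward -= m_power * 0.30
    reward -= s_power * 0.03

    # On termination, the reward is overwritten with a flat value.
    if crashed:
        reward = -100.0
    elif at_rest:
        reward = +100.0

    return reward, shaping
```

Because the shaping terms are computed as a delta from the previous step, they largely telescope over an episode: most of the final score comes from the change in shaping between start and end, the accumulated fuel penalties, and the terminal ±100.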

vairodp commented 2 years ago

I analyzed how the reward is calculated, and I can safely say that the "100..140" figure is inaccurate and misleading.

The reward takes into account, at every step:

- the lander's distance from the landing pad
- its speed
- its angle
- contact of each leg with the ground
- firing of the main and side engines

(Also, at the end of the episode, ±100 is applied, as accurately described.)

I tried to dissect how the reward is calculated in the code, and I couldn't find any evidence of a "100..140" range.
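As far as I can tell, the only constants near 100 are in the terminal branch, which replaces the step reward with a flat value; no 100..140 constant appears anywhere. A paraphrased sketch of that branch (the names here are illustrative, not the source's):

```python
def terminal_reward(step_reward, game_over, off_screen, at_rest):
    # Paraphrased from lunar_lander.py: on termination the reward becomes a
    # flat +/-100; nothing in this branch produces a 100..140 range.
    if game_over or off_screen:
        return -100.0
    if at_rest:
        return +100.0
    return step_reward
```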

To fix this inaccuracy in the documentation, I tried to rewrite the doc in a more "complete" way and have already opened a PR here