openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

Why remove the first two joints' positions in Swimmer? #837

Closed quanvuong closed 2 years ago

quanvuong commented 6 years ago

The Swimmer environment has 5 joints. However, the step function drops the positions of the first two (the x and y position of the whole body) from the state before returning it.

Why are these two scalars removed from the state? Thanks!

suryabhupa commented 6 years ago

MuJoCo makes a distinction between the state and the observation: the simulator maintains and updates the full system state, but the policy only sees part of it. In Swimmer, the observation omits the positions of the first two joints. If you check out HalfCheetah (or any other MuJoCo environment), you'll notice that it likewise prunes the full state down to an observation.
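As a rough illustration of that pruning, here is a minimal NumPy sketch. It does not call MuJoCo or gym; the function name `prune_to_observation`, the `n_hidden` parameter, and the example values are hypothetical, and the layout (5 positions followed by 5 velocities, with the first two positions being the body's x and y) is assumed from the description in this thread.

```python
import numpy as np

def prune_to_observation(qpos, qvel, n_hidden=2):
    """Drop the first n_hidden position entries and append all velocities.

    Hypothetical helper for illustration; not the gym API.
    """
    qpos = np.asarray(qpos, dtype=float).ravel()
    qvel = np.asarray(qvel, dtype=float).ravel()
    return np.concatenate([qpos[n_hidden:], qvel])

# Assumed Swimmer-like full state: 5 positions (x, y, then 3 angles) and 5 velocities.
qpos = np.array([1.5, -0.3, 0.1, 0.2, -0.4])
qvel = np.array([0.05, 0.0, 0.01, -0.02, 0.03])

obs = prune_to_observation(qpos, qvel)
print(obs.shape)  # the policy sees 8 numbers; x and y stay inside the simulator
```

The full state still exists and still drives the dynamics; only the view handed to the policy is shorter.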

quanvuong commented 6 years ago

Yeah, that's exactly my question: how can we be sure that policy performance is not hurt by the pruned state? It's not unreasonable for a real robot to have access to the positions of the first two joints in this environment.

My guess is that since the goal is to make the snake swim in the positive direction, its x, y positions are not important for deciding on actions.
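One way to make this guess concrete: if the reward is computed from how far the body moved during a step rather than from where it is, then shifting the whole trajectory by a constant offset leaves the reward unchanged. The sketch below assumes a reward of that form (forward progress per unit time minus a small control cost); the function name `step_reward` and the values of `dt` and `ctrl_cost_coef` are illustrative assumptions, not taken from this thread.

```python
import math

def step_reward(x_before, x_after, action, dt=0.04, ctrl_cost_coef=1e-4):
    """Assumed reward shape: forward velocity minus a quadratic control cost."""
    forward = (x_after - x_before) / dt
    ctrl_cost = ctrl_cost_coef * sum(a * a for a in action)
    return forward - ctrl_cost

action = [0.5, -0.2]

# Same 0.02 displacement, starting from two very different absolute positions.
r_near_origin = step_reward(0.0, 0.02, action)
r_far_away = step_reward(100.0, 100.02, action)

print(math.isclose(r_near_origin, r_far_away, abs_tol=1e-9))
```

Under this assumption the absolute x, y position carries no information about reward, which is consistent with leaving it out of the observation. It does not by itself rule out the higher-order effects asked about above.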

quanvuong commented 6 years ago

In other words, if it is reasonable to expect a real-life robot to have access to the full state, why prune the state unless there is an explicit guarantee that the pruning does not impose an unreasonable constraint on the policy (i.e. that there are no unexpected higher-order effects)?

JulianoLagana commented 4 years ago

My guess is that including states like the x, y position of the body would make it significantly harder to use neural networks for the policy function. These variables would have small ranges at the start of optimization (the agent does not yet know how to swim, so it barely moves) and huge ranges once a good policy has been learned.
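A toy calculation shows the size of the effect being described. It assumes a constant forward speed over an episode, which a real swimmer would not have, and the speeds, step count, and `dt` are made-up illustrative numbers; `x_positions` is a hypothetical helper.

```python
import numpy as np

def x_positions(forward_speed, n_steps=1000, dt=0.04):
    """Absolute x position at every step, assuming constant forward speed."""
    return forward_speed * dt * np.arange(n_steps + 1)

early = x_positions(forward_speed=0.01)  # untrained agent, barely moves
late = x_positions(forward_speed=1.0)    # trained agent, swims steadily

# The range of the x input grows in proportion to how well the agent swims.
print(early.max(), late.max())
```

A network input whose scale shifts by orders of magnitude during training is hard to normalize, whereas joint angles and velocities stay within a comparatively stable range.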

jkterry1 commented 2 years ago

PR #2762 is about to be merged, introducing V4 MuJoCo environments that use new bindings and a much newer version of the engine. If this issue persists with the V4 environments, please open a new issue for it.