Closed quanvuong closed 2 years ago
The MuJoCo environments distinguish between the state and the observation: the simulator maintains and updates a full system state, but the policy only sees part of it. In the case of Swimmer, the positions of the first two joints (the x, y position of the whole body) are removed. If you check out other MuJoCo environments such as HalfCheetah, you'll notice they prune the full state to an observation in a similar way.
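To make the pruning concrete, here is a minimal sketch of how such an observation could be assembled from the simulator state. It uses plain Python lists rather than the real bindings, and the function name `pruned_observation`, the `skip` parameter, and the example numbers are all hypothetical. The layout (5 positions and 5 velocities for Swimmer) is an assumption based on the description in this thread.

```python
def pruned_observation(qpos, qvel, skip=2):
    """Drop the first `skip` position coordinates, then append all velocities.

    Hypothetical helper: it mimics the pruning discussed in this thread,
    it is not the environment's actual implementation.
    """
    return list(qpos[skip:]) + list(qvel)


# Assumed full state: x, y of the body, heading, and two joint angles,
# followed by the five matching velocities (made-up numbers).
qpos = [1.5, -0.3, 0.1, 0.2, -0.4]
qvel = [0.5, 0.0, 0.01, -0.02, 0.03]

obs = pruned_observation(qpos, qvel)
print(len(qpos) + len(qvel), "state values ->", len(obs), "observed values")
```

Note that the velocities of the removed coordinates are still part of the observation in this sketch; only the absolute x, y position is hidden from the policy.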
Yeah, that's exactly my question: how can we ensure that policy performance is not negatively affected by the pruned state? It's not unreasonable for a real robot to have access to the positions of the first two joints in this environment.
My guess is that since the goal is to make the snake swim in the positive x direction, its absolute x, y position is not important information for deciding on actions.
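One way to see this: if the reward depends only on how far the body moved during a step, then shifting the whole trajectory along x leaves the reward unchanged. The sketch below assumes a reward of the form "forward velocity minus a small control cost". The function name and the values of `dt` and `ctrl_cost_coeff` are illustrative assumptions, not taken from the environment.

```python
def swimmer_style_reward(x_before, x_after, action, dt=0.04, ctrl_cost_coeff=1e-4):
    """Hypothetical reward: forward velocity minus a quadratic control cost."""
    forward = (x_after - x_before) / dt
    ctrl = -ctrl_cost_coeff * sum(a * a for a in action)
    return forward + ctrl


action = [0.5, -0.2]

# Same displacement (0.1) and same action, starting near the origin and far from it.
r_near = swimmer_style_reward(0.0, 0.1, action)
r_far = swimmer_style_reward(100.0, 100.1, action)

# The two rewards agree up to floating-point rounding.
print(abs(r_near - r_far) < 1e-9)
```

Under that assumption, the absolute position carries no information about which action yields more reward, so the policy loses nothing by not seeing it.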
In other words, if it is reasonable to expect a real-life robot to have access to the full state, why prune the state unless there is an explicit guarantee that the pruning does not impose an unreasonable constraint on the policy (i.e. there are no unexpected higher-order effects)?
My guess is that including state variables like the x, y position of the body would make it significantly harder to use a neural network as the policy function. These variables would go from small ranges at the start of optimization (the agent does not yet know how to swim, so it barely moves) to huge ranges once a good policy has been learned.
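A rough illustration of that range problem, using made-up numbers: if input normalization statistics are estimated early in training, the x position of a trained agent ends up far outside the range the network was fitted on. The lists `early_x` and `late_x` below are hypothetical, not measurements.

```python
import statistics

# Hypothetical x positions at the end of episodes (invented for illustration).
early_x = [0.02, -0.01, 0.03, 0.00, -0.02]  # untrained agent barely moves
late_x = [35.0, 41.5, 38.2, 44.1, 39.7]     # trained agent swims far

# Normalization statistics estimated from the early data only.
mean = statistics.mean(early_x)
std = statistics.pstdev(early_x)


def normalize(x):
    """Standardize x with the early-training mean and standard deviation."""
    return (x - mean) / std


early_scaled = [normalize(x) for x in early_x]
late_scaled = [normalize(x) for x in late_x]

print("largest early input:", max(abs(v) for v in early_scaled))
print("largest late input: ", max(abs(v) for v in late_scaled))
```

Running statistics would soften this, but the input distribution for that one feature would still keep shifting as the policy improves, which is plausibly what dropping the position avoids.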
PR #2762 is about to be merged, introducing V4 MuJoCo environments using new bindings and a dramatically newer version of the engine. If this issue still persists with the V4 ones, please create a new issue for it.
In the Swimmer environment, there are 5 joints. However, the `step` function removes the positions of the first two joints (x, y position of the whole body) from the state before returning the state. I was wondering why these two scalars are removed from the state? Thanks!