isaac-sim / IsaacGymEnvs

Isaac Gym Reinforcement Learning Environments

Ant task not following the Gym Vector API #70

Open · FelipeMartins96 opened this issue 2 years ago

FelipeMartins96 commented 2 years ago

Using the code in the "Creating an environment" section of the README, the created vector env does not follow the Gym vector API. Environments should be reset automatically, so that when the done flag is set, the return from the step function contains the reward for that last step and the observation from the already-reset state. Instead, it seems that the first observation after the reset is returned only on the following step.
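To make the expected semantics concrete, here is a minimal sketch of Gym-style auto-reset on a toy counter environment. `ToyAutoResetVecEnv`, its counter dynamics, and the `terminal_obs` info key are all hypothetical; they are not part of IsaacGymEnvs or Gym. The point is only the ordering: the step that ends an episode returns that step's reward and `done=True`, while `obs` already comes from the freshly reset episode.

```python
from typing import Dict, List, Optional, Tuple


class ToyAutoResetVecEnv:
    """Hypothetical vector env with Gym-style auto-reset.

    Each sub-env is a counter that ends its episode after `episode_len` steps.
    The observation is the counter value and every step yields a reward of 1.0.
    """

    def __init__(self, num_envs: int, episode_len: int) -> None:
        self.num_envs = num_envs
        self.episode_len = episode_len
        self.counters = [0] * num_envs

    def reset(self) -> List[int]:
        self.counters = [0] * self.num_envs
        return list(self.counters)

    def step(
        self, actions: List[int]
    ) -> Tuple[List[int], List[float], List[bool], Dict[str, List[Optional[int]]]]:
        obs: List[int] = []
        rewards: List[float] = []
        dones: List[bool] = []
        terminal_obs: List[Optional[int]] = [None] * self.num_envs
        for i in range(self.num_envs):  # actions are ignored by this toy env
            self.counters[i] += 1
            rewards.append(1.0)  # reward for the step that just happened
            done = self.counters[i] >= self.episode_len
            dones.append(done)
            if done:
                # Keep the finished episode's last observation in info...
                terminal_obs[i] = self.counters[i]
                # ...and reset right away, so `obs` belongs to the new episode.
                self.counters[i] = 0
            obs.append(self.counters[i])
        return obs, rewards, dones, {"terminal_obs": terminal_obs}


env = ToyAutoResetVecEnv(num_envs=2, episode_len=3)
env.reset()
env.step([0, 0])
env.step([0, 0])
obs, rewards, dones, info = env.step([0, 0])
print(obs, rewards, dones, info["terminal_obs"])  # [0, 0] [1.0, 1.0] [True, True] [3, 3]
```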

I did not test other tasks, but it was unclear how we should implement the reset logic in VecTask-based tasks so that it complies with the Gym API. I got the API working in an environment I'm developing; however, a change in the new release means I never receive the timeout flag, because I reset the environment and reset_buf in post_physics_step.

ViktorM commented 2 years ago

Hi @FelipeMartins96,

There are limitations when running the simulation on a GPU: some observations, such as force sensor readings, are available only after the first simulation step, yet we can't step only a subset of the environments on the GPU. So it was a design decision in Isaac Gym, since some observations could be incorrect immediately after a reset.
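The deferred reset described here can be modeled with a toy environment like the one below. This is a simplified sketch under stated assumptions, not the actual VecTask code: `ToyDeferredResetVecEnv` is hypothetical, every sub-env is advanced together on each call (mimicking the fact that a subset cannot be stepped on the GPU), and an env flagged in `reset_buf` on one call is only reset at the start of the next call.

```python
from typing import Dict, List, Tuple


class ToyDeferredResetVecEnv:
    """Hypothetical vector env with deferred resets.

    All sub-envs are advanced together on every call, and an env flagged in
    `reset_buf` is only reset at the start of the next call.
    """

    def __init__(self, num_envs: int, episode_len: int) -> None:
        self.num_envs = num_envs
        self.episode_len = episode_len
        self.counters = [0] * num_envs
        self.reset_buf = [False] * num_envs

    def step(
        self, actions: List[int]
    ) -> Tuple[List[int], List[float], List[bool], Dict[str, object]]:
        # 1. Apply the resets requested on the previous call.
        for i in range(self.num_envs):
            if self.reset_buf[i]:
                self.counters[i] = 0
                self.reset_buf[i] = False
        # 2. "Simulate" every env at once; actions are ignored by this toy env.
        for i in range(self.num_envs):
            self.counters[i] += 1
        # 3. Compute observations, rewards and new reset requests.
        obs = list(self.counters)
        rewards = [1.0] * self.num_envs
        self.reset_buf = [c >= self.episode_len for c in self.counters]
        return obs, rewards, list(self.reset_buf), {}


env = ToyDeferredResetVecEnv(num_envs=1, episode_len=2)
for _ in range(3):
    obs, _, dones, _ = env.step([0])
    print(obs, dones)
# [1] [False]
# [2] [True]
# [1] [False]
```

With `episode_len=2`, the second call returns `done=True` together with the episode's last observation (`2`), and the first observation of the new episode (`1`, already one simulation step past the reset) only appears on the third call, which matches the behavior reported in the issue.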

As for the second question, you need to update how the timeout buffer is filled to match your changes to the env.

FelipeMartins96 commented 2 years ago

Thank you for the info @ViktorM,

So I should adapt my agent code to handle these transitions, right? With the current behavior, I think I was adding transitions that span an episode reset to my replay buffer. I also observed "weird" behavior in the initial observations when using the GPU pipeline, which I posted about on the Isaac Gym forum; do you know if it is related to the limitations you mentioned?
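One way to adapt the agent side, assuming the deferred-reset behavior described in this thread (the observation returned with `done=True` is the episode's last one, and the next observation already belongs to a new episode), is to drop any transition that starts right after a `done`. `build_transitions` below is a hypothetical helper for a single sub-env's rollout, not an IsaacGymEnvs or rl_games API.

```python
from typing import List, Tuple

# (obs, action, reward, next_obs, done)
Transition = Tuple[int, int, float, int, bool]


def build_transitions(
    obs: List[int], actions: List[int], rewards: List[float], dones: List[bool]
) -> List[Transition]:
    """Turn one sub-env's rollout into transitions, skipping reset boundaries.

    `obs` has one more entry than the other lists: obs[0] is the initial
    observation, and actions[t] taken at obs[t] produced rewards[t],
    dones[t] and obs[t + 1].
    """
    transitions: List[Transition] = []
    for t in range(len(actions)):
        # If the previous step reported done, obs[t] is the episode's last
        # observation and obs[t + 1] already belongs to a new episode.
        if t > 0 and dones[t - 1]:
            continue
        transitions.append((obs[t], actions[t], rewards[t], obs[t + 1], dones[t]))
    return transitions


# Rollout with deferred resets: the first episode ends at obs 2,
# and the next observation (1) comes from the new episode.
rollout_obs = [0, 1, 2, 1, 2]
rollout_dones = [False, True, False, True]
for tr in build_transitions(rollout_obs, [0] * 4, [1.0] * 4, rollout_dones):
    print(tr)
# (0, 0, 1.0, 1, False)
# (1, 0, 1.0, 2, True)
# (1, 0, 1.0, 2, True)
```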

Where should I fill the timeout buffer? I was previously handling it in post_physics_step; however, the VecTask flow now fills it after post_physics_step, which is why I thought I needed to fill the reset buffer in a particular way.
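The lost timeout flag looks like an ordering problem: if post_physics_step resets an env (zeroing its `progress_buf` and `reset_buf` entries) before the timeout buffer is computed, the env no longer looks timed out. The sketch below assumes the timeout condition has the form `(progress_buf >= max_episode_length - 1) & (reset_buf != 0)`, which is what VecTask.step appears to compute after post_physics_step; `compute_timeouts` and `reset_idx` here are hypothetical plain-Python stand-ins for the tensor code.

```python
from typing import List


def compute_timeouts(
    progress_buf: List[int], reset_buf: List[int], max_episode_length: int
) -> List[bool]:
    """Flag envs that are resetting because they hit the step limit (assumed condition)."""
    return [
        p >= max_episode_length - 1 and r != 0
        for p, r in zip(progress_buf, reset_buf)
    ]


def reset_idx(env_ids: List[int], progress_buf: List[int], reset_buf: List[int]) -> None:
    """Hypothetical in-place reset that clears the buffers for `env_ids`."""
    for i in env_ids:
        progress_buf[i] = 0
        reset_buf[i] = 0


MAX_EPISODE_LENGTH = 500

# Env 0 just hit the step limit; env 1 is mid-episode.
progress, resets = [499, 10], [1, 0]

# Wrong order: resetting inside post_physics_step wipes the evidence.
p, r = list(progress), list(resets)
reset_idx([i for i, flag in enumerate(r) if flag], p, r)
print(compute_timeouts(p, r, MAX_EPISODE_LENGTH))  # [False, False]

# Right order: record the timeouts first, then reset.
p, r = list(progress), list(resets)
timeouts = compute_timeouts(p, r, MAX_EPISODE_LENGTH)
reset_idx([i for i, flag in enumerate(r) if flag], p, r)
print(timeouts)  # [True, False]
```

So either keep the reset deferred until after VecTask has filled the timeout buffer (for example, by performing it at the start of the next step), or record the timeout flags yourself before calling your reset.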