Multiple resets before stepping makes observations junk

HorvathDawson commented 2 years ago

Description:

When multiple resets happen before the initial step, the observation values become junk.

Example of bad observations:

[2.48752798e+10 2.48752798e+13 1.92311709e+10 1.92311709e+13
 1.05943662e+10 1.05943662e+13 1.73760567e+10 1.73760567e+13
 4.64308078e+09 4.64308078e+12]
[ 5.76040155e-01 -4.83243193e-01 -5.72812931e-01  3.71046713e+00
  1.72736902e-03  3.44344853e+00  3.04615830e-01  6.65370534e-01
  3.43991476e-03 -4.19276817e-01]
[ 0.57428055 -1.75960042 -0.57034015  2.47277984  0.00307352  1.34615452
  0.30532197  0.7061436   0.00305073 -0.38917982]
[ 1.30214752e+10  1.30214752e+13  2.32740358e+11  2.32740358e+14
  2.37263629e+11  2.37263629e+14  8.51912783e+10  8.51912783e+13
 -2.55717189e+10 -2.55717189e+13]

Example of expected observations:

[ 0.70883607  3.06453891 -0.1113847   1.32162828  0.11874101  0.86413771
  0.18686704 -0.34228004  0.0448376   0.46870266]
[ 0.71061966  1.78359314 -0.10892029  2.46441255  0.11872182 -0.01918816
  0.18653529 -0.33174444  0.04530129  0.46369175]
[ 0.71425176  3.63210301 -0.10729784  1.62245239  0.12016356  1.44174148
  0.18617237 -0.3629249   0.04576737  0.46607455]
[ 0.71870738  4.45561494 -0.10559593  1.70190148  0.12232909  2.16552533
  0.18579153 -0.38084443  0.04623357  0.46620485]
[ 0.72338892  4.6815451  -0.10484045  0.75548776  0.12440563  2.07653708
  0.18539936 -0.39216616  0.04670759  0.47402296]

Steps to reproduce

Note that gym_bb is our custom environment built on gym-ignition. The repo containing the code can be found here: https://github.com/Baesian-Balancer/gym-bb

import gym
import time
import functools
from gym_ignition.utils import logger
from gym_bb import randomizers

from gym_ignition.utils.typing import Action, Reward, Observation

env_id = "Monopod-Gazebo-v1"

def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)

make_env = functools.partial(make_env_from_id, env_id=env_id)

env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
env.seed(42)

# This initial reset causes the bad observations.
# Removing it makes the observations good again.
observation = env.reset()

for epoch in range(1000):

    observation = env.reset()

    done = False

    while not done:
        action = env.action_space.sample()
        observation, reward, done, _ = env.step(action)
        print(observation)

env.close()
time.sleep(5)

Additional context

Multiple resets before stepping makes observations junk

Environment

OS: popOS 20.04
GPU: 1650 RTX
Python: 3.8.10
Version:
Channel:
- [x] Stable
Installation type:
- [x] User

diegoferigo commented 2 years ago

If I followed your code correctly, the method that you call multiple times is MonopodBase.reset_task. I can see that here you call *.to_gazebo.reset_* methods. After these methods, a gazebo.run(paused=True) is necessary in order to update the state of the simulator. This cannot be done from the task, since the task should not control the simulator itself; it is not designed to do so.

In general, considering how you structured your environment (the intended way :wink:) you should try to avoid any to_gazebo call in your task class. This would make your class fully compatible with all ScenarIO backends (that currently is only Gazebo, but this is the right mindset to get engine-agnostic tasks which is one of the desiderata of the project).

I think that a possible fix would be moving this randomization to... the randomizer, which is where it belongs. The randomizer, unlike the task, does have access to the GazeboSimulator object and can reset the model to the desired position and velocity before env.reset (which calls Task.reset_task and then Task.get_observation, in sequence) is called. You don't have to do this yourself since it is already done in:

https://github.com/robotology/gym-ignition/blob/311b08ef16651d9a915d11866cd1d6df923ca681/python/gym_ignition/randomizers/gazebo_env_randomizer.py#L97

So, to recap, if I am right, you can solve this by moving this logic after these lines.
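
To make the point about the paused run concrete, here is a toy model of a simulator whose reset requests only become visible after a paused run. It is not the actual ScenarIO or gym-ignition implementation: the class ToySimulator and all of its methods are hypothetical names invented for this sketch, and the deferred-update mechanism is an assumption used only to illustrate why reading the state right after a reset can return stale values.

```python
class ToySimulator:
    """Hypothetical stand-in for a simulator with deferred state updates."""

    def __init__(self):
        self._committed = 0.0  # state visible to readers
        self._pending = None   # state requested by a reset, not yet applied

    def reset_joint_position(self, value):
        # Only records the request; nothing is visible to readers yet.
        self._pending = value

    def run(self, paused=False):
        # Any run applies pending reset requests first.
        if self._pending is not None:
            self._committed = self._pending
            self._pending = None
        # A non-paused run would also advance physics (faked here).
        if not paused:
            self._committed += 0.001

    def joint_position(self):
        return self._committed


sim = ToySimulator()
sim.reset_joint_position(0.3)
stale = sim.joint_position()  # read before the paused run: old value
sim.run(paused=True)
fresh = sim.joint_position()  # read after the paused run: reset value
print(stale, fresh)
```

In this toy model, an observation computed between the reset and the paused run would be built from the old state, which is the kind of ordering problem the task cannot fix on its own because it does not control the simulator.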

HorvathDawson commented 2 years ago

Hello @diegoferigo,

Thank you so much for the very thorough answer. It has helped me understand the intended structure a lot better.

I just changed my environment to have the reset in the randomizer instead. However, the weird observation values still happen when there are multiple resets before stepping happens. When I change the location of the second reset to be after the epoch like this,

The issue does not persist. It seems to happen only when there are two resets before the first step. It isn't a very bad bug (except for being hard to find).

I have a few follow-up questions about how to structure the environments to work on a real robot, and about the recommended way to implement my own ScenarIO backend for my robot. However, I will move these over to the GitHub discussions.

diegoferigo commented 2 years ago

Strange behavior, I'm not really sure who to blame :) I tried on my setup that is based on Ignition Fortress + our devel branch and I get the following:

Script

```python
import gym
import time
import functools
from gym_ignition.utils import logger
from gym_bb import randomizers

from gym_ignition.utils.typing import Action, Reward, Observation

env_id = "Monopod-Gazebo-v1"

def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)

make_env = functools.partial(make_env_from_id, env_id=env_id)

env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
env.seed(42)

# Try to reset multiple times
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
```

gym-bb on  main via 🐍 v3.8.10 🅒 /conda  took 5s 
✦ ❯ ipython script.py
INFO: Making new env: Monopod-Gazebo-v1 ({'physics_engine': 0})
[Wrn] [ServerConfig.cc:860] IGN_GAZEBO_SERVER_CONFIG_PATH set but no file found, no plugins loaded
WARN: Box bound precision lowered by casting to float64
[-0.4608653  -0.01303166  0.4608653  -0.04270197 -0.0042243  -0.04540492
  0.30490549  0.04584364 -0.00212186 -0.03880929]
[-0.32561324 -0.03355728  0.32561324 -0.00463403  0.00401983 -0.03226874
  0.29692544  0.02540731  0.0045288  -0.04842035]
[ 0.37949755 -0.03606021 -0.37949755 -0.01683971 -0.00187991 -0.01650839
  0.30255707  0.04046996 -0.00129108 -0.02424057]
[-0.00804839  0.04977905  0.00804839  0.02675372  0.00226717 -0.0341412
  0.30092956 -0.02311185 -0.00035272 -0.04039219]
[ 5.18247062e-01 -1.19754151e-02 -5.18247062e-01  6.42832349e-03
 -2.45487448e-03 -1.03810239e-02  2.96015569e-01  4.23681635e-02
 -3.95638763e-03  2.18586356e-04]
[ 0.55658223  0.01573295 -0.55658223 -0.01650758 -0.00189508 -0.03451867
  0.30277819 -0.03263973 -0.00453006 -0.02803389]
[-7.90700858e-02  2.79145583e-02  7.90700858e-02 -3.65495938e-03
  2.11424871e-03  2.26854122e-02  3.04595148e-01 -8.03847243e-03
  1.37859649e-04  1.12630982e-02]
[-0.53846333  0.04530475  0.53846333 -0.00717272 -0.00228832  0.00621186
  0.29886509  0.03637921 -0.00247402 -0.04152875]
[-1.18975457e-01 -3.37652576e-03  1.18975457e-01 -3.58842517e-02
 -5.03471231e-05 -1.80997566e-02  3.01628825e-01  1.45664316e-03
  3.00660125e-03 -4.49255152e-02]
[-0.19562574 -0.02750057  0.19562574 -0.03728941 -0.00491874  0.00266767
  0.2960702   0.03454619  0.00497083 -0.03656328]

which seems ok, right?

HorvathDawson commented 2 years ago

@diegoferigo Yes, that seems correct. I just tried that same script on my setup and got the same results. However, after modifying the script a bit, I found the minimal example that reproduces the bad behaviour.

import gym
import functools
from gym_bb import randomizers

env_id = "Monopod-v1"

def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)

make_env = functools.partial(make_env_from_id, env_id=env_id)

env = randomizers.monopod.MonopodEnvRandomizer(
    env=make_env, reward_class_name='BalancingV1')
env.seed(42)

# Try to reset multiple times
action = env.action_space.sample()
print(env.reset())
print(env.step(action))
print(env.reset())
print(env.reset())
print(env.step(action))
print(env.reset())
print(env.step(action))
print(env.step(action))

which gave this output

[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-1.15407057e-06, -1.36305522e-13,  3.00293390e-01, -8.77843805e-17,
        1.33360434e-13, -1.15407057e-03, -1.36305522e-10,  2.93390090e-01,
       -8.77843805e-14,  1.33360434e-10]), 0.30029339009044886, False, {})
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-1.07746080e-02,  5.44272987e-03,  3.00654145e-01, -3.60783009e-04,
       -6.41573538e-03, -1.07746080e+01,  5.44272987e+00,  6.54144668e-01,
       -3.60783009e-01, -6.41573538e+00]), 0.30065414466836143, False, {})
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-1.53336308e-02,  4.97875066e-04,  3.00904667e-01,  1.52682613e-04,
       -1.49304940e-02, -1.53336308e+01,  4.97875066e-01,  9.04667237e-01,
        1.52682613e-01, -1.49304940e+01]), 0.3009046672371765, False, {})
WARN: The observation does not belong to the observation space
(array([ 1.07756050e+12,  2.58790241e+11,  2.14876645e+10, -5.72973833e+09,
        1.86600059e+12,  1.07756050e+15,  2.58790241e+14,  2.14876645e+13,
       -5.72973833e+12,  1.86600059e+15]), 21487664492.98367, True, {})

I am very confused with what is happening here.
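
One way to catch the blow-up at the exact step where it first appears, rather than scanning printed arrays, is to check every observation against explicit bounds. The following is a minimal sketch in plain Python. The helper name out_of_bounds and the bounds of ±100 are placeholders chosen for illustration, not the real limits of the Monopod observation space; the two sample observations are copied from the output above.

```python
import math


def out_of_bounds(observation, low, high):
    """Return the indices of entries that are non-finite or outside [low, high]."""
    bad = []
    for i, (value, lo, hi) in enumerate(zip(observation, low, high)):
        if not math.isfinite(value) or value < lo or value > hi:
            bad.append(i)
    return bad


# Placeholder bounds (assumption): the real observation space limits differ.
low = [-100.0] * 10
high = [100.0] * 10

# Observation returned by a reset in the output above.
good = [0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.0]

# Observation returned by the last step in the output above.
exploded = [1.07756050e+12, 2.58790241e+11, 2.14876645e+10, -5.72973833e+09,
            1.86600059e+12, 1.07756050e+15, 2.58790241e+14, 2.14876645e+13,
            -5.72973833e+12, 1.86600059e+15]

print(out_of_bounds(good, low, high))
print(out_of_bounds(exploded, low, high))
```

Calling such a check right after every env.step and stopping at the first non-empty result narrows the search to a single transition.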

HorvathDawson commented 2 years ago

I updated to v1.3.0 and Ignition Fortress and the results are worse. I cannot render the environment because of #402 to check that everything is still running okay, but after running the above script again on the new version I got the following results:

[Wrn] [ServerConfig.cc:860] IGN_GAZEBO_SERVER_CONFIG_PATH set but no file found, no plugins loaded
WARN: Box bound precision lowered by casting to float64
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-1.15415634e-06, -1.36304940e-13,  3.00293390e-01, -8.77607449e-17,
        1.33359213e-13, -1.15415634e-03, -1.36304940e-10,  2.93390091e-01,
       -8.77607449e-14,  1.33359213e-10]), 0.30029339009134215, False, {})
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
WARN: The observation does not belong to the observation space
(array([-2.87089631e+11,  3.22248969e+10,  1.18477461e+10,  3.16331085e+09,
       -2.71644090e+11, -2.87089631e+14,  3.22248969e+13,  1.18477461e+13,
        3.16331085e+12, -2.71644090e+14]), 11847746128.844805, True, {})
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-3.08154667e+11,  3.42800180e+10,  1.18057544e+10,  3.15218282e+09,
       -2.90660812e+11, -3.08154667e+14,  3.42800180e+13,  1.18057544e+13,
        3.15218282e+12, -2.90660812e+14]), 11805754354.421728, True, {})
(array([ 3.43994049e+22, -1.00836623e+22, -6.41627739e+14, -1.22564216e+15,
        1.33670549e+22,  3.43994049e+25, -1.00836623e+25, -6.41639545e+17,
       -1.22564531e+18,  1.33670549e+25]), -641627739487769.9, True, {})

diegoferigo commented 2 years ago

I created a clean ubuntu focal system by executing the following commands in a docker container

# Start the container with: docker run -it ubuntu:focal bash

apt update
export IGNITION_DISTRIBUTION="fortress"
export IGNITION_DEFAULT_CHANNEL="stable"
apt install virtualenv wget lsb-release gnupg2 git
echo "deb http://packages.osrfoundation.org/gazebo/ubuntu-${IGNITION_DEFAULT_CHANNEL} `lsb_release -cs` main" > \
    /etc/apt/sources.list.d/gazebo-${IGNITION_DEFAULT_CHANNEL}.list
wget http://packages.osrfoundation.org/gazebo.key -qO - | apt-key add -
apt update
apt install ignition-fortress

virtualenv /tmp/venv
source /tmp/venv/bin/activate
pip install -U pip

pip install git+https://github.com/Baesian-Balancer/gym-bb
pip install ipython
pip install -U "gym-ignition==1.3.0" "scenario==1.3.0"

sed -i "s|from . import monitor|# from . import monitor|g" /tmp/venv/lib/python3.8/site-packages/gym_bb/__init__.py

And then executing the script (running it multiple times yields reproducible results):

import gym
import time
import functools
from gym_ignition.utils import logger
from gym_bb import randomizers

from gym_ignition.utils.typing import Action, Reward, Observation

env_id = "Monopod-Gazebo-v1"

def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)

make_env = functools.partial(make_env_from_id, env_id=env_id)

env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
env.seed(42)

# Try to reset multiple times
action = env.action_space.sample()
print(env.reset())
print(env.step(action))
print(env.reset())
print(env.reset())
print("===>")
print(env.step(action))
print("<===")
print(env.reset())
print(env.step(action))
print(env.step(action))

Output:

[-0.4608653  -0.01303166  0.4608653  -0.04270197 -0.0042243  -0.04540492
  0.30490549  0.04584364 -0.00212186 -0.03880929]
(array([-4.60865300e-01, -5.30130273e-10,  4.60865300e-01, -9.42064204e-11,
       -4.22430323e-03, -2.22622365e-10,  3.04943663e-01,  3.81699690e-02,
       -2.16212358e-03, -4.02639642e-02]), 8.704630994507406, False, {})
[-0.32561324 -0.03355728  0.32561324 -0.00463403  0.00401983 -0.03226874
  0.29692544  0.02540731  0.0045288  -0.04842035]
[ 0.37949755 -0.03606021 -0.37949755 -0.01683971 -0.00187991 -0.01650839
  0.30255707  0.04046996 -0.00129108 -0.02424057]
===>
WARN: The observation does not belong to the observation space
(array([-1.88644495e+09, -1.88644495e+12, -3.55188390e+10, -3.55188390e+13,
       -1.68572931e+10, -1.68572931e+13,  2.60987435e+07,  2.60987432e+10,
        6.60888064e+07,  6.60888064e+10]), 1.2805865140623118e-11, True, {})
<===
[-0.00804839  0.04977905  0.00804839  0.02675372  0.00226717 -0.0341412
  0.30092956 -0.02311185 -0.00035272 -0.04039219]
(array([-4.87550921e+10, -4.87550921e+13, -8.57572631e+11, -8.57572631e+14,
       -3.89485695e+11, -3.89485695e+14,  1.63443832e+10,  1.63443832e+13,
       -1.15210537e+09, -1.15210537e+12]), 2.044843067068891e-14, True, {})
(array([ 6.14321713e+17,  6.14321762e+20,  2.78311246e+22,  2.78311246e+25,
        7.79909646e+17,  7.79910036e+20, -7.52601307e+15, -7.52602941e+18,
        1.91471666e+15,  1.91471781e+18]), 4.440814240705095e-20, True, {})

I couldn't visualize the environment from the container I created on the fly, but the simulation is indeed exploding. I suspect it depends on the randomized state from which the model is initialized. Are you sure there is no configuration in which the model is initialized penetrating the ground? In that scenario it would receive a huge reaction force, and it would make sense for the simulation to explode. After this look, it seems that the problem does not depend on gym-ignition / scenario, but rather on the implementation of the environment.

HorvathDawson commented 2 years ago

I have tried isolating the problem using the above idea, making it impossible for the monopod to penetrate the ground. I also have a new version of our environment, which changes a lot of the code base compared to the current implementation and has no reset randomization, and it still has this issue.

The extra confusing part is that when you remove the extra reset everything is fine again, no matter how many episodes of training you do. This makes me believe that it isn't clipping due to the randomizer or something in the main logic of the environment. There must be some weird underlying condition that changes with the order of resets.
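
Since the trigger seems to be the order of calls rather than their number, a small wrapper that records the reset/step sequence can confirm when the pattern occurs in a longer training run. This is only a sketch: CallOrderTracer and StubEnv are hypothetical names, and the stub stands in for the real environment so that the example runs without Gazebo. The wrapper forwards calls unchanged and only keeps a log.

```python
class CallOrderTracer:
    """Wraps an env-like object and records the order of reset/step calls."""

    def __init__(self, env):
        self.env = env
        self.calls = []

    def reset(self, *args, **kwargs):
        self.calls.append("reset")
        return self.env.reset(*args, **kwargs)

    def step(self, action):
        self.calls.append("step")
        return self.env.step(action)

    def double_reset_steps(self):
        """Indices in the call log of steps directly preceded by two resets."""
        hits = []
        for i, name in enumerate(self.calls):
            if name == "step" and i >= 2 and self.calls[i - 2:i] == ["reset", "reset"]:
                hits.append(i)
        return hits


class StubEnv:
    """Stand-in for the real environment (assumption, for illustration only)."""

    def reset(self):
        return [0.0]

    def step(self, action):
        return [0.0], 0.0, False, {}


# Same call order as the first five calls of the minimal example above.
traced = CallOrderTracer(StubEnv())
traced.reset()
traced.step(0)
traced.reset()
traced.reset()
traced.step(0)
print(traced.double_reset_steps())
```

Wrapping the real environment the same way and comparing the flagged indices with the steps at which the observations explode would show whether every double reset triggers the problem or only some of them.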

I have dug into my code base pretty thoroughly and can't find the culprit. I think we should close this issue for now, and if I find the cause I will follow up in this same thread. :)

Thank you as always @diegoferigo

diegoferigo commented 2 years ago

Sure, feel free to open this issue again if needed. Closing.
