robotology / gym-ignition

Framework for developing OpenAI Gym robotics environments simulated with Ignition Gazebo
https://robotology.github.io/gym-ignition
GNU Lesser General Public License v3.0

Issue getting example code to run on fresh install #280

Closed HorvathDawson closed 3 years ago

HorvathDawson commented 3 years ago

Hey @diegoferigo, I found this project while looking for a good option for my undergraduate capstone, which aims to do RL sim2real on a planarized monopod. I spent the last few days trying to get it running, and I keep hitting issues when the world's physics engine gets set.

Using the same code as the simple pendulum from the getting started section on the website I get an error finding the dart physics engine.

[Err] [Physics.cpp:542] Failed to find plugin [ignition-physics-dartsim-plugin]

I looked at where the error came from: the search for the shared library returns an empty path. I set the physics engine the same way as in the example:

world.set_physics_engine(scenario_gazebo.PhysicsEngine_dart)

When the GUI loads the pendulum is in the world but it never ends up swinging down. I get the same error trying the manipulator example.

I tried looking through your documentation but didn't see anything about missing any environment variables etc. I would love to get help setting this up! Thank you for everything you have done with this!

I am using Ignition-dome installed with binaries and the stable install of gym-ignition. My computer is a fresh install of POP OS 20.04.

HorvathDawson commented 3 years ago

Followup... I was able to get rid of the error above except now I have a segfault. If I run

export IGN_GAZEBO_PHYSICS_ENGINE_PATH=/lib/x86_64-linux-gnu/ign-physics-3/engine-plugins

or

export IGN_GAZEBO_PHYSICS_ENGINE_PATH=/lib/x86_64-linux-gnu/ign-physics-2/engine-plugins

the error above goes away. However, the simulation now only loads the model visually into the world about 9 times out of 10 (otherwise it shows in the sidebar but is not visible). When it does load into the world, there is a segfault and the pendulum does not move.

The seg fault happens at the gazebo.run() line below,

pendulum.get_joint("pivot").to_gazebo().reset_position(0.1)

gazebo.run(paused=True)

diegoferigo commented 3 years ago

Hi @HorvathDawson, thanks for reporting these problems. I managed to reproduce them on a clean installation. I'll provide below some more information.

Neither problem should be blocking: you can keep using the project with the temporary workarounds listed below while they get fixed.

1. Error finding the physics plugin

The error message you got is raised when the shared library containing the selected physics engine cannot be found and loaded from the system. By default, the ign-physics path with the official plugins should get hardcoded when the Physics system is compiled. However, when installing from the wheel package, for some reason this does not happen. I suspect there's a problem when the CI/CD pipeline compiles the package. What's strange is that the unit tests succeed there.

Physics plugin load https://github.com/robotology/gym-ignition/blob/63015c33a1fee2ee1a9f1c37db0d5fd19b299fe9/cpp/scenario/plugins/Physics/Physics.cpp#L532-L546

As you correctly found, you can extend the search path of the plugins with the IGN_GAZEBO_PHYSICS_ENGINE_PATH environment variable. It should not be required, but temporarily this is a valid workaround.

Compiling the wheel locally does not show this problem, which is why I suspect it is related to packaging.
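The same workaround can be applied from inside the Python script instead of the shell. The following is a minimal sketch, not part of gym-ignition: the helper names are hypothetical, the candidate directories are the ones reported earlier in this thread, and matching the file name against the plugin name from the error message is an assumption. It also assumes the variable has to be set before the simulator is initialized.

```python
import os
from pathlib import Path

ENV_VAR = "IGN_GAZEBO_PHYSICS_ENGINE_PATH"

# Plugin name taken from the error message reported in this thread.
PLUGIN_NAME = "ignition-physics-dartsim-plugin"

# Directories reported in this thread; they may differ on other systems.
CANDIDATE_DIRS = [
    "/lib/x86_64-linux-gnu/ign-physics-3/engine-plugins",
    "/lib/x86_64-linux-gnu/ign-physics-2/engine-plugins",
]


def find_plugin_dirs(candidates, plugin_name=PLUGIN_NAME):
    """Return the candidate directories containing a file that matches the plugin name."""
    found = []
    for candidate in candidates:
        directory = Path(candidate)
        # Assumption: the shared library file name contains the plugin name.
        if directory.is_dir() and any(directory.glob(f"*{plugin_name}*")):
            found.append(str(directory))
    return found


def extend_engine_path(directories, environ=os.environ):
    """Prepend directories to the plugin search path, keeping existing entries."""
    current = environ.get(ENV_VAR, "")
    entries = list(directories) + [e for e in current.split(os.pathsep) if e]
    # Drop duplicates while preserving the order.
    unique = list(dict.fromkeys(entries))
    if unique:
        environ[ENV_VAR] = os.pathsep.join(unique)
    return environ.get(ENV_VAR, "")


# Assumed to be required before gazebo.initialize() is called.
extend_engine_path(find_plugin_dirs(CANDIDATE_DIRS))
```

If no candidate directory contains the plugin, the environment is left untouched, so the snippet is harmless on installations where the path is already hardcoded correctly.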

2. Visualization segfault

The segfault you experienced is related only to visualization. If you run the script headless (i.e. without calling gazebo.gui()), I'm quite sure it succeeds. The culprit is a line of the simulator that was introduced recently in https://github.com/ignitionrobotics/ign-gazebo/pull/272.

It seems that the calls to gazebo.run(paused=True) cause problems. They are used only to update the GUI without advancing the physics. Can you try to remove them and just leave the runs inside the loop?

Updated Example

```python
import time
import gym_ignition_models
from scenario import gazebo as scenario_gazebo

# Create the simulator
gazebo = scenario_gazebo.GazeboSimulator(step_size=0.001, rtf=1.0, steps_per_run=1)

# Initialize the simulator
gazebo.initialize()

# Get the default world and insert the ground plane
world = gazebo.get_world()
world.insert_model(gym_ignition_models.get_model_file("ground_plane"))

# Select the physics engine
world.set_physics_engine(scenario_gazebo.PhysicsEngine_dart)

# Insert a pendulum
world.insert_model(gym_ignition_models.get_model_file("pendulum"))

# Get the pendulum model
pendulum = world.get_model("pendulum")

# Reset the pole position
pendulum.get_joint("pivot").to_gazebo().reset_position(0.01)

# Open the GUI
gazebo.gui()
time.sleep(3)
gazebo.run(paused=True)

# Simulate 30 seconds
for _ in range(int(30.0 / gazebo.step_size())):
    gazebo.run()

# Close the simulator
time.sleep(5)
gazebo.close()
```

In the snippet above, I removed those calls and delayed the opening of the GUI. This should be a good starting point while I find some time to properly debug and fix these problems.

Edit: I just found a similar issue opened upstream last week, https://github.com/ignitionrobotics/ign-gazebo/issues/483, which could get fixed by https://github.com/ignitionrobotics/ign-gazebo/pull/495. The problem is likely the same as the one discussed here.
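To check whether a crash is tied to visualization, the GUI can be made optional so the same script runs both headless and with the GUI. The following is a minimal sketch: the `simulate` helper and its parameters are hypothetical, it assumes `gazebo` is an already initialized simulator with `steps_per_run=1`, and it only uses calls that appear in the example above.

```python
import time


def simulate(gazebo, seconds, use_gui=False, gui_delay=3.0):
    """Simulate the given amount of time, optionally opening the GUI first.

    Returns the number of executed runs.
    """
    if use_gui:
        gazebo.gui()
        # Give the GUI some time to start before the first paused run.
        time.sleep(gui_delay)
        if not gazebo.run(paused=True):
            raise RuntimeError("Failed to execute a paused Gazebo run")

    # With steps_per_run=1, each run advances the physics by one step.
    steps = int(seconds / gazebo.step_size())
    for _ in range(steps):
        if not gazebo.run():
            raise RuntimeError("Failed to execute a Gazebo run")
    return steps
```

Calling `simulate(gazebo, 30.0)` keeps the run headless, while `simulate(gazebo, 30.0, use_gui=True)` reproduces the GUI sequence of the example.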

HorvathDawson commented 3 years ago

@diegoferigo Thank you for looking into this. The fix above worked for me! I also found another import issue when trying to use the pick and place example. I made another issue for that!

Thank you for all your work on this project!

diegoferigo commented 3 years ago

> The fix above worked for me!

Great, good to know.

For problem 2, I think we have to wait for that PR to get approved and merged. Then, users that install Ignition from sources with colcon will get the fix with the default Dome tags. Everyone else has to wait for the release and packaging of a new minor version of Ignition Dome. I'm not sure about the schedule, but it should not take more than a couple of months, Christmas included (4.1 was released last week).

Regarding problem 1, I found the cause, and as I imagined it was in the automatic pipeline we use for packaging. I'll open a PR with the fix in the next few days.

I'll leave this issue open until both problems are fixed both upstream and here. Thanks @HorvathDawson again for reporting.

HorvathDawson commented 3 years ago

Follow up on problem 2,

I was setting up a gym environment following the examples.

I originally got a segfault in the randomizer during the reset before the second episode. The segfault happened when executing either of the paused runs in the following code snippet. It also happens when the GUI is not being rendered.

        if not gazebo.run(paused=True):
            raise RuntimeError("Failed to execute a paused Gazebo run")

        # Insert a new monopod model
        model = monopod.Monopod(world=task.world)

        # Store the model name in the task
        task.model_name = model.name()

        # Execute a paused run to process model insertion

        if not gazebo.run(paused=True):
            raise RuntimeError("Failed to execute a paused Gazebo run")

After replacing both of the paused runs with a time.sleep(1), the segfaults no longer happen on every run. A segfault still happens occasionally, most likely because the delay isn't long enough.

Could this be caused by the same issue despite not involving the GUI?

diegoferigo commented 3 years ago

In the documented code snippets there are some sleeps to reduce the occurrence of these events. I'm surprised it also happens when the GUI is not opened; in our case this scenario works reliably. I can't exclude that something changed in the simulator, even though, if you never call gazebo.gui(), there should be no state broadcasting that could trigger problem 2.

I still haven't managed to try the fix mentioned before. Since you're actively working on this, I'd recommend following the nightly installation, which requires using colcon to create a workspace with Ignition Gazebo built from sources. It shouldn't be too difficult. This approach would allow you to edit the yaml file containing the tags and check out the feature branch of https://github.com/ignitionrobotics/ign-gazebo/pull/495 that contains the fix. It would give you a more stable setup while waiting for the fix to get merged upstream and land in a release.

Note that you have to install the pre-release of gym-ignition as documented on the website. At the time of writing, the nightly channel matches exactly v1.0.1, since no new features have been merged in the past weeks.

HorvathDawson commented 3 years ago

@diegoferigo

Sorry, I didn't have internet for the last two weeks to try this. I switched to the source installation and I am still getting the segfault in my randomizer. I followed the examples almost exactly, so I am not sure what is wrong with the code. Do you have any ideas I could look into?

If I completely remove the lines that destroy the robot and then insert it again, and instead put this,

if task.model_name is None:
     model = monopod.Monopod(world=task.world)
     task.model_name = model.name()
     time.sleep(4)

then I do not get a segfault. However, stepping through the execution, it seems the segfault does not happen until after the randomizer, reset_task, and getting the new observation have all run, which I think is everything that happens during an environment reset, right?

This is making it hard for me to debug.

diegoferigo commented 3 years ago

I haven't managed to try that branch yet, sorry for the delay. Can you please try to comment out the problematic line in the upstream file? Regardless of the fix in the PR, the following command removes the function call that causes the segfault. It is not necessary for gym-ignition, so it does not affect any functionality.

sed -i -e "s|this->dataPtr->SetRemovedComponentsMsgs|//this->dataPtr->SetRemovedComponentsMsgs|g" $COLCON_PREFIX_PATH/../src/ign-gazebo/src/EntityComponentManager.cc

Then, compile again ign-gazebo:

cd $COLCON_PREFIX_PATH/../build/ignition-gazebo4
ninja install

The colcon environment must be initialized first by sourcing the setup script.

In general, debugging these problems from a complex environment is challenging. I would recommend reproducing the problem using a single Python file and iterating on that.
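A single-file reproduction could look roughly like the sketch below. It is only an illustration under assumptions: `stress_reset`, `insert_model` and `remove_model` are hypothetical names, and the two callables must be supplied by the caller with whatever insertion and removal code the environment uses. The standard library `faulthandler` module is used to record which Python line was executing when the segfault occurred.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write the Python traceback of all threads to log_path on a fatal signal (e.g. SIGSEGV)."""
    # The file must stay open for as long as the handler is enabled.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


def stress_reset(gazebo, insert_model, remove_model, cycles=10):
    """Repeat the insert / paused run / remove / paused run sequence of a reset.

    insert_model and remove_model are hypothetical callables supplied by the
    caller: insert_model returns the name of the inserted model and
    remove_model takes that name.
    """
    for cycle in range(cycles):
        name = insert_model()
        # Execute a paused run to process model insertion
        if not gazebo.run(paused=True):
            raise RuntimeError(f"Paused run failed after insertion (cycle {cycle})")
        remove_model(name)
        # Execute a paused run to process model removal
        if not gazebo.run(paused=True):
            raise RuntimeError(f"Paused run failed after removal (cycle {cycle})")
    return cycles
```

For the randomizer discussed in this thread, `insert_model` could wrap `monopod.Monopod(world=task.world)` and return `model.name()`. After a crash, the log file written by `enable_crash_traceback` shows the Python call that was active at the time.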

I'm still puzzled why you get problems without the GUI. I'm starting to suspect there is something else going on, because this has never occurred on our side. From the root of the repository, you can also try to execute the test suite, which is completely headless. These tests are covered by the CI, and if they fail on your setup we have confirmation that something is wrong:

pip install pytest pytest-xvfb pytest-icdiff
pytest tests/

diegoferigo commented 3 years ago

@HorvathDawson https://github.com/ignitionrobotics/ign-gazebo/pull/495 was merged upstream and it was included in the 4.3 release (https://github.com/ignitionrobotics/ign-gazebo/pull/605). Can you please update your system and check if the problem persists?

diegoferigo commented 3 years ago

Closing. Feel free to open this issue again if the problems persist.