ryanjulian / rllab

rllab is a framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym.

dm_control support #13

Closed ryanjulian closed 6 years ago

ryanjulian commented 6 years ago

DeepMind have released a set of RL environments called dm_control. We would like to use these environments in rllab.

This task is to add dm_control to the rllab conda environment, and implement a class (similar to GymEnv, e.g. DmControlEnv) which allows any rllab algorithm to learn against dm_control environments. You will also need to implement the plot interface for dm_control, which shows the user a 3D animation of the environment.

This is conceptually the same as GymEnv, which allows rllab users to import any OpenAI Gym environment and learn against it.

Consider this a professional software engineering task, and provide a high-quality solution which does not break existing users, minimizes change, and is stable. Please always use PEP8 style in your code, and format it using YAPF (with the PEP8 setting). Submit your pull request against the integration branch of this repository.
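For orientation, here is a rough shell of what such a wrapper might look like, assuming dm_control's suite loader and rllab's Env, Step, and Box classes. The layout and helper names are illustrative, not a final design; the observation handling and rendering details are worked out further down this thread.

```python
from dm_control import suite
from rllab.envs.base import Env, Step
from rllab.spaces import Box


class DmControlEnv(Env):
    """Illustrative shell of a dm_control wrapper exposing rllab's Env API."""

    def __init__(self, domain_name, task_name):
        self._env = suite.load(domain_name=domain_name, task_name=task_name)

    @property
    def action_space(self):
        # dm_control action specs carry per-dimension bounds.
        spec = self._env.action_spec()
        return Box(low=spec.minimum, high=spec.maximum)

    def reset(self):
        time_step = self._env.reset()
        return self._to_rllab_observation(time_step.observation)

    def step(self, action):
        time_step = self._env.step(action)
        return Step(
            observation=self._to_rllab_observation(time_step.observation),
            reward=time_step.reward or 0.0,
            done=time_step.last())

    def render(self):
        # Would display the pixels from self._env.physics.render();
        # see the rendering discussion further down this thread.
        pass

    def _to_rllab_observation(self, observation):
        # Convert dm_control's labeled observation dict into a flat rllab
        # observation; see the spaces discussion further down this thread.
        raise NotImplementedError
```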

Some notes:

ryanjulian commented 6 years ago

From Junchao

dm_control does not have a viewer for watching the training process, but it does return the image pixels of the rendered results. I tried to re-use pyglet, which OpenAI Gym uses, to show these pixels, but it doesn't seem to work.

I paste the code below. When I call "render", the images should be shown in a window. However, if I use "switch_to" to switch to the back buffer in the render function, some global parameters seem to change: the results become zero after the "step()" call in the main loop. If I remove the "switch_to" call, the results are correct, but the images cannot be shown.

ryanjulian commented 6 years ago

I've never used pyglet before, but I have a theory.

dm_control renders its frames in an off-screen OpenGL context. So it has a hidden OpenGL window and returns the rendered pixels from that.

pyglet also uses OpenGL. Your Window has an associated OpenGL Context.

OpenGL has the notion of a rendering context. In OpenGL, only one rendering context can be current at a time on a given thread. In order to render to the user, pyglet needs its context to be current. Similarly, in order to render the scene off-screen, dm_control also needs its context to be current.

Your Window has an OpenGL context. When you call window.switch_to(), pyglet makes its own OpenGL context current. If dm_control does not set its own context to current before it tries to render the next frame, then the rendering may be corrupted.

Programs which use OpenGL should always set their own context to current before rendering, but I have noticed a common bug in RL visualizers where the author forgets to do this.

I will look at the dm_control codebase to see if they have this bug.

Some suggestions to solve your problem:

  1. Find a way to switch dm_control's context back after you render (or fix the dm_control bug)
  2. Use pygame, which is already an rllab dependency and supports 2D rendering without OpenGL (which is all you need, anyway). See the sketch below.
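Here is a minimal sketch of option 2, showing a single dm_control frame displayed with pygame. The cartpole task and the 640x480 frame size are just placeholders.

```python
import numpy as np
import pygame
from dm_control import suite

# Render one frame from a dm_control task and show it in a pygame window.
env = suite.load(domain_name="cartpole", task_name="swingup")
env.reset()
pixels = env.physics.render(height=480, width=640)  # (H, W, 3) uint8 array

pygame.init()
screen = pygame.display.set_mode((640, 480))
# pygame surfaces are indexed (x, y), so transpose the (row, col, rgb) array.
frame = pygame.surfarray.make_surface(np.transpose(pixels, (1, 0, 2)))
screen.blit(frame, (0, 0))
pygame.display.flip()
```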
ryanjulian commented 6 years ago

Here is relevant code in dm_control

https://github.com/deepmind/dm_control/blob/master/dm_control/render/glfw_renderer.py#L65
https://github.com/deepmind/dm_control/blob/master/dm_control/mujoco/engine.py#L406
https://github.com/deepmind/dm_control/blob/master/dm_control/mujoco/engine.py#L412
https://github.com/deepmind/dm_control/blob/master/dm_control/mujoco/engine.py#L560

It seems to me that dm_control is probably handling the context switch properly, as long as you are using the latest version from their GitHub repository. A previous version did not handle it properly.

Are you using the latest version of dm_control from GitHub? If so, I am not sure why it's broken, but you could save the current (e.g. dm_control) context before you render, and then restore it after you render using pyglet.gl. This is the same thing dm_control does to avoid bugs.
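A rough sketch of that save-and-restore approach, assuming dm_control is using its GLFW rendering backend (so the glfw Python bindings, rather than pyglet.gl, are used to grab and restore its context); the window object stands in for the pyglet window from the viewer code:

```python
import glfw
import pyglet

window = pyglet.window.Window(width=640, height=480)


def show_frame(pixels):
    # Capture dm_control's context while it is still current, i.e. right
    # after physics.render() has produced `pixels`.
    dm_ctx = glfw.get_current_context()

    window.switch_to()          # make pyglet's context current
    # ... draw `pixels` into the pyglet window here ...
    window.flip()

    # Restore dm_control's context before its next render() call.
    glfw.make_context_current(dm_ctx)
```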

Other notes:

cjcchen commented 6 years ago

I used pygame instead and it works. Thanks for the help.

However, I am having some trouble building the rllab project. When I import env.base, some files seem to be missing.

[screenshot of the import error]

base.py imports cached_property, but I could not find this file or directory.

Is there any file missing?

ryanjulian commented 6 years ago

Did you set up rllab using conda, as described here? It is difficult to get all the dependencies right using pip.

cjcchen commented 6 years ago

Sorry, my mistake. I thought it was a library inside rllab. I am using conda, but this library was not actually installed. I used pip to install it and it works now.

cjcchen commented 6 years ago

It seems everything is done. Should I push my code to the integration branch so you can review it? It seems that I don't have permission for this repository.

ryanjulian commented 6 years ago

That is really odd. It should definitely have been installed per environment.yml.

Can you file an issue with steps to reproduce?

From above:

You can find examples of how to launch rllab in examples and sandbox/rocky/tf/launchers. Note that everything must run using the run_experiment_lite wrapper.

For example, in trpo_gym_tf_cartpole.py I should be able to replace

env = TfEnv(normalize(GymEnv("CartPole-v0", force_reset=True)))

with

env = TfEnv(normalize(DmControlEnv(domain_name="cartpole", task_name="swingup")))

and see a 3D plot of the cartpole. It should also still train the cartpole to swing up :).
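To sketch how that would slot into the launcher pattern, assuming run_experiment_lite is given a run_task callable as in the bundled examples (the policy, baseline, and algorithm setup are omitted and would follow the existing example; DmControlEnv is the wrapper proposed above):

```python
from rllab.envs.normalized_env import normalize
from rllab.misc.instrument import run_experiment_lite
from sandbox.rocky.tf.envs.base import TfEnv


def run_task(*_):
    # DmControlEnv is the proposed wrapper; the rest of the launcher
    # (policy, baseline, TRPO setup) would mirror trpo_gym_tf_cartpole.py.
    env = TfEnv(normalize(DmControlEnv(domain_name="cartpole",
                                       task_name="swingup")))
    ...


run_experiment_lite(
    run_task,
    n_parallel=1,
    snapshot_mode="last",
    seed=1,
    plot=True,  # should bring up the dm_control 3D viewer
)
```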

To submit your code, upload it to your own fork and then open a pull request against the integration branch of this repository. See https://help.github.com/articles/creating-a-pull-request/

cjcchen commented 6 years ago

Thanks, that is a good case to check. I found a problem when I use the normalize class.

The normalize class uses flat_dim and shape to normalize the data. For Gym, the observation and action each have their own space, a discrete array or a matrix Box. However, for dm_control, the observation space consists of three arrays (position, velocity, rgb), and the action space is an array with min-max values. It seems hard to convert these two kinds of spaces.

I am not sure how to convert these two spaces so that I can reuse the normalize class.

Do you have any suggestion?

ryanjulian commented 6 years ago

Remember that the interface you are implementing is the rllab.envs.base.Env interface, not the gym.Env interface. Take a look at other classes implementing rllab.envs.base.Env. There are many of them. Most of them use spaces.Box for observation and action spaces. Take a look at how they use spaces.

rllab has no notion of labeled subspaces like dm_control does, so it's sufficient to just concatenate them into one larger space for rllab. We do not have image support yet, so you can ignore the RGB subspace for now.
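A minimal sketch of that concatenation, assuming the RGB entry shows up under a key like "pixels" (the key name and helper names here are hypothetical):

```python
import numpy as np
from rllab.spaces import Box

# Hypothetical key under which an RGB observation would appear; skipped
# because rllab has no image support yet.
SKIP_KEYS = ('pixels',)


def observation_space_from_spec(observation_spec):
    """Build one flat Box over every non-image dm_control subspace, so
    normalize() can keep relying on flat_dim."""
    dim = sum(int(np.prod(spec.shape))
              for name, spec in observation_spec.items()
              if name not in SKIP_KEYS)
    return Box(low=-np.inf, high=np.inf, shape=(dim,))


def flatten_observation(observation):
    """Concatenate the labeled observation dict into a single 1-D vector."""
    return np.concatenate(
        [np.asarray(value).ravel() for name, value in observation.items()
         if name not in SKIP_KEYS])
```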

ryanjulian commented 6 years ago

Did that answer your question?

cjcchen commented 6 years ago

Yes, I have checked some of the other env code. The other envs hard-code the observation space dimensions, except for Gym. I think Gym is the most similar to dm_control, so I just gave it as an example.

Since the observation space is used for training, I am not sure which type of data should be returned. I just want to make sure that the algorithm is correct and that there is no ambiguity.

For now I will concatenate all the data, so that the code will not need to be modified if images are supported later.

But I still have a question: how do you distinguish the three types of data (position, velocity, rgb) without any shape information provided? I just return one large space without any structure. As you said, I can ignore the rgb data for now. However, if rllab supports image training in the future, how would you remove the rgb data from the space in order to train an old model?

Thanks.

cjcchen commented 6 years ago

I have opened a pull request, please take a look.

ryanjulian commented 6 years ago

Fixed in #48