Rendering after every few epochs at train time

aggritvik commented 1 year ago

Hi! Thanks for open-sourcing this amazing piece of software! I'm currently trying to learn RL for robotics, and I want to see a rendering of an episode every few epochs during training, to visualise the agent's gradual progression from doing nothing to performing the task well. Feel free to skip my attempts below if there is a straightforward way to do this.

Brax version - 0.9.1

I've followed the example colab notebooks in the Readme section:

training.ipynb: In this example notebook, I can't figure out where to place a call to the HTML render function within the training loop.
training_torch.ipynb: In this one, I'm using the function below, inspired by the issue "Rendering without notebooks".

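import webbrowser
from brax.io import html

def HTML(html_obj):
    with open('render.htm', 'wb') as f:
        f.write(html_obj.encode("UTF-8"))
    # Open the browser only after the file has been closed and flushed.
    webbrowser.open('render.htm', new=0)

HTML(html.render(env.sys.replace(dt=env.dt), rollout))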

I get the error

AttributeError: 'GymWrapper' object has no attribute 'sys'

I'd really appreciate any guidance I can get on how to proceed.

btaba commented 1 year ago

Hi @aggritvik, thanks for using brax! For the error you mentioned: GymWrapper keeps the wrapped environment in _env, so the system is available as env._env.sys rather than env.sys.
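A minimal sketch of that attribute lookup, using stand-in classes so it runs without brax installed. FakeSys, FakeEnv, FakeGymWrapper and the helper find_sys are hypothetical names for illustration only; the one thing taken from the reply above is that the wrapper holds the wrapped environment in `_env`.

```python
class FakeSys:
    """Stand-in for a brax System; only here so the sketch runs without brax."""
    def __init__(self, dt):
        self.dt = dt


class FakeEnv:
    """Stand-in for a brax environment that exposes `sys`."""
    def __init__(self):
        self.sys = FakeSys(dt=0.05)


class FakeGymWrapper:
    """Stand-in for a gym-style wrapper: it holds the env in `_env`, not `sys`."""
    def __init__(self, env):
        self._env = env


def find_sys(env, max_depth=10):
    """Follow `_env` links until an object with a `sys` attribute is found."""
    current = env
    for _ in range(max_depth):
        if hasattr(current, "sys"):
            return current.sys
        if not hasattr(current, "_env"):
            break
        current = current._env
    raise AttributeError("no `sys` found on the env or its wrapped envs")


wrapped = FakeGymWrapper(FakeEnv())
print(find_sys(wrapped).dt)  # same object as wrapped._env.sys
```

With a real wrapper, the direct form `env._env.sys` is enough; the loop only matters if several wrappers are stacked.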

For rendering intermediate policies, you may want to use brax.io.image to create the rendered frames, and then render them using the progress_fn. Let us know if that works!

aggritvik commented 1 year ago

Thanks @btaba! I'll work on your inputs. Is there a brax tutorial, which can be followed by a beginner?

btaba commented 1 year ago

Hi @aggritvik, sorry for the late reply. There isn't an example of intermediate rendering right now, but the colab you mentioned in the original post defines a callback, def progress(num_steps, metrics):. Rendering intermediate episodes should most likely happen during eval, somewhere around here: https://github.com/google/brax/blob/main/brax/training/agents/ppo/train.py#L338-L344

btaba commented 1 month ago

This is now supported via the policy_params_fn argument.
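A sketch of how such a callback could limit rendering to every few evaluations. It assumes the trainer calls the callback as policy_params_fn(current_step, make_policy, params); that is my understanding of the PPO trainer, so check it against your installed version. make_render_callback and render_episode are hypothetical names: render_episode stands for whatever rollout-and-save function you supply.

```python
def make_render_callback(render_episode, every=5):
    """Build a callback that renders on the first call and every `every`-th after.

    `render_episode(step, make_policy, params)` is a user-supplied function
    (hypothetical here) that rolls out the policy and writes the HTML or frames.
    """
    calls = {"count": 0}

    def policy_params_fn(current_step, make_policy, params):
        calls["count"] += 1
        # Calls 1, 1 + every, 1 + 2 * every, ... trigger a render.
        if (calls["count"] - 1) % every == 0:
            render_episode(current_step, make_policy, params)

    return policy_params_fn


# Dry run with a fake renderer, so the sketch runs without brax installed.
rendered_steps = []
callback = make_render_callback(
    lambda step, make_policy, params: rendered_steps.append(step), every=2)
for step in [0, 1000, 2000, 3000, 4000]:
    callback(step, make_policy=None, params=None)
print(rendered_steps)
```

In real use the returned function would be passed as policy_params_fn to the trainer, and render_episode would build an inference function from make_policy and params, roll out one episode, and save the result.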

Rendering after every few epochs at train time #369