robfiras / loco-mujoco

Imitation learning benchmark focusing on complex locomotion tasks using MuJoCo.
MIT License

Occasional error while evaluating a Trained Agent with record=True #26

Open BernieChiu557 opened 2 months ago

BernieChiu557 commented 2 months ago

Hi,

I've trained an agent using the default imitation learning settings with HumanoidTorque.walk. When I evaluate the agent with record=True to save a video, I occasionally run into an OpenCV error (shown in the attached image). The recording stops and I get a very short video. From some searching, my guess is that a frame is somehow empty. But this only occurs occasionally, so maybe it is a problem with the env initialization?

Below is the snippet I use to evaluate the agent:

from mushroom_rl.core import Core, Agent
from loco_mujoco import LocoEnv

env = LocoEnv.make("HumanoidTorque.walk")
agent = Agent.load("./HumanoidTorque_test/logs/loco_mujoco_evalution_2024-05-07_22-35-44/env_id___HumanoidTorque.walk/0/agent_epoch_95_J_986.666168.msh")

core = Core(agent, env)
core.evaluate(n_episodes=10, render=True, record=True)

I usually just need to rerun the same code a couple of times to avoid the bug, but I thought it would be nice to fix it.

Thank you so much in advance for the help, and big props for the amazing work!
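To make the empty-frame guess above concrete, here is a minimal sketch of a guard that skips frames with no pixels before they reach a video writer. It is not taken from loco-mujoco or mushroom_rl: `is_writable_frame`, `write_valid_frames`, and `FakeFrame` are hypothetical names used only for illustration, and `FakeFrame` stands in for a NumPy image array, which exposes the same `.shape` attribute.

```python
from collections import namedtuple

# Hypothetical stand-in for an image array; a real frame would be a NumPy
# array, which exposes the same `.shape` attribute.
FakeFrame = namedtuple("FakeFrame", ["shape"])


def is_writable_frame(frame):
    """Return True if `frame` looks like a non-empty image (H x W [x C])."""
    if frame is None:
        return False
    shape = getattr(frame, "shape", None)
    if shape is None or len(shape) < 2:
        return False
    # A zero in any dimension means there are no pixels to encode.
    return all(dim > 0 for dim in shape)


def write_valid_frames(frames, write):
    """Call `write` on each valid frame and return how many were skipped."""
    skipped = 0
    for frame in frames:
        if is_writable_frame(frame):
            write(frame)
        else:
            skipped += 1
    return skipped


# Toy run: one valid frame, one missing frame, one zero-sized frame.
written = []
frames = [FakeFrame((480, 640, 3)), None, FakeFrame((0, 0, 3))]
skipped = write_valid_frames(frames, written.append)
print(f"wrote {len(written)} frame(s), skipped {skipped}")
```

If the skipped count is non-zero in a real run, that would support the empty-frame theory. It would not by itself explain why the renderer returned such a frame.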

robfiras commented 1 month ago

Hi @BernieChiu557, thanks for reporting this! I have actually never experienced this issue. Let me dig into this. What version of opencv-python are you using? And just in case you did not know, if you want to record a lot of videos, you can also do this in headless mode (much faster). To do so, just add headless=True when making the environment: LocoEnv.make("HumanoidTorque.walk", headless=True).

BernieChiu557 commented 1 month ago

Hi @robfiras

Thank you for the tip about the headless mode! That really helps a lot.

The opencv-python version in my conda environment is 4.9.0.80, and I'm running Python 3.10. Let me know if you need any other settings!
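For anyone else hitting this, a small standard-library sketch like the one below can collect the same version details in one go. Only opencv-python is named in this thread; the other distribution names in the list (mujoco, mushroom-rl, loco-mujoco) are assumptions about how the packages are published and may need adjusting, and `installed_version` is a hypothetical helper.

```python
import platform
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


report = {"python": platform.python_version()}
# Distribution names other than opencv-python are assumptions.
for name in ("opencv-python", "mujoco", "mushroom-rl", "loco-mujoco"):
    report[name] = installed_version(name)

for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```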

I'm also trying to find a way to consistently reproduce this error by setting random_start=False and iterating over all available init_step_no values to see if any state in the dataset causes the problem. I will report here if that's the case.
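The search described above could be organized roughly as follows. This is only a sketch: `find_failing_init_steps` and `fake_run` are hypothetical, and the toy `fake_run` merely simulates failures at two arbitrary steps so the loop can be run without the environment. In a real run, the callable would create the environment with random_start=False and the given init_step_no, then evaluate one episode with record=True. It is an assumption here that the OpenCV error surfaces as a Python exception.

```python
def find_failing_init_steps(run_from_step, n_steps):
    """Run `run_from_step(step)` for each step and collect the failures.

    Returns a dict mapping each failing step to the repr of its exception.
    """
    failures = {}
    for step in range(n_steps):
        try:
            run_from_step(step)
        except Exception as exc:  # record the failure and keep scanning
            failures[step] = repr(exc)
    return failures


def fake_run(step):
    """Toy stand-in for one recorded evaluation episode (hypothetical)."""
    if step in (3, 7):
        raise ValueError(f"empty frame at init step {step}")


failures = find_failing_init_steps(fake_run, n_steps=10)
print(sorted(failures))
```

If the same steps fail on every pass, the problem is likely tied to specific dataset states. If the failing steps change between passes, the cause is more likely in rendering or recording.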