opendilab / DI-engine

OpenDILab Decision AI Engine
https://di-engine-docs.readthedocs.io
Apache License 2.0

Running lunarlander_dqn_deploy inside docker fails #793

Closed Eric-Zhao1 closed 2 months ago

Eric-Zhao1 commented 2 months ago

The docker image is ding:nightly, pulled via docker pull opendilab/ding:nightly. When running the agent model trained with the DQN algorithm (final.pth.tar), the following error occurs:

Has anyone encountered a similar problem?

qboy21 commented 2 months ago

Error:

AttributeError                            Traceback (most recent call last)
Cell In[9], line 32
     29 print(f'Deploy is finished, final episode return is: {returns}')
     31 if __name__ == "__main__":
---> 32     main(main_config=main_config, create_config=create_config, ckpt_path='/Users/qboy/Downloads/rl/final.pth.tar')

Cell In[9], line 14
     12 main_config.exp_name = 'default'  # Set the name of the experiment to be run in this deployment, which is the name of the project folder to be created
     13 cfg = compile_config(main_config, create_cfg=create_config, auto=True)  # Compile and generate all configurations
---> 14 env = DingEnvWrapper(gym.make(cfg.env.env_id), EasyDict(env_wrapper='default'))  # Add the DI-engine environment decorator upon the gym's environment instance
     15 #env.enable_save_replay(replay_path='./lunarlander_dqn_deploy/video')  # Enable the video recording of the environment and set the video saving folder
     16 model = DQN(**cfg.policy.model)  # Import model configuration, instantiate DQN model

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gym/envs/registration.py:581, in make(id, max_episode_steps, autoreset, apply_api_compatibility, disable_env_checker, **kwargs)
    578     env_creator = spec.entry_point
    579 else:
    580     # Assume it's a string
--> 581     env_creator = load(spec.entry_point)
    583 mode = _kwargs.get("render_mode")
    584 apply_human_rendering = False

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gym/envs/registration.py:61, in load(name)
     52 """Loads an environment with name and returns an environment creation function
     53
     54 Args: (...)
     58     Calls the environment constructor
     59 """
     60 mod_name, attr_name = name.split(":")
---> 61 mod = importlib.import_module(mod_name)
     62 fn = getattr(mod, attr_name)
     63 return fn

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
    124     break
    125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:992, in _find_and_load_unlocked(name, import_)

...
--> 435 _Box2D.RAND_LIMIT_swigconstant(_Box2D)
    436 RAND_LIMIT = _Box2D.RAND_LIMIT
    438 def b2Random(*args):

AttributeError: module '_Box2D' has no attribute 'RAND_LIMIT_swigconstant'

Code (exactly from the tutorial example):

import gym  # Load the gym library, which is used to standardize the reinforcement learning environment
import torch  # Load the PyTorch library for loading the Tensor model and defining the computing network
from easydict import EasyDict  # Load EasyDict for instantiating configuration files
from ding.config import compile_config  # Load configuration related components in DI-engine config module
from ding.envs import DingEnvWrapper  # Load environment related components in DI-engine env module
from ding.policy import DQNPolicy, single_env_forward_wrapper  # Load policy related components in DI-engine policy module
from ding.model import DQN  # Load model related components in DI-engine model module
from dizoo.box2d.lunarlander.config.lunarlander_dqn_config import main_config, create_config  # Load DI-zoo lunarlander environment and DQN algorithm related configurations

def main(main_config: EasyDict, create_config: EasyDict, ckpt_path: str):
    main_config.exp_name = 'default'  # Set the name of the experiment to be run in this deployment, which is the name of the project folder to be created
    cfg = compile_config(main_config, create_cfg=create_config, auto=True)  # Compile and generate all configurations
    env = DingEnvWrapper(gym.make(cfg.env.env_id), EasyDict(env_wrapper='default'))  # Add the DI-engine environment decorator upon the gym's environment instance
    env.enable_save_replay(replay_path='./lunarlander_dqn_deploy/video')  # Enable the video recording of the environment and set the video saving folder
    model = DQN(**cfg.policy.model)  # Import model configuration, instantiate DQN model
    state_dict = torch.load(ckpt_path, map_location='cpu')  # Load model parameters from file
    model.load_state_dict(state_dict['model'])  # Load model parameters into the model
    policy = DQNPolicy(cfg.policy, model=model).eval_mode  # Import policy configuration, import model, instantiate DQN policy, and switch to evaluation mode
    forward_fn = single_env_forward_wrapper(policy.forward)  # Use the single-environment policy wrapper to wrap the decision method of the DQN policy
    obs = env.reset()  # Reset and initialize the environment to get the initial observation
    returns = 0.  # Initialize total reward
    while True:  # Let the agent's policy interact with the environment in a loop until the episode ends
        action = forward_fn(obs)  # Decide an action based on the observed state
        obs, rew, done, info = env.step(action)  # Execute the action, interact with the environment, and get the next observation, the reward of this step, the done signal, and other info
        returns += rew  # Accumulate reward return
        if done:
            break
    print(f'Deploy is finished, final episode return is: {returns}')

if __name__ == "__main__":
    main(main_config=main_config, create_config=create_config, ckpt_path='/Users/qboy/Downloads/rl/final.pth.tar')

Attempted fix: uninstalled Box2D and reinstalled it, but the same issue persists.
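For what it's worth, this swig-related AttributeError is commonly reported when the pure-Python Box2D wrapper and the compiled _Box2D extension come from mismatched or conflicting installs (e.g. two copies left in different site-packages directories). A small diagnostic sketch using only the standard library (`module_origin` is a helper name invented here) prints where Python would actually load each module from, so a leftover copy can be spotted:

```python
import importlib.util

def module_origin(name):
    """Return the file (or 'built-in') a module would be loaded from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# Print load locations for the Box2D wrapper and its compiled extension;
# paths in two different site-packages directories usually explain the error.
for mod in ("Box2D", "_Box2D"):
    print(mod, "->", module_origin(mod))
```

If the two paths disagree, uninstalling every Box2D variant before reinstalling a single one may help.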

Overall, I'm finding issues with most of the code examples. If you are no longer supporting the library, that's fine, but please say so. Thank you.

PaParaZz1 commented 2 months ago

The docker image is ding:nightly, pulled via docker pull opendilab/ding:nightly. When running the agent model trained with the DQN algorithm (final.pth.tar), the following error occurs:

Has anyone encountered a similar problem?

This looks like a problem that occurs when saving the replay video after training finishes (a missing libx264 library). However, we ran a similar replay-saving task in the latest docker pull opendilab/ding:nightly image (IMAGE ID 01c195e0ee17) and did not hit this problem. Could you check whether your image matches, whether your gym version is 0.25.1, and what code exactly you run to save the video?
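To answer the version question quickly, one standard-library option (a sketch; `installed_version` is a helper name invented here) is:

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string for a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# The nightly image is expected to report '0.25.1' here, per the comment above.
print("gym:", installed_version("gym"))
```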

Eric-Zhao1 commented 2 months ago

The docker image is ding:nightly, pulled via docker pull opendilab/ding:nightly. When running the agent model trained with the DQN algorithm (final.pth.tar), the following error occurs. Has anyone encountered a similar problem?

This looks like a problem that occurs when saving the replay video after training finishes (a missing libx264 library). However, we ran a similar replay-saving task in the latest docker pull opendilab/ding:nightly image (IMAGE ID 01c195e0ee17) and did not hit this problem. Could you check whether your image matches, whether your gym version is 0.25.1, and what code exactly you run to save the video?

I am using the latest opendilab/ding:nightly image, and the gym version is 0.25.1. The code I am running is from the "A small first goal: get your agent moving" section of https://di-engine-docs.readthedocs.io/zh-cn/latest/01_quickstart/hello_world_for_DI_zh.html, as follows:

import gym  # Load the gym library, used to standardize the reinforcement learning environment
import torch  # Load the PyTorch library, used to load the Tensor model and define the computing network
from easydict import EasyDict  # Load EasyDict, used to instantiate configuration files
from ding.config import compile_config  # Load configuration related components from the DI-engine config module
from ding.envs import DingEnvWrapper  # Load environment related components from the DI-engine env module
from ding.policy import DQNPolicy, single_env_forward_wrapper  # Load policy related components from the DI-engine policy module
from ding.model import DQN  # Load model related components from the DI-engine model module
from dizoo.box2d.lunarlander.config.lunarlander_dqn_config import main_config, create_config  # Load the DI-zoo lunarlander environment and DQN algorithm related configurations

def main(main_config: EasyDict, create_config: EasyDict, ckpt_path: str):
    main_config.exp_name = 'lunarlander_dqn_deploy'  # Set the name of this deployment run, which is also the name of the project folder to be created
    cfg = compile_config(main_config, create_cfg=create_config, auto=True)  # Compile and generate all configurations
    env = DingEnvWrapper(gym.make(cfg.env.env_id), EasyDict(env_wrapper='default'))  # Add the DI-engine environment wrapper on top of the gym environment instance
    env.enable_save_replay(replay_path='./lunarlander_dqn_deploy/video')  # Enable video recording of the environment and set the video saving path
    model = DQN(**cfg.policy.model)  # Import the model configuration and instantiate the DQN model
    state_dict = torch.load(ckpt_path, map_location='cpu')  # Load model parameters from the checkpoint file
    model.load_state_dict(state_dict['model'])  # Load the parameters into the model
    policy = DQNPolicy(cfg.policy, model=model).eval_mode  # Import the policy configuration and the model, instantiate the DQN policy, and select evaluation mode
    forward_fn = single_env_forward_wrapper(policy.forward)  # Use the single-environment policy wrapper to wrap the decision method of the DQN policy
    obs = env.reset()  # Reset and initialize the environment to get the initial observation
    returns = 0.  # Initialize the total reward
    while True:  # Let the agent's policy interact with the environment in a loop until the episode ends
        action = forward_fn(obs)  # Decide an action based on the observed state
        obs, rew, done, info = env.step(action)  # Execute the action, interact with the environment, and get the next observation, the reward of this step, the done signal, and other environment info
        returns += rew  # Accumulate the reward return
        if done:
            break
    print(f'Deploy is finished, final episode return is: {returns}')

if __name__ == "__main__":
    main(main_config=main_config, create_config=create_config, ckpt_path='./final.pth.tar')
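The interaction loop in the script above does not depend on gym or torch; as a minimal sketch (with a hypothetical StubEnv standing in for the real LunarLander environment), its control flow can be exercised like this:

```python
class StubEnv:
    """Hypothetical stand-in for the gym environment: fixed horizon, reward 1.0 per step."""
    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
    def reset(self):
        self.t = 0
        return 0.0  # initial observation
    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), 1.0, done, {}  # obs, reward, done, info

def deploy(env, forward_fn):
    """Same loop as the tutorial: act, step, accumulate reward until done."""
    obs = env.reset()
    returns = 0.0
    while True:
        action = forward_fn(obs)
        obs, rew, done, info = env.step(action)
        returns += rew
        if done:
            break
    return returns

print(deploy(StubEnv(horizon=3), lambda obs: 0))  # 3.0
```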
Eric-Zhao1 commented 2 months ago

I can confirm the problem is caused by saving the video; the code responsible is env.enable_save_replay(replay_path='./lunarlander_dqn_deploy/video'). But installing the libx264 library, which I tried earlier, did not fix it. Is there a known solution?

PaParaZz1 commented 2 months ago

I can confirm the problem is caused by saving the video; the code responsible is env.enable_save_replay(replay_path='./lunarlander_dqn_deploy/video'). But installing the libx264 library, which I tried earlier, did not fix it. Is there a known solution?

This is solved now. The cause is that the ffmpeg version installed in the official pytorch image is 4.3.0, which conflicts with the libx264 library shipped in the image by default. Installing a lower ffmpeg version with conda install -c conda-forge ffmpeg==4.2.2 makes video generation work normally again.
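After downgrading, it may help to confirm that the ffmpeg binary in the container actually exposes the libx264 encoder before re-running the deploy script. A small diagnostic sketch (`h264_available` is a helper name invented here; it degrades to False if no ffmpeg is on PATH):

```python
import shutil
import subprocess

def h264_available():
    """Return True if an ffmpeg binary on PATH lists the libx264 encoder, else False."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return False  # no ffmpeg binary on PATH at all
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True, text=True,
    )
    return "libx264" in result.stdout

print("libx264 encoder available:", h264_available())
```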

Eric-Zhao1 commented 2 months ago

I can confirm the problem is caused by saving the video; the code responsible is env.enable_save_replay(replay_path='./lunarlander_dqn_deploy/video'). But installing the libx264 library, which I tried earlier, did not fix it. Is there a known solution?

This is solved now. The cause is that the ffmpeg version installed in the official pytorch image is 4.3.0, which conflicts with the libx264 library shipped in the image by default. Installing a lower ffmpeg version with conda install -c conda-forge ffmpeg==4.2.2 makes video generation work normally again.

Tested it, and everything works correctly now. Much appreciated 👍