opendilab / DI-engine

OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework.
https://di-engine-docs.readthedocs.io
Apache License 2.0

Question about MBPO. #681

Closed Uptomylimit closed 1 year ago

Uptomylimit commented 1 year ago

I am trying to use DI-engine to implement MBPO, but I ran into a bug.

import ding
import gym
import os
from tensorboardX import SummaryWriter
# from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config
from dizoo.classic_control.pendulum.config.mbrl.pendulum_sac_mbpo_config import main_config, create_config
from ding.config import compile_config
from ding.envs import DingEnvWrapper, BaseEnvManagerV2
from ding.world_model.mbpo import MBPOWorldModel
from ding.model import QAC
# from ding.model import DQN
from ding.policy.mbpolicy import MBSACPolicy
from ding.data import DequeBuffer
from ding.framework import task
from ding.framework.context import OnlineRLContext
from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, eps_greedy_handler, CkptSaver
import logging
logging.getLogger().setLevel(logging.INFO)

if __name__ == '__main__':
    cfg = compile_config(main_config, create_cfg=create_config, auto=True)
    collector_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v1")) for _ in range(cfg.env.collector_env_num)],
        cfg=cfg.env.manager
    )
    evaluator_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v1")) for _ in range(cfg.env.evaluator_env_num)],
        cfg=cfg.env.manager
    )

    tb_logger = SummaryWriter(os.path.join('./{}/log/'.format(cfg.exp_name), 'serial'))
    world_model = MBPOWorldModel(cfg.world_model, collector_env, tb_logger)
    model = QAC(**cfg.policy.model)
    buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
    policy = MBSACPolicy(cfg.policy, model)

The error is as follows:

Traceback (most recent call last):
  File "D:\project\pythonproject\MBPO\main.py", line 38, in <module>
    policy = MBSACPolicy(cfg.policy,model)
  File "C:\Users\30795\.virtualenvs\MBPO-0tBDzUMZ\lib\site-packages\ding\policy\base_policy.py", line 114, in __init__
    getattr(self, '_init_' + field)()
  File "C:\Users\30795\.virtualenvs\MBPO-0tBDzUMZ\lib\site-packages\ding\policy\mbpolicy\mbsac.py", line 59, in _init_learn
    self._lambda = self._cfg.learn.lambda_
AttributeError: 'EasyDict' object has no attribute 'lambda_'
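The AttributeError is just EasyDict reporting a missing key through the usual attribute-lookup protocol: `MBSACPolicy._init_learn` reads `cfg.learn.lambda_`, and the config being passed in simply does not contain that field. A minimal sketch of the failure mode, using a tiny stand-in class instead of `easydict.EasyDict` so it runs with the standard library alone:

```python
class AttrDict(dict):
    """Tiny stand-in for easydict.EasyDict: dict keys exposed as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(f"'AttrDict' object has no attribute '{name}'")

    def __setattr__(self, name, value):
        self[name] = value


# A learn config that lacks the 'lambda_' field the policy reads.
cfg = AttrDict(learn=AttrDict(learning_rate_q=3e-4))

try:
    _ = cfg.learn.lambda_  # mirrors self._cfg.learn.lambda_ in mbsac.py
except AttributeError as e:
    print(e)  # -> 'AttrDict' object has no attribute 'lambda_'

# Supplying the field in the config resolves the lookup.
cfg.learn.lambda_ = 0.8  # placeholder value, not an official default
print(cfg.learn.lambda_)  # -> 0.8
```

So the fix is on the config side (add the missing fields), not in the policy code.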

Could you tell me how to use your model-based reinforcement learning methods like MBPO? Is there an example of a model-based method?

PaParaZz1 commented 1 year ago

You can refer to this config for a runnable MBPO example. If you have other questions, feel free to continue asking in this issue.

Uptomylimit commented 1 year ago

I am using this config, "dizoo.classic_control.pendulum.config.mbrl.pendulum_sac_mbpo_config". Is it a runnable example? I am confused about how model-based RL is structured in DI-engine: "./ding/example" has many examples of model-free RL (SAC, PPO, DQN, etc.), but none for model-based RL. Is there an example of a model-based method?

Uptomylimit commented 1 year ago

I am using the MBSACPolicy class in "./ding/policy/mbpolicy", but I get this error:

'EasyDict' object has no attribute 'lambda_'

This suggests the config is incomplete. The config you pointed to also doesn't include the "grad_clip" setting, so I added these settings myself. But I am still confused about the structure of model-based RL: how do I use the world model, and how do I use "task.use(StepCollector)"?
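Since the shipped config apparently lacks some fields the policy reads, one workaround is to fill them in before constructing the policy. A minimal sketch, using plain nested dicts as a stand-in for the imported `main_config`; the key `lambda_` comes from the traceback and `grad_clip` from the discussion above, and the values below are placeholders, not the library's defaults:

```python
# Stand-in for the imported pendulum_sac_mbpo main_config (hypothetical shape).
main_config = {
    'policy': {
        'learn': {
            'learning_rate_q': 3e-4,
        },
    },
}

# Fill in fields the policy's _init_learn reads but the config omits,
# without overwriting anything the config already sets.
learn_cfg = main_config['policy']['learn']
learn_cfg.setdefault('lambda_', 0.8)    # name from the traceback; placeholder value
learn_cfg.setdefault('grad_clip', 5.0)  # mentioned above; placeholder value

print(sorted(learn_cfg))  # -> ['grad_clip', 'lambda_', 'learning_rate_q']
```

`setdefault` keeps any value already present in the config, so this patch is safe to apply even after the upstream config is completed.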

PaParaZz1 commented 1 year ago

> I am using this config, "dizoo.classic_control.pendulum.config.mbrl.pendulum_sac_mbpo_config". Is it a runnable example? I am confused about how model-based RL is structured in DI-engine: "./ding/example" has many examples of model-free RL (SAC, PPO, DQN, etc.), but none for model-based RL. Is there an example of a model-based method?

You can directly execute python3 pendulum_sac_mbpo_config.py to run this config file. It will call the serial_pipeline_dyna method to launch the training pipeline.

Uptomylimit commented 1 year ago

Thanks for your answer! The code works now!