opendilab / DI-engine

OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.
https://di-engine-docs.readthedocs.io
Apache License 2.0
3.01k stars 368 forks

AttributeError: 'NoneType' object has no attribute '_replace' #522

Closed jaried closed 1 year ago

jaried commented 1 year ago
===========
Traceback (most recent call last):
  File "D:\Tony\Documents\yunpan\invest\2022\Quant\code\myclasses\factor\myfactor.py", line 745, in wrapper
    func(*args, **kw)
  File "D:\Anaconda3\lib\site-packages\ding\entry\serial_entry.py", line 92, in serial_pipeline
    random_collect(cfg.policy, policy, collector, collector_env, commander, replay_buffer)
  File "D:\Anaconda3\lib\site-packages\ding\entry\utils.py", line 62, in random_collect
    new_data = collector.collect(n_sample=policy_cfg.random_collect_size, policy_kwargs=collect_kwargs)
  File "D:\Anaconda3\lib\site-packages\ding\worker\collector\sample_serial_collector.py", line 246, in collect
    timesteps = self._env.step(actions)
  File "D:\Anaconda3\lib\site-packages\ding\envs\env_manager\subprocess_env_manager.py", line 880, in step
    timesteps[env_id] = timestep._replace(obs=self._obs_buffers[env_id].get())
AttributeError: 'NoneType' object has no attribute '_replace'

I wrapped the failing line in a try/except and stepped into it with the debugger. Some of the returned timesteps are None and some have no obs:

https://github.com/opendilab/DI-engine/blob/79a94bd65e2adbc6cec977ea2b11a7492b12d3e5/ding/envs/env_manager/subprocess_env_manager.py#L880

timesteps
Out[2]: 
{0: BaseEnvTimestep(obs={'agent_state': array([[0.48333332, 0.        , 0.744912  , ..., 0.        , 0.        ,
         0.        ],
        [0.48333332, 0.        , 0.744912  , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ]], dtype=float32), 'global_state': array([[0.48333332, 0.        , 0.744912  , ..., 0.        , 0.        ,
         0.        ],
        [0.48333332, 0.        , 0.744912  , ..., 0.        , 0.        ,
         0.        ],
        [0.48333332, 0.        , 0.744912  , ..., 0.        , 0.        ,
         0.        ],
        [0.48333332, 0.        , 0.744912  , ..., 0.        , 0.        ,
         0.        ]], dtype=float32)}, reward=array([-0.00114179]), done=False, info={'profit': 0.0}),
 1: BaseEnvTimestep(obs={'agent_state': array([[0.21666667, 0.20842424, 0.5341208 , ..., 0.        , 0.        ,
         0.        ],
        [0.21666667, 0.20842424, 0.5341208 , ..., 0.        , 0.20842424,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ]], dtype=float32), 'global_state': array([[0.21666667, 0.20842424, 0.5341208 , ..., 0.        , 0.        ,
         0.        ],
        [0.21666667, 0.20842424, 0.5341208 , ..., 0.        , 0.        ,
         0.        ],
        [0.21666667, 0.20842424, 0.5341208 , ..., 0.        , 0.        ,
         0.        ],
        [0.21666667, 0.20842424, 0.5341208 , ..., 0.        , 0.        ,
         0.        ]], dtype=float32)}, reward=array([-0.00557933]), done=False, info={'profit': -0.0002043166095582194}),
 2: BaseEnvTimestep(obs={'agent_state': array([[0.       , 0.       , 0.       , ..., 0.       , 0.       ,
         0.       ],
        [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
         0.       ],
        [0.4125   , 0.       , 0.7006244, ..., 0.       , 0.       ,
         0.       ],
        [0.4125   , 0.       , 0.7006244, ..., 0.       , 0.       ,
         0.       ]], dtype=float32), 'global_state': array([[0.4125   , 0.       , 0.7006244, ..., 0.       , 0.       ,
         0.       ],
        [0.4125   , 0.       , 0.7006244, ..., 0.       , 0.       ,
         0.       ],
        [0.4125   , 0.       , 0.7006244, ..., 0.       , 0.       ,
         0.       ],
        [0.4125   , 0.       , 0.7006244, ..., 0.       , 0.       ,
         0.       ]], dtype=float32)}, reward=array([-0.00768895]), done=False, info={'profit': 0.0}),
 3: BaseEnvTimestep(obs={'agent_state': array([[0.225    , 0.       , 0.7650015, ..., 0.       , 0.       ,
         0.       ],
        [0.225    , 0.       , 0.7650015, ..., 0.       , 0.       ,
         0.       ],
        [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
         0.       ],
        [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
         0.       ]], dtype=float32), 'global_state': array([[0.225    , 0.       , 0.7650015, ..., 0.       , 0.       ,
         0.       ],
        [0.225    , 0.       , 0.7650015, ..., 0.       , 0.       ,
         0.       ],
        [0.225    , 0.       , 0.7650015, ..., 0.       , 0.       ,
         0.       ],
        [0.225    , 0.       , 0.7650015, ..., 0.       , 0.       ,
         0.       ]], dtype=float32)}, reward=array([-0.00081398]), done=False, info={'profit': 0.0}),
 4: None,
 5: BaseEnvTimestep(obs=None, reward=array([-0.00647803]), done=False, info={'profit': 0.0}),
 6: None,
 7: BaseEnvTimestep(obs=None, reward=array([-0.01255911]), done=False, info={'profit': 0.0}),
 8: BaseEnvTimestep(obs=None, reward=array([-0.00951866]), done=False, info={'profit': 0.0}),
 9: BaseEnvTimestep(obs=None, reward=array([-0.0369752]), done=False, info={'profit': -0.0037781147299025264})}
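In the dump above, envs 4 and 6 return None outright and several others return obs=None, so the `_replace` call at the line linked above fails. A minimal sketch of the failure mode, assuming BaseEnvTimestep is an ordinary namedtuple (as it appears to be in ding.envs):

from collections import namedtuple

# Assumption: BaseEnvTimestep is a plain namedtuple with these fields.
BaseEnvTimestep = namedtuple('BaseEnvTimestep', ['obs', 'reward', 'done', 'info'])

ok = BaseEnvTimestep(obs=None, reward=0.0, done=False, info={})
ok = ok._replace(obs='new_obs')   # works: _replace returns a new namedtuple

broken = None                     # what the worker returned for envs 4 and 6
broken._replace(obs='new_obs')    # AttributeError: 'NoneType' object has no attribute '_replace'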

After changing main_config.policy.random_collect_size to 0 the problem still occurs. Do I need to implement an env.close() method?

Out[1]: 
{0: BaseEnvTimestep(obs={'agent_state': array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.60833335, 0.        , 0.74312896, ..., 0.        , 0.        ,
         0.        ],
        [0.60833335, 0.        , 0.74312896, ..., 0.        , 0.        ,
         0.        ]], dtype=float32), 'global_state': array([[0.60833335, 0.        , 0.74312896, ..., 0.        , 0.        ,
         0.        ],
        [0.60833335, 0.        , 0.74312896, ..., 0.        , 0.        ,
         0.        ],
        [0.60833335, 0.        , 0.74312896, ..., 0.        , 0.        ,
         0.        ],
        [0.60833335, 0.        , 0.74312896, ..., 0.        , 0.        ,
         0.        ]], dtype=float32)}, reward=array([-0.00736553]), done=False, info={'profit': 0.0}),
 1: None,
 2: BaseEnvTimestep(obs=None, reward=array([-0.00177073]), done=False, info={'profit': 0.0}),
 3: BaseEnvTimestep(obs=None, reward=array([-0.01418715]), done=False, info={'profit': 0.0}),
 4: BaseEnvTimestep(obs=None, reward=array([-0.00494928]), done=False, info={'profit': 0.0}),
 5: BaseEnvTimestep(obs=None, reward=array([-0.01215518]), done=False, info={'profit': 0.0}),
 6: BaseEnvTimestep(obs=None, reward=array([-0.01277187]), done=False, info={'profit': 0.0}),
 7: BaseEnvTimestep(obs=None, reward=array([-0.00631998]), done=False, info={'profit': 0.0}),
 8: BaseEnvTimestep(obs=None, reward=array([-0.00870472]), done=False, info={'profit': 0.0}),
 9: BaseEnvTimestep(obs=None, reward=array([-0.00691846]), done=False, info={'profit': 0.0})}
PaParaZz1 commented 1 year ago

This problem is caused by a mismatch between the shared memory buffers and the data format returned by your environment.
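If the observations your env returns do not match the dtype/shape the shared-memory buffers were allocated for, the buffer read can come back empty. As a quick check you can turn shared memory off in the env manager config; a sketch, assuming the key is env.manager.shared_memory as in SyncSubprocessEnvManager's default config:

# Workaround sketch (assumption: the env manager reads env.manager.shared_memory).
# With shared memory disabled, observations go back through pipes instead of
# pre-allocated buffers, which sidesteps any shape/dtype mismatch.
main_config = dict(
    env=dict(
        manager=dict(
            shared_memory=False,
        ),
    ),
    # ... the rest of your existing config stays unchanged
)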

jaried commented 1 year ago

After implementing env.close(), the error appeared once more and then stopped, but now a lot of INFO messages are printed. How can I turn the INFO messages off?

    def close(self):
        # Mark the env as no longer initialized.
        self._init_flag = False
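For reference, a slightly fuller close() in the usual BaseEnv style; this is only a sketch, and self._env is a hypothetical attribute standing in for whatever resource your env wraps:

    def close(self) -> None:
        # Do nothing if the env was never initialized or is already closed.
        if not self._init_flag:
            return
        self._init_flag = False
        # Hypothetical: release the wrapped resource, if your env holds one.
        if getattr(self, '_env', None) is not None:
            self._env.close()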
PaParaZz1 commented 1 year ago

Which info messages do you mean, specifically?

jaried commented 1 year ago

The INFO-level log messages:

2022-10-25 09:11:36,972 - evaluator_logger - INFO - [EVALUATOR]env 0 finish episode, final reward: -0.11966614425182343, current episode: 1
2022-10-25 09:11:36,973 - evaluator_logger - INFO - [EVALUATOR]env 1 finish episode, final reward: -0.15683715045452118, current episode: 2
2022-10-25 09:11:36,974 - evaluator_logger - INFO - [EVALUATOR]env 2 finish episode, final reward: -0.11427276581525803, current episode: 3
2022-10-25 09:11:36,974 - evaluator_logger - INFO - [EVALUATOR]env 3 finish episode, final reward: -0.13277104496955872, current episode: 4
2022-10-25 09:11:36,974 - evaluator_logger - INFO - [EVALUATOR]env 4 finish episode, final reward: -0.2283226102590561, current episode: 5
2022-10-25 09:11:36,975 - evaluator_logger - INFO - [EVALUATOR]env 5 finish episode, final reward: -0.1947023570537567, current episode: 6
2022-10-25 09:11:36,975 - evaluator_logger - INFO - [EVALUATOR]env 6 finish episode, final reward: -0.12724675238132477, current episode: 7
2022-10-25 09:11:36,976 - evaluator_logger - INFO - [EVALUATOR]env 7 finish episode, final reward: -0.13702329993247986, current episode: 8
2022-10-25 09:11:36,976 - evaluator_logger - INFO - [EVALUATOR]env 8 finish episode, final reward: -0.10213974118232727, current episode: 9
2022-10-25 09:11:36,976 - evaluator_logger - INFO - [EVALUATOR]env 9 finish episode, final reward: -0.14240971207618713, current episode: 10
2022-10-25 09:11:36,976 - root - WARNING - VEC_ENV_MANAGER: all the not done envs are resetting, sleep 0 times
2022-10-25 09:11:45,183 - evaluator_logger - INFO - [EVALUATOR]env 7 finish episode, final reward: -0.12526081502437592, current episode: 11
2022-10-25 09:11:45,373 - evaluator_logger - INFO - [EVALUATOR]env 2 finish episode, final reward: -0.4038439989089966, current episode: 12
PaParaZz1 commented 1 year ago

Just specify the logger level when the logger is created; the source code is here.
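For example, with the standard logging module you can raise that logger's level so its INFO lines are suppressed (a sketch using the 'evaluator_logger' name from your output):

import logging

# Raise the evaluator logger's level so INFO messages are no longer emitted.
logging.getLogger('evaluator_logger').setLevel(logging.WARNING)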

jaried commented 1 year ago

Thank you!