opendilab / DI-engine

OpenDILab Decision AI Engine
https://di-engine-docs.readthedocs.io
Apache License 2.0

bug when running MARL algorithm Qmix in pettingzoo #798

Closed cymmerida123 closed 1 month ago

cymmerida123 commented 1 month ago

if __name__ == '__main__':
    # Alternatively, run: ding -m serial -c ptz_simple_spread_qmix_config.py -s 0
    from ding.entry import serial_pipeline
    serial_pipeline((main_config, create_config), seed=0)

- info
```python
INFO     subprocess exception traceback:               subprocess_env_manager.py:308
Traceback (most recent call last):
  File "D:\毕设\Code\DI-engine\ding\envs\env_manager\subprocess_env_manager.py", line 305, in _reset
    reset_fn()
  File "D:\毕设\Code\DI-engine\ding\envs\env_manager\subprocess_env_manager.py", line 291, in reset_fn
    raise ConnectionError("env reset connection timeout")  # Leave it to try again
ConnectionError: env reset connection timeout

Expected args are: FullArgSpec(args=['self', 'cfg', 'env', 'policy', 'tb_logger', 'exp_name', 'instance_name'], varargs=None, varkw=None, defaults=(None, None, None, 'default_experiment', 'collector'), kwonlyargs=[], kwonlydefaults=None, annotations={'return': None, 'cfg': <class 'easydict.EasyDict'>, 'env': <class 'ding.envs.env_manager.base_env_manager.BaseEnvManager'>, 'policy': <function namedtuple at 0x00000207DF0A9820>, 'tb_logger': 'SummaryWriter', 'exp_name': typing.Union[str, NoneType], 'instance_name': typing.Union[str, NoneType]}) Given arguments keys are: dict_keys(['cfg', 'env', 'policy', 'tb_logger', 'exp_name'])
```

PaParaZz1 commented 1 month ago

This bug appears to be caused by a timeout in the reset operation of the pettingzoo environment. Could you provide the exact version of pettingzoo installed on your machine? (The version we commonly use is 1.22.4.) You can also change the environment manager type in the config from subprocess to base, which makes environment-related issues easier to debug.
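
For illustration, the environment manager switch suggested above might look like the sketch below. The `create_config` dict here is a hypothetical stand-in that only mirrors the general shape of the one in `ptz_simple_spread_qmix_config.py`. The helper `use_base_env_manager` is not part of DI-engine, and the `env` and `policy` entries are assumptions included only for context.

```python
import copy

# Hypothetical stand-in for the `create_config` dict in the config file.
# Only the `env_manager` entry matters for this sketch; the other keys
# are assumed placeholders.
create_config = dict(
    env=dict(type='petting_zoo'),
    env_manager=dict(type='subprocess'),
    policy=dict(type='qmix'),
)


def use_base_env_manager(cfg: dict) -> dict:
    """Return a copy of cfg whose env manager type is 'base'.

    The original dict is left untouched so the subprocess setting can be
    restored once debugging is done.
    """
    new_cfg = copy.deepcopy(cfg)
    new_cfg['env_manager']['type'] = 'base'
    return new_cfg


debug_create_config = use_base_env_manager(create_config)
print(debug_create_config['env_manager'])
```

With the base manager, environment errors should surface as ordinary tracebacks in the main process rather than as a reset timeout reported from a subprocess.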

cymmerida123 commented 1 month ago

My version is 1.24.3. When I change it to 1.22.4, everything seems to be OK. But is it normal for training with the QMIX algorithm in a ptz_simple_spread scenario to take over an hour without finishing?
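
A minimal sketch of checking the installed pettingzoo version against the one reported to work in this thread. The helper names `installed_version` and `matches_known_good` are hypothetical, and the sketch assumes an exact version match is what you want to test for.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

KNOWN_GOOD = "1.22.4"  # version reported to work in this thread


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_known_good(found: Optional[str], expected: str = KNOWN_GOOD) -> bool:
    """True only when the found version string equals the expected one."""
    return found == expected


found = installed_version("pettingzoo")
if not matches_known_good(found):
    print(
        f"pettingzoo version {found!r} differs from {KNOWN_GOOD}; "
        f"consider: pip install pettingzoo=={KNOWN_GOOD}"
    )
```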

PaParaZz1 commented 1 month ago

> My version is 1.24.3. When I change it to 1.22.4, everything seems to be OK. But is it normal for training with the QMIX algorithm in a ptz_simple_spread scenario to take over an hour without finishing?

You can compare your result with the benchmark result shown in this doc. If you encounter any further problems or have additional questions, please feel free to continue the discussion within this issue.