cymmerida123 commented 1 month ago

[ ] I have marked all applicable categories:
- [ ] exception-raising bug
- [ ] RL algorithm bug
- [x] system worker bug
- [ ] system utils bug
- [ ] code design/refactor
- [ ] documentation request
- [ ] new feature request
[x] I have visited the readme and doc
[x] I have searched through the issue tracker and pr tracker

[x] I have mentioned version numbers, operating system and environment, where applicable:

import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
v0.5.1 2.3.0+cu121 3.8.19 (default, Mar 20 2024, 19:55:45) [MSC v.1916 64 bit (AMD64)] win32

- code

from dizoo.petting_zoo.config.ptz_simple_spread_qmix_config import main_config, create_config  # load the DI-zoo configs for the PettingZoo simple_spread environment and the QMIX algorithm

if __name__ == '__main__':
    # or you can enter `ding -m serial -c ptz_simple_spread_qmix_config.py -s 0`
    from ding.entry import serial_pipeline
    serial_pipeline((main_config, create_config), seed=0)

- info
```python
INFO     subprocess exception traceback:               subprocess_env_manager.py:308
Traceback (most recent call last):
  File "D:\毕设\Code\DI-engine\ding\envs\env_manager\subprocess_env_manager.py", line 305, in _reset
    reset_fn()
  File "D:\毕设\Code\DI-engine\ding\envs\env_manager\subprocess_env_manager.py", line 291, in reset_fn
    raise ConnectionError("env reset connection timeout")  # Leave it to try again
ConnectionError: env reset connection timeout
```

- error


[05-29 11:36:56] ERROR    Env 3 reset has exceeded max retries(1)       subprocess_env_manager.py:317
[05-29 11:36:56] ERROR    Env 5 reset has exceeded max retries(1)       subprocess_env_manager.py:317
[05-29 11:36:56] ERROR    Env 6 reset has exceeded max retries(1)       subprocess_env_manager.py:317
[05-29 11:36:56] ERROR    Env 7 reset has exceeded max retries(1)       subprocess_env_manager.py:317
[05-29 11:36:56] ERROR    Env 1 reset has exceeded max retries(1)       subprocess_env_manager.py:317
[05-29 11:36:56] ERROR    Env 4 reset has exceeded max retries(1)       subprocess_env_manager.py:317
[05-29 11:36:56] ERROR    Env 0 reset has exceeded max retries(1)       subprocess_env_manager.py:317
[05-29 11:36:56] ERROR    Env 2 reset has exceeded max retries(1)       subprocess_env_manager.py:317
Hint: for <class 'ding.worker.collector.sample_serial_collector.SampleSerialCollector'>(alias=sample)

Expected args are: FullArgSpec(args=['self', 'cfg', 'env', 'policy', 'tb_logger', 'exp_name', 'instance_name'], varargs=None, varkw=None, defaults=(None, None, None, 'default_experiment', 'collector'), kwonlyargs=[], kwonlydefaults=None, annotations={'return': None, 'cfg': <class 'easydict.EasyDict'>, 'env': <class 'ding.envs.env_manager.base_env_manager.BaseEnvManager'>, 'policy': <function namedtuple at 0x00000207DF0A9820>, 'tb_logger': 'SummaryWriter', 'exp_name': typing.Union[str, NoneType], 'instance_name': typing.Union[str, NoneType]}) Given arguments keys are: dict_keys(['cfg', 'env', 'policy', 'tb_logger', 'exp_name'])

PaParaZz1 commented 1 month ago

This looks like a timeout of the reset operation in the pettingzoo environment. Could you provide the exact version of pettingzoo installed on your machine? (The version we commonly use is 1.22.4.) You can also change the environment manager type in the config from subprocess to base, which makes environment-related issues easier to debug.
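A minimal sketch of the suggested switch from subprocess to base, for anyone following along. The dictionary below is only a stand-in for `create_config`, and its layout (an `env_manager` entry with a `type` key) is an assumption about how the config file is organized; `with_base_env_manager` is a hypothetical helper, not part of DI-engine.

```python
import copy

# Stand-in for the relevant part of `create_config` (assumed layout,
# not copied from the real ptz_simple_spread_qmix_config.py).
create_config_stub = {
    "env_manager": {"type": "subprocess"},
    "policy": {"type": "qmix"},
}


def with_base_env_manager(create_cfg):
    """Return a copy of the config whose env manager type is 'base'.

    Hypothetical helper for illustration. The input is left untouched so the
    original subprocess setting can be restored after debugging.
    """
    patched = copy.deepcopy(create_cfg)
    patched["env_manager"]["type"] = "base"
    return patched


debug_config = with_base_env_manager(create_config_stub)
print(debug_config["env_manager"]["type"])
```

If the real config follows the same layout, the equivalent edit would be to set the env manager type to `base` on `create_config` before passing it to `serial_pipeline`. Environments then run in the main process, so an exception raised during reset should surface directly instead of as a connection timeout.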

cymmerida123 commented 1 month ago

My version is 1.24.3. When I change it to 1.22.4, everything seems to be OK. But is it normal for training with the QMIX algorithm in a ptz_simple_spread scenario to take over an hour without finishing?
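A small sketch for checking the installed pettingzoo version against the one reported to work in this thread. It uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper names and the `KNOWN_GOOD` constant are illustrative assumptions, not part of DI-engine or pettingzoo.

```python
from importlib.metadata import PackageNotFoundError, version

# Version the maintainers say they commonly use (from this thread).
KNOWN_GOOD = "1.22.4"


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_known_good(found, expected=KNOWN_GOOD):
    """True only when the installed version equals the expected one exactly."""
    return found == expected


found = installed_version("pettingzoo")
if found is None:
    print("pettingzoo is not installed")
elif matches_known_good(found):
    print("pettingzoo", found, "matches the commonly used version")
else:
    print("pettingzoo", found, "differs from", KNOWN_GOOD)
```

The comparison is an exact string match on purpose: the thread only establishes that 1.22.4 works and 1.24.3 does not, so nothing is assumed about versions in between.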

PaParaZz1 commented 1 month ago

> My version is 1.24.3. When I change it to 1.22.4, everything seems to be OK. But is it normal for training with the QMIX algorithm in a ptz_simple_spread scenario to take over an hour without finishing?

You can compare your result with the benchmark result shown in this doc. If you encounter any further problems or have additional questions, please feel free to continue the discussion within this issue.

opendilab / DI-engine

bug when running MARL algorithm Qmix in pettingzoo #798
