Open josjo80 opened 5 years ago
cc @ericl
I found this problem mainly from enabling self.double.q; if you set self.double.q=False in the default config, then QMIX can run.
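As a minimal sketch of that workaround (assuming the old config-dict API; `"double_q"` is the key name in RLlib's QMIX defaults, while `"grouped_smac"` is a hypothetical name for a registered, agent-grouped SMAC env):

```python
# Hedged sketch: disabling double Q-learning in RLlib's QMIX config.
qmix_config = {
    "env": "grouped_smac",  # hypothetical registered env name
    "double_q": False,      # work around the error reported above
    "mixer": "qmix",        # keep the default monotonic mixing network
}

# Typical usage (not executed here):
#   from ray import tune
#   tune.run("QMIX", config=qmix_config)
```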
I was able to run RLlib's QMIX in the StarCraft II env. However, the policy does not converge. Any suggestions are appreciated.
I forgot how to make it converge. I recommend using pymarl instead of RLlib if you want to explore some research ideas.
> I found this problem mainly from enabling self.double.q; if you set self.double.q=False in the default config, then qmix can run.
Hi, where do I set self.double.q=False? And when I run this example, the error log is below:

```
(RolloutWorker pid=44372) ray::RolloutWorker.__init__() (pid=44372, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002577D997BB0>)
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 658, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 699, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 665, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 669, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task.function_executor
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\_private\function_manager.py", line 675, in actor_method_executor
(RolloutWorker pid=44372)     return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(RolloutWorker pid=44372)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 511, in __init__
(RolloutWorker pid=44372)     check_env(self.env)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 78, in check_env
(RolloutWorker pid=44372)     raise ValueError(
(RolloutWorker pid=44372) ValueError: Traceback (most recent call last):
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 65, in check_env
(RolloutWorker pid=44372)     check_multiagent_environments(env)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 268, in check_multiagent_environments
(RolloutWorker pid=44372)     next_obs, reward, done, info = env.step(sampled_action)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\env\wrappers\group_agents_wrapper.py", line 76, in step
(RolloutWorker pid=44372)     obs, rewards, dones, infos = self.env.step(action_dict)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\smac\smac\examples\rllib\env.py", line 82, in step
(RolloutWorker pid=44372)     raise ValueError(
(RolloutWorker pid=44372) ValueError: You must supply an action for agent: 0
```
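For context, the final error comes from a per-agent check in smac's rllib example env: the grouped-agents wrapper un-batches the grouped action into a per-agent dict, and `env.step()` requires an entry for every agent. A paraphrased sketch of that check (illustrative names, not the actual smac source):

```python
# Paraphrased sketch of the check behind
# "ValueError: You must supply an action for agent: 0".
def validate_action_dict(action_dict, agent_ids):
    """Raise if any agent id is missing from the action dict."""
    for aid in agent_ids:
        if aid not in action_dict:
            raise ValueError(f"You must supply an action for agent: {aid}")

# A complete dict passes silently; a missing agent id raises.
validate_action_dict({0: 1, 1: 0}, agent_ids=[0, 1])
```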
How can I fix this so it runs? Thanks @EC2EZ4RD
@xiaoToby One key difference between the default PyMARL implementation and the RLlib implementation of QMIX for SMAC is that PyMARL uses the true global state, while RLlib uses only the per-agent observations as the state input to the monotonic mixing network. So you'd need to modify the default implementation to extract the true global state from the environment so that the mixing network can condition on it.
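To make the role of the global state concrete, here is a minimal numpy sketch (not RLlib's or PyMARL's actual code) of QMIX's monotonic mixing network: hypernetworks generate the mixing weights from the state `s`, so conditioning on only a per-agent observation gives the mixer a much weaker signal. All shapes and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, state_dim, embed_dim = 3, 8, 16

# Hypernetwork parameters (randomly initialised for this sketch).
W1_hyper = rng.normal(size=(state_dim, n_agents * embed_dim))
b1_hyper = rng.normal(size=(state_dim, embed_dim))
W2_hyper = rng.normal(size=(state_dim, embed_dim))
b2_hyper = rng.normal(size=(state_dim, 1))

def mix(agent_qs, state):
    """Combine per-agent Q-values into Q_tot, monotonic in each agent's Q."""
    # abs() on the hypernetwork outputs enforces monotonicity (as in QMIX).
    w1 = np.abs(state @ W1_hyper).reshape(n_agents, embed_dim)
    b1 = state @ b1_hyper
    hidden = np.maximum(agent_qs @ w1 + b1, 0.0)  # paper uses ELU; ReLU for brevity
    w2 = np.abs(state @ W2_hyper)
    b2 = state @ b2_hyper
    return (hidden @ w2 + b2).item()

state = rng.normal(size=state_dim)   # the *global* state, not one agent's obs
qs = np.array([0.2, -0.1, 0.5])      # per-agent Q-values
q_tot = mix(qs, state)

# Monotonicity: raising any single agent's Q never lowers Q_tot.
assert mix(qs + np.array([1.0, 0.0, 0.0]), state) >= q_tot
```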
我忘记了如何让它收敛。如果你想探索一些研究思路,我建议你使用 pymarl 而不是 rllib。 Hi! I am currently working on this using RLlib, but I have been encountering issues with non-convergence (qmix). Could you please share any changes or adjustments you made that helped you achieve convergence? Thank you for your help!
There appears to be a problem when using a masked action space with the QMIX algorithm. I think qmix_policy_graph expects at least one valid action to be available at all times.
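A small sketch of that failure mode (illustrative code, not RLlib's): action masking typically sets the Q-values of invalid actions to -inf before the argmax, so if an agent's mask has no valid action (e.g. a dead unit in SMAC), every Q becomes -inf and the greedy action is meaningless. One guard, following SMAC's convention that action 0 is the no-op legal for dead units:

```python
import numpy as np

def masked_argmax(q_values, avail_mask):
    """Greedy action over available actions only (illustrative names)."""
    if not avail_mask.any():
        # Guard for the all-masked case: fall back to a designated no-op
        # (SMAC convention: action 0 is the only legal action for dead units).
        return 0
    masked = np.where(avail_mask, q_values, -np.inf)  # invalid actions -> -inf
    return int(np.argmax(masked))

q = np.array([0.3, 1.2, -0.5])
# Action 1 is masked out, so the greedy choice falls to action 0.
assert masked_argmax(q, np.array([True, False, True])) == 0
```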
Full traceback is below: