oxwhirl / smac

SMAC: The StarCraft Multi-Agent Challenge

RLLIB QMIX example does not work #12

Open josjo80 opened 5 years ago

josjo80 commented 5 years ago

There appears to be a problem when using a masked action space with the QMIX algorithm. I think `qmix_policy_graph` expects there to be at least one valid action at all times.

Full traceback is below:


  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 446, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 316, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/worker.py", line 2197, in get
    raise value
ray.exceptions.RayTaskError: ^[[36mray_QMixTrainer:train()^[[39m (pid=25398, host=cassini)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 354, in train
    raise e
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 340, in train
    result = Trainable.train(self)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/dqn/dqn.py", line 242, in _train
    self.optimizer.step()
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/optimizers/sync_batch_replay_optimizer.py", line 84, in step
    return self._optimize()
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/optimizers/sync_batch_replay_optimizer.py", line 108, in _optimize
    info_dict = self.local_evaluator.learn_on_batch(samples)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 581, in learn_on_batch
    info_out[pid] = policy.learn_on_batch(batch)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/qmix/qmix_policy_graph.py", line 296, in learn_on_batch
    next_obs, action_mask, next_action_mask)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/qmix/qmix_policy_graph.py", line 108, in forward
    there may be a state with no valid actions."
AssertionError: target_max_qvals contains a masked action;             there may be a state with no valid actions.```
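
For intuition, here is an illustrative sketch (dummy tensors only, not the actual RLlib source) of the kind of check that fails: the target-network maximum is taken over mask-filled Q-values, so any state whose action mask is all zeros leaves only `-inf` entries, which is what the assertion guards against.

```python
# Illustrative only: dummy Q-values and masks, mimicking the masked max that
# precedes the assertion in qmix_policy_graph.py.
import torch

q_values = torch.randn(2, 5)                    # [batch, n_actions], dummy numbers
action_mask = torch.tensor([[1, 1, 0, 0, 0],    # state with some valid actions
                            [0, 0, 0, 0, 0]])   # state with no valid actions at all
masked_q = q_values.masked_fill(action_mask == 0, float("-inf"))
target_max_q, _ = masked_q.max(dim=1)

# The second entry is -inf, i.e. the "masked action" the RLlib assertion complains about.
print(torch.isinf(target_max_q))                # tensor([False,  True])
```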
richardliaw commented 4 years ago

cc @ericl

EC2EZ4RD commented 4 years ago

I found this problem mainly comes from enabling double Q-learning (`double_q`); if you set `double_q: False` in the default config, then QMIX can run.
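
For reference, a minimal sketch of how that override might look when launching QMIX through Tune; the `"smac"` env name and the stopping criterion are placeholders, and it assumes an RLlib version whose QMIX default config exposes the `double_q` key:

```python
from ray import tune

config = {
    "env": "smac",         # placeholder: whatever name the SMAC env is registered under
    "double_q": False,     # disable double Q-learning to work around the masked-action assertion
    "num_workers": 0,
}

tune.run("QMIX", stop={"timesteps_total": 1_000_000}, config=config)
```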

plutonic88 commented 3 years ago

I was able to run RLlib's QMIX in the StarCraft II env. However, the policy does not converge. Any suggestions are appreciated.

EC2EZ4RD commented 3 years ago

I forgot how to make it converge. I recommend using pymarl instead of rllib if you want to explore research ideas.

xiaoToby commented 1 year ago

> I found this problem mainly comes from enabling double Q-learning (`double_q`); if you set `double_q: False` in the default config, then QMIX can run.

Hi, where do I set `double_q: False`? And when I run this example, I get the error log below:

```
(RolloutWorker pid=44372) ray::RolloutWorker.__init__() (pid=44372, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002577D997BB0>)
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 658, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 699, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 665, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 669, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task.function_executor
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\_private\function_manager.py", line 675, in actor_method_executor
(RolloutWorker pid=44372)     return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(RolloutWorker pid=44372)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 511, in __init__
(RolloutWorker pid=44372)     check_env(self.env)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 78, in check_env
(RolloutWorker pid=44372)     raise ValueError(
(RolloutWorker pid=44372) ValueError: Traceback (most recent call last):
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 65, in check_env
(RolloutWorker pid=44372)     check_multiagent_environments(env)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 268, in check_multiagent_environments
(RolloutWorker pid=44372)     next_obs, reward, done, info = env.step(sampled_action)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\env\wrappers\group_agents_wrapper.py", line 76, in step
(RolloutWorker pid=44372)     obs, rewards, dones, infos = self.env.step(action_dict)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\smac\smac\examples\rllib\env.py", line 82, in step
(RolloutWorker pid=44372)     raise ValueError(
(RolloutWorker pid=44372) ValueError: You must supply an action for agent: 0
```

How can I fix this and get it running? Thanks @EC2EZ4RD

MichaelXCChen commented 1 year ago

@xiaoToby One key difference between the default PyMARL implementation and the rllib implementation of QMIX for SMAC is that PyMARL uses the true global state, whereas rllib only uses the per-agent observations as the global state in the monotonic mixing network. So you would need to modify the default implementation and extract the true global state from the environment so that the mixing network can use it.
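
For illustration, a rough sketch (not part of the rllib example) of exposing the true global state next to each agent's observation, assuming the standard `smac.env.StarCraft2Env` API (`get_state()`, `get_obs()`, `get_avail_agent_actions()`); the mixing network would then have to be modified to consume the `"state"` key instead of concatenated per-agent observations:

```python
from smac.env import StarCraft2Env


class GlobalStateSMAC:
    """Wraps StarCraft2Env so each per-agent observation also carries the global state."""

    def __init__(self, map_name="3m"):
        self.env = StarCraft2Env(map_name=map_name)

    def reset(self):
        self.env.reset()
        return self._build_obs()

    def step(self, actions):
        # actions: one (mask-valid) action per agent
        reward, terminated, info = self.env.step(actions)
        return self._build_obs(), reward, terminated, info

    def _build_obs(self):
        state = self.env.get_state()  # the true global state that PyMARL feeds to the mixer
        return [
            {
                "obs": obs,                                          # per-agent observation
                "state": state,                                      # shared global state for the mixer
                "action_mask": self.env.get_avail_agent_actions(i),  # valid-action mask
            }
            for i, obs in enumerate(self.env.get_obs())
        ]
```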

hawksam commented 1 month ago

> I forgot how to make it converge. I recommend using pymarl instead of rllib if you want to explore research ideas.

Hi! I am currently working on this using RLlib, but I have been encountering non-convergence issues with QMIX. Could you please share any changes or adjustments that helped you achieve convergence? Thank you for your help!