tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Apache License 2.0

Problem with MixtureAgent and partitioned_nested_infos #488

Closed: waral closed this issue 4 years ago

waral commented 4 years ago

I'm getting the following error when running an Exp3MixtureAgent with two agents:

  File "/Users/michalw/git/tf-agents-contextual-bandits/.venv-tf-agents-contextual-bandits/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py", line 507, in train
    experience=experience, weights=weights, **kwargs)
  File "/Users/michalw/git/tf-agents-contextual-bandits/.venv-tf-agents-contextual-bandits/lib/python3.7/site-packages/tf_agents/utils/common.py", line 185, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/Users/michalw/git/tf-agents-contextual-bandits/.venv-tf-agents-contextual-bandits/lib/python3.7/site-packages/tf_agents/bandits/agents/mixture_agent.py", line 162, in _train
    policy_info=partitioned_nested_infos[k],
IndexError: list index out of range

The problem occurs in the _train() method of MixtureAgent, shown below:

  def _train(self, experience, weights=None):
    del weights  # unused

    reward, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.reward, self._time_step_spec.reward)
    action, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.action, self._action_spec)
    observation, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.observation, self._time_step_spec.observation)
    policy_choice, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.policy_info[mixture_policy.MIXTURE_AGENT_ID],
        self._time_step_spec.reward)
    original_infos, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.policy_info[mixture_policy.SUBPOLICY_INFO],
        self._original_info_spec)

    partitioned_nested_infos = nest_utils.batch_nested_tensors(
        _dynamic_partition_of_nested_tensors(original_infos, policy_choice,
                                             self._num_agents))

    partitioned_nested_rewards = [
        nest_utils.batch_nested_tensors(t)
        for t in _dynamic_partition_of_nested_tensors(reward, policy_choice,
                                                      self._num_agents)
    ]
    partitioned_nested_actions = [
        nest_utils.batch_nested_tensors(t)
        for t in _dynamic_partition_of_nested_tensors(action, policy_choice,
                                                      self._num_agents)
    ]
    partitioned_nested_observations = [
        nest_utils.batch_nested_tensors(t)
        for t in _dynamic_partition_of_nested_tensors(
            observation, policy_choice, self._num_agents)
    ]
    loss = 0
    for k in range(self._num_agents):
      per_policy_experience = trajectory.single_step(
          observation=partitioned_nested_observations[k],
          action=partitioned_nested_actions[k],
          policy_info=partitioned_nested_infos[k],
          reward=partitioned_nested_rewards[k],
          discount=tf.zeros_like(partitioned_nested_rewards[k]))
      loss_info = self._agents[k].train(per_policy_experience)
      loss += loss_info.loss
    common.function_in_tf1()(self._update_mixture_distribution)(experience)
    return tf_agent.LossInfo(loss=(loss), extra=())

The problem apparently occurs because partitioned_nested_infos is an empty list; I checked that the other lists used in the loop are fine. I'm not using any info for the sub-policies, only the default one, so original_infos above is a single PolicyInfo object (not a list) with default parameters, i.e. PolicyInfo(log_probability=(), predicted_rewards_mean=(), predicted_rewards_optimistic=(), predicted_rewards_sampled=(), bandit_policy_type=()). When I set policy_info=original_infos in the loop at the end of the method, everything seems to work fine.
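
The failure mode described above can be reproduced without TensorFlow. The following is a plain-Python sketch, not the code in mixture_agent.py. It assumes the partition helper flattens the nest, partitions each leaf, and then zips the per-leaf results back into one nest per partition. The names Info, partition_leaf, pack, and naive_partition_nest are hypothetical and chosen only for illustration. Under that assumption, a nest with no leaves yields an empty list, so indexing it by agent number raises IndexError.

```python
from collections import namedtuple

# Hypothetical stand-in for a policy info nest; () marks an unused field.
Info = namedtuple("Info", ["log_probability", "predicted_rewards_mean"])


def is_empty(leaf):
    """True for the () placeholder that marks an unused field."""
    return isinstance(leaf, tuple) and len(leaf) == 0


def partition_leaf(values, choices, num_partitions):
    """Bucket each value by the partition index chosen for it."""
    buckets = [[] for _ in range(num_partitions)]
    for value, choice in zip(values, choices):
        buckets[choice].append(value)
    return buckets


def pack(template, leaves):
    """Rebuild a nest shaped like `template`, filling only the used fields."""
    it = iter(leaves)
    return type(template)(*[() if is_empty(f) else next(it) for f in template])


def naive_partition_nest(nest, choices, num_partitions):
    """Partition a nest leaf by leaf, with no special case for empty nests."""
    leaves = [leaf for leaf in nest if not is_empty(leaf)]
    per_leaf = [partition_leaf(leaf, choices, num_partitions) for leaf in leaves]
    # zip(*[]) yields nothing, so a nest without leaves gives an empty list here.
    per_partition = list(zip(*per_leaf))
    return [pack(nest, list(p)) for p in per_partition]


choices = [0, 1, 0]  # which sub-agent handled each sample

# A nest with one used field partitions into one nest per sub-agent.
used = naive_partition_nest(Info([0.1, 0.2, 0.3], ()), choices, 2)

# A nest with no used fields collapses to an empty list.
empty = naive_partition_nest(Info((), ()), choices, 2)
```

With these inputs, used has two entries while empty has none, so empty[0] fails in the same way as partitioned_nested_infos[k] in the traceback.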

Is it a bug or am I doing something wrong?

Thanks so much for your help!

bartokg commented 4 years ago

Hey Michal, thanks for reporting! This was indeed a bug: the function _dynamic_partition_of_nested_tensors in mixture_agent.py did not handle empty nests correctly and returned an empty list instead of a list of empty nests. This change should fix it: https://github.com/tensorflow/agents/commit/4f05181bf10453073d97d35cbe20dc995e988cdb
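
As a rough illustration of the intended behavior (a list of empty nests rather than an empty list), here is a plain-Python sketch. It does not reproduce the linked commit, and the names Info and partition_nest are hypothetical. The idea it shows is to size the result by the number of partitions up front, so its length never depends on how many fields of the nest are actually used.

```python
from collections import namedtuple

# Hypothetical two-field info nest; () marks a field that is not used.
Info = namedtuple("Info", ["log_probability", "predicted_rewards_mean"])


def partition_nest(nest, choices, num_partitions):
    """Split each used field of `nest` by `choices`, one nest per partition.

    The result always has `num_partitions` entries, even if no field is used.
    """
    # One copy of the field values per partition, so the result length is
    # fixed by num_partitions rather than by the number of leaves.
    fields = [list(nest) for _ in range(num_partitions)]
    for i, leaf in enumerate(nest):
        if isinstance(leaf, tuple) and not leaf:
            continue  # unused field: keep () in every partition
        buckets = [[] for _ in range(num_partitions)]
        for value, choice in zip(leaf, choices):
            buckets[choice].append(value)
        for k in range(num_partitions):
            fields[k][i] = buckets[k]
    return [type(nest)(*f) for f in fields]


choices = [0, 1, 0]  # which sub-agent handled each sample
empty_parts = partition_nest(Info((), ()), choices, num_partitions=2)
used_parts = partition_nest(Info([0.1, 0.2, 0.3], ()), choices, num_partitions=2)
```

Here empty_parts holds one empty Info per sub-agent, so a loop over range(num_partitions) can index it safely, and used_parts still splits the populated field by choices.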