Problem with MixtureAgent and partitioned_nested_infos

I'm getting the following error when running an Exp3MixtureAgent with two agents:

  File "/Users/michalw/git/tf-agents-contextual-bandits/.venv-tf-agents-contextual-bandits/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py", line 507, in train
    experience=experience, weights=weights, **kwargs)
  File "/Users/michalw/git/tf-agents-contextual-bandits/.venv-tf-agents-contextual-bandits/lib/python3.7/site-packages/tf_agents/utils/common.py", line 185, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/Users/michalw/git/tf-agents-contextual-bandits/.venv-tf-agents-contextual-bandits/lib/python3.7/site-packages/tf_agents/bandits/agents/mixture_agent.py", line 162, in _train
    policy_info=partitioned_nested_infos[k],
IndexError: list index out of range

The error is raised in the _train() method of MixtureAgent, shown below:

  def _train(self, experience, weights=None):
    del weights  # unused

    reward, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.reward, self._time_step_spec.reward)
    action, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.action, self._action_spec)
    observation, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.observation, self._time_step_spec.observation)
    policy_choice, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.policy_info[mixture_policy.MIXTURE_AGENT_ID],
        self._time_step_spec.reward)
    original_infos, _ = nest_utils.flatten_multi_batched_nested_tensors(
        experience.policy_info[mixture_policy.SUBPOLICY_INFO],
        self._original_info_spec)

    partitioned_nested_infos = nest_utils.batch_nested_tensors(
        _dynamic_partition_of_nested_tensors(original_infos, policy_choice,
                                             self._num_agents))

    partitioned_nested_rewards = [
        nest_utils.batch_nested_tensors(t)
        for t in _dynamic_partition_of_nested_tensors(reward, policy_choice,
                                                      self._num_agents)
    ]
    partitioned_nested_actions = [
        nest_utils.batch_nested_tensors(t)
        for t in _dynamic_partition_of_nested_tensors(action, policy_choice,
                                                      self._num_agents)
    ]
    partitioned_nested_observations = [
        nest_utils.batch_nested_tensors(t)
        for t in _dynamic_partition_of_nested_tensors(
            observation, policy_choice, self._num_agents)
    ]
    loss = 0
    for k in range(self._num_agents):
      per_policy_experience = trajectory.single_step(
          observation=partitioned_nested_observations[k],
          action=partitioned_nested_actions[k],
          policy_info=partitioned_nested_infos[k],
          reward=partitioned_nested_rewards[k],
          discount=tf.zeros_like(partitioned_nested_rewards[k]))
      loss_info = self._agents[k].train(per_policy_experience)
      loss += loss_info.loss
    common.function_in_tf1()(self._update_mixture_distribution)(experience)
    return tf_agent.LossInfo(loss=(loss), extra=())

The error apparently occurs because partitioned_nested_infos is an empty list (I checked that the other lists used in the loop are populated). I'm not setting any info for the sub-policies, so they all use the default, and original_infos above is a single PolicyInfo with default fields, not a list: PolicyInfo(log_probability=(), predicted_rewards_mean=(), predicted_rewards_optimistic=(), predicted_rewards_sampled=(), bandit_policy_type=()). When I set policy_info=original_infos in the loop at the end of the method, everything seems to work fine.
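
To illustrate how an empty list could arise, here is a minimal plain-Python sketch. It assumes (I have not confirmed this against the library source) that _dynamic_partition_of_nested_tensors flattens the nest, partitions each leaf, and regroups the pieces per agent with zip. All names below (flatten_leaves, partition_leaf, partition_nest, partition_nest_guarded) are hypothetical stand-ins, not TF-Agents functions, and the PolicyInfo here is a two-field mock of the real one.

```python
from collections import namedtuple

# Mock of the default PolicyInfo from the report: every field is an empty tuple.
PolicyInfo = namedtuple("PolicyInfo", ["log_probability", "predicted_rewards_mean"])


def flatten_leaves(nest):
    """Hypothetical flatten: an empty tuple field contributes no leaves."""
    return [leaf for leaf in nest if leaf != ()]


def partition_leaf(values, choices, num_partitions):
    """Pure-Python analogue of partitioning one flat batch by agent index."""
    parts = [[] for _ in range(num_partitions)]
    for value, choice in zip(values, choices):
        parts[choice].append(value)
    return parts


def partition_nest(nest, choices, num_partitions):
    """Assumed shape of the helper: partition each leaf, then regroup per agent.

    Returns one list of leaf partitions per agent (not a repacked nest).
    """
    per_leaf = [
        partition_leaf(leaf, choices, num_partitions)
        for leaf in flatten_leaves(nest)
    ]
    # zip(*[]) yields nothing, so a nest without leaves produces [] here,
    # regardless of num_partitions.
    return [list(group) for group in zip(*per_leaf)]


def partition_nest_guarded(nest, choices, num_partitions):
    """One possible guard: give every agent the same empty nest."""
    if not flatten_leaves(nest):
        return [nest] * num_partitions
    return partition_nest(nest, choices, num_partitions)


empty_info = PolicyInfo(log_probability=(), predicted_rewards_mean=())
policy_choice = [0, 1, 0]

unguarded = partition_nest(empty_info, policy_choice, num_partitions=2)
guarded = partition_nest_guarded(empty_info, policy_choice, num_partitions=2)
```

Under these assumptions, unguarded has no element to index with k, which would match the IndexError, while guarded has one entry per agent. That is also consistent with the workaround of passing original_infos directly.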

Is this a bug, or am I doing something wrong?

Thanks so much for your help!
