tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Apache License 2.0

Multiple actions for PPOAgent #759

Open DavyMorgan opened 2 years ago

DavyMorgan commented 2 years ago

Hi, I developed an environment with the following action_spec:

```
BoundedTensorSpec(shape=(2,), dtype=tf.int32, name='action', minimum=array(0, dtype=int32), maximum=array(65535, dtype=int32))
```

Since the two actions are independent, I use tfp.distributions.Independent to build the action distribution:

```python
self._output_dist_spec = distribution_spec.DistributionSpec(
    tfp.distributions.Independent,
    input_param_spec,
    sample_spec=output_tensors_spec,
    reinterpreted_batch_ndims=1)
output_dist = self._output_dist_spec.build_distribution(
    distribution=tfp.distributions.Categorical(
        logits=tf.concat([masked_logits, masked_logits], axis=-2)))
```

where `masked_logits` has shape (1, 65536).
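One possible contributor to the shape mismatch, offered as a hedged reading rather than a confirmed diagnosis: `tf.concat(..., axis=-2)` on two `(B, 65536)` tensors produces `(2B, 65536)`, appending the second copy along the batch axis instead of creating a separate axis for the two sub-actions, whereas `tf.stack(..., axis=-2)` produces `(B, 2, 65536)`. The sketch below uses NumPy, whose `concatenate` and `stack` follow the same shape rules as `tf.concat` and `tf.stack`; the batch size `B = 3` is an arbitrary assumption for illustration.

```python
import numpy as np

# Assumption: a batch of B observations, each yielding one row of
# 65536 masked logits (the report shows B = 1, i.e. shape (1, 65536)).
B, NUM_CATEGORIES = 3, 65536
masked_logits = np.zeros((B, NUM_CATEGORIES), dtype=np.float32)

# Concatenating along axis=-2 appends the second copy to the batch axis:
# (B, N) + (B, N) -> (2B, N). No separate sub-action axis is created.
concat_logits = np.concatenate([masked_logits, masked_logits], axis=-2)

# Stacking along axis=-2 inserts a new axis for the two sub-actions:
# (B, N), (B, N) -> (B, 2, N).
stack_logits = np.stack([masked_logits, masked_logits], axis=-2)

print(concat_logits.shape)  # (6, 65536)
print(stack_logits.shape)   # (3, 2, 65536)
```

With stacked logits, `Categorical` has batch shape `(B, 2)`, and `Independent(..., reinterpreted_batch_ndims=1)` moves the trailing 2 into the event shape, so samples have shape `(B, 2)` and line up with the `(2,)` action spec. With concatenated logits, the batch shape is `(2B,)` and `Independent` folds that whole axis into the event, which is consistent with the `shape=(None,)` action tensor in the error message. This change alone would probably not resolve the separate `dist_params` structure mismatch that the `ValueError` reports.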

However, I got the following error:

```
Traceback (most recent call last):
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mas/zhengyu/workspace/urban_training/urban_training/learning/train_ppo.py", line 126, in <module>
    multiprocessing.handle_main(functools.partial(app.run, main))
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tf_agents/system/default/multiprocessing_core.py", line 77, in handle_main
    return app.run(parent_main_fn, *args, **kwargs)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/mas/zhengyu/workspace/urban_training/urban_training/learning/train_ppo.py", line 117, in main
    allow_variable_length_episodes=_ALLOW_VARIABLE_LENGTH_EPISODES.value)
  File "/home/mas/zhengyu/workspace/urban_training/urban_training/learning/train_ppo_lib.py", line 107, in train
    interval=num_episodes_per_iteration)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tf_agents/train/triggers.py", line 133, in __init__
    use_nest_path_signatures)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tf_agents/train/triggers.py", line 174, in _build_saver
    input_fn_and_spec=self._input_fn_and_spec,
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tf_agents/policies/policy_saver.py", line 385, in __init__
    policy_state=batched_policy_state_spec)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1239, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1219, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 786, in _initialize
    args, kwds))
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2520, in _get_concrete_function_internal_garbage_collected
    graphfunction, = self._maybe_define_function(args, kwargs)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2751, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2676, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1141, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tf_agents/policies/policy_saver.py", line 381, in polymorphic_action_fn
    return action_fn(time_step, policy_state)
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tf_agents/policies/tf_policy.py", line 346, in action
    message='action output and policy_step_spec structures do not match')
  File "/usr/local/anaconda/envs/tf-agents/lib/python3.7/site-packages/tf_agents/utils/nest_utils.py", line 125, in assert_same_structure
    .format(message, str1, str2, nest1, nest2))
ValueError: action output and policy_step_spec structures do not match:
  PolicyStep(action=., state=(), info={'dist_params': {}, 'value_prediction': .})
vs.
  PolicyStep(action=., state=(), info=DictWrapper({'dist_params': DictWrapper({'logits': .}), 'value_prediction': .}))
Values:
  PolicyStep(action=<tf.Tensor 'IndependentCategorical_CONSTRUCTED_AT_RLPolicyModel/sample/Reshape:0' shape=(None,) dtype=int32>, state=(), info={'dist_params': {}, 'value_prediction': <tf.Tensor 'rl_value_model/Squeeze:0' shape=(None,) dtype=float32>})
vs.
  PolicyStep(action=BoundedTensorSpec(shape=(2,), dtype=tf.int32, name='action', minimum=array(0, dtype=int32), maximum=array(65535, dtype=int32)), state=(), info=DictWrapper({'dist_params': DictWrapper({'logits': TensorSpec(shape=(65536,), dtype=tf.float32, name='RLPolicyModel_logits')}), 'value_prediction': TensorSpec(shape=(), dtype=tf.float32, name=None)}))
```

It seems the error comes from the following lines in tf_policy.py:

```python
if self._validate_args:
  nest_utils.assert_same_structure(
      step,
      self._policy_step_spec,
      message='action output and policy_step_spec structures do not match')
```

which are invoked by the policy_saver in PolicySavedModelTrigger.
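The check above is a structure comparison: it looks at which containers and dict keys are present. As a simplified, pure-Python illustration (the `structure` and `same_structure` helpers and the stand-in `PolicyStep` namedtuple are hypothetical, not TF-Agents' implementation), the two `PolicyStep`s from the `ValueError` fail because the produced `info['dist_params']` is an empty dict while the spec expects a `'logits'` entry. Leaves are collapsed to a placeholder, so only the nesting is compared.

```python
from collections import namedtuple

# Hypothetical stand-in for TF-Agents' PolicyStep namedtuple.
PolicyStep = namedtuple("PolicyStep", ["action", "state", "info"])

LEAF = "."  # every tensor or spec leaf collapses to this placeholder


def structure(nest):
    """Return a copy of `nest` with every leaf replaced by LEAF."""
    if isinstance(nest, dict):
        return {key: structure(value) for key, value in nest.items()}
    if isinstance(nest, tuple) and hasattr(nest, "_fields"):  # namedtuple
        return type(nest)(*(structure(value) for value in nest))
    if isinstance(nest, (list, tuple)):
        return type(nest)(structure(value) for value in nest)
    return LEAF


def same_structure(a, b):
    """True if `a` and `b` have identical containers and dict keys."""
    return structure(a) == structure(b)


# Structure produced by the policy (from the ValueError): empty dist_params.
produced = PolicyStep(
    action="tensor", state=(),
    info={"dist_params": {}, "value_prediction": "tensor"})

# Structure expected by policy_step_spec: dist_params holds 'logits'.
expected = PolicyStep(
    action="spec", state=(),
    info={"dist_params": {"logits": "spec"}, "value_prediction": "spec"})

print(same_structure(produced, expected))  # False

# Once the policy emits a 'logits' entry, the structures line up.
fixed = produced._replace(
    info={"dist_params": {"logits": "tensor"}, "value_prediction": "tensor"})
print(same_structure(fixed, expected))  # True
```

In this reading, the fix is on the side that fills `info['dist_params']`: whatever builds the policy info has to emit the same keys that the spec was built with.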

Is it a bug?

sguada commented 2 years ago

The specs don't match:

```
PolicyStep(action=., state=(), info={'dist_params': {}, 'value_prediction': .})
vs.
PolicyStep(action=., state=(), info=DictWrapper({'dist_params': DictWrapper({'logits': .}), 'value_prediction': .}))
```

Make sure the policy builds the correct info data.

DavyMorgan commented 2 years ago

Thanks. What I actually want to know is whether PPOAgent supports multiple discrete actions. I couldn't find any tutorials on this topic.

DavyMorgan commented 2 years ago

@sguada Does the PPOAgent in tf-agents only support 1-D actions?

sibyjackgrove commented 2 years ago

I believe this issue is related to #720, which I raised earlier. Unfortunately, I still haven't been able to find a solution. Pull request #705 could potentially address this.

sibyjackgrove commented 1 year ago

@DavyMorgan Here is one solution to this issue. https://github.com/tensorflow/agents/issues/656#issuecomment-1286095054