openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License

All scenarios fail except scenario simple #63

Closed EvaluationResearch closed 3 years ago

EvaluationResearch commented 3 years ago

Hi team, could you please help take a look at this error? "python train.py --scenario simple --display" works fine.

But "python train.py --scenario simple_speaker_listener --display" runs into the following error:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key agent_1/agent_1/p_func/fully_connected/biases/Adam not found in checkpoint
  [[node save/RestoreV2 (defined at /home/zhaoyue/maddpg/maddpg/common/tf_util.py:229) ]]

Original stack trace for 'save/RestoreV2':
  File "train.py", line 193, in <module>
    train(arglist)
  File "train.py", line 96, in train
    U.load_state(arglist.load_dir)
  File "/home/zhaoyue/maddpg/maddpg/common/tf_util.py", line 229, in load_state
    saver = tf.train.Saver()
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1296, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1614, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 678, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 193, in <module>
    train(arglist)
  File "train.py", line 96, in train
    U.load_state(arglist.load_dir)
  File "/home/zhaoyue/maddpg/maddpg/common/tf_util.py", line 230, in load_state
    saver.restore(get_session(), fname)
  File "/home/zhaoyue/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1302, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key agent_1/agent_1/p_func/fully_connected/biases/Adam not found in checkpoint
  [[node save/RestoreV2 (defined at /home/zhaoyue/maddpg/maddpg/common/tf_util.py:229) ]]


Your early reply will be highly appreciated.
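One way to read the error above: the graph built for simple_speaker_listener asks the checkpoint for a variable under the agent_1 scope, and the checkpoint does not have it. The sketch below only illustrates that comparison. The two name lists are hypothetical (the agent_0 entry in particular is an assumption, not taken from the log), and the guess that the checkpoint was written by an earlier run of a different scenario is not confirmed in this thread.

```python
def missing_from_checkpoint(graph_keys, checkpoint_keys):
    """Return the graph variable names that the checkpoint does not contain."""
    return sorted(set(graph_keys) - set(checkpoint_keys))


def top_level_scopes(keys):
    """Return the first path component of each name, e.g. 'agent_1'."""
    return sorted({key.split("/", 1)[0] for key in keys})


# Hypothetical contents of an old checkpoint (assumed: a single agent).
checkpoint_keys = [
    "agent_0/agent_0/p_func/fully_connected/biases/Adam",
]

# Hypothetical names expected by the current graph. The agent_1 key is the
# one reported in the NotFoundError above.
graph_keys = [
    "agent_0/agent_0/p_func/fully_connected/biases/Adam",
    "agent_1/agent_1/p_func/fully_connected/biases/Adam",
]

missing = missing_from_checkpoint(graph_keys, checkpoint_keys)
print(missing)
print(top_level_scopes(missing))
```

In a real session the two lists could probably be filled from tf.train.list_variables(checkpoint_dir) and tf.global_variables(); that wiring is left out here so the example runs without TensorFlow.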

EvaluationResearch commented 3 years ago

Hello, I have read some blog posts saying that the /tmp/policy folder should be deleted if you want to change the environment. When I deleted the policy folder (I noticed there was a checkpoint in it), the following error occurred:

WARNING:tensorflow:From /home/zhaoyue/maddpg/venv/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "/home/zhaoyue/maddpg/experiments/train.py", line 193, in <module>
    train(arglist)
  File "/home/zhaoyue/maddpg/experiments/train.py", line 96, in train
    U.load_state(arglist.load_dir)
  File "/home/zhaoyue/maddpg/maddpg/common/tf_util.py", line 230, in load_state
    saver.restore(get_session(), fname)
  File "/home/zhaoyue/maddpg/venv/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1276, in restore
    if not checkpoint_management.checkpoint_exists(compat.as_text(save_path)):
  File "/home/zhaoyue/maddpg/venv/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/zhaoyue/maddpg/venv/lib/python3.5/site-packages/tensorflow/python/training/checkpoint_management.py", line 372, in checkpoint_exists
    if file_io.get_matching_files(pathname):
  File "/home/zhaoyue/maddpg/venv/lib/python3.5/site-packages/tensorflow/python/lib/io/file_io.py", line 363, in get_matching_files
    return get_matching_files_v2(filename)
  File "/home/zhaoyue/maddpg/venv/lib/python3.5/site-packages/tensorflow/python/lib/io/file_io.py", line 384, in get_matching_files_v2
    compat.as_bytes(pattern))
tensorflow.python.framework.errors_impl.NotFoundError: /tmp/policy; No such file or directory

Process finished with exit code 1

But when I create a new policy folder with mkdir, there is also an error:

WARNING:tensorflow:From /home/zhaoyue/maddpg/maddpg/common/tf_util.py:229: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From /home/zhaoyue/maddpg/venv/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "/home/zhaoyue/maddpg/experiments/train.py", line 193, in <module>
    train(arglist)
  File "/home/zhaoyue/maddpg/experiments/train.py", line 96, in train
    U.load_state(arglist.load_dir)
  File "/home/zhaoyue/maddpg/maddpg/common/tf_util.py", line 230, in load_state
    saver.restore(get_session(), fname)
  File "/home/zhaoyue/maddpg/venv/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1278, in restore
    compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: /tmp/policy/

Process finished with exit code 1
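Both failures in this comment come from U.load_state(arglist.load_dir) being called when the folder holds nothing to restore: first the folder is missing, then it exists but is empty. A minimal sketch of a guard that could run before the restore is shown below. The helper name has_checkpoint is hypothetical and not part of the maddpg code; it relies only on the file named checkpoint that the poster noticed inside the policy folder.

```python
import os
import tempfile


def has_checkpoint(load_dir):
    """Return True if load_dir contains a file named `checkpoint`.

    os.path.isfile is False both for a missing directory and for an
    empty one, which are the two cases reported above.
    """
    return os.path.isfile(os.path.join(load_dir, "checkpoint"))


# Demonstration in a throwaway directory (not the real /tmp/policy).
with tempfile.TemporaryDirectory() as tmp:
    policy_dir = os.path.join(tmp, "policy")
    print(has_checkpoint(policy_dir))  # folder does not exist yet
    os.mkdir(policy_dir)
    print(has_checkpoint(policy_dir))  # folder exists but is empty
    open(os.path.join(policy_dir, "checkpoint"), "w").close()
    print(has_checkpoint(policy_dir))  # index file is present
```

In train.py the call on line 96 could presumably be wrapped as "if has_checkpoint(arglist.load_dir): U.load_state(arglist.load_dir)". This only checks that the index file is present. It does not verify that the saved variables match the current scenario, so a checkpoint from a different scenario would still fail to restore.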