uber-research / deep-neuroevolution

Deep Neuroevolution

[Local ES Frostbite] Value Error, Dimension Mismatch #27

Closed artofbeinghuman closed 5 years ago

artofbeinghuman commented 5 years ago

When running . scripts/local_run_exp.sh es configurations/frostbite_es.json

I get a ValueError (dimension mismatch). Here is the full output of the Redis master:

. scripts/local_env_setup.sh
python -m es_distributed.main master --master_socket_path /tmp/es_redis_master.sock --algo es --exp_file configurations/frostbite_es.json
marvin@mlpad:~/code/deep-neuroevolution$ . scripts/local_env_setup.sh
Setting up local environment
(env) marvin@mlpad:~/code/deep-neuroevolution$ python -m es_distributed.main master --master_socket_path /tmp/es_redis_master.sock --algo es --exp_file configurations/frostbite_es.json
[2019-06-04 18:12:12,474 pid=5654] run_master: {'master_redis_cfg': {'unix_socket_path': '/tmp/es_redis_master.sock'}, 'log_dir': '/tmp/es_master_5654', 'exp': {'config': {'calc_obstat_prob': 0.0, 'episodes_per_$
atch': 5000, 'eval_prob': 0.01, 'l2coeff': 0.005, 'noise_stdev': 0.005, 'snapshot_freq': 20, 'timesteps_per_batch': 10000, 'return_proc_mode': 'centered_rank', 'episode_cutoff_mode': 5000}, 'env_id': 'FrostbiteN$
Frameskip-v4', 'optimizer': {'args': {'stepsize': 0.01}, 'type': 'adam'}, 'policy': {'args': {}, 'type': 'ESAtariPolicy'}}}
[2019-06-04 18:12:13,773 pid=5654] Tabular logging to /tmp/es_master_5654
2019-06-04 18:12:14.669307: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-04 18:12:14.700165: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2496000000 Hz
2019-06-04 18:12:14.700767: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3ad73d0 executing computations on platform Host. Devices:
2019-06-04 18:12:14.700785: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
[2019-06-04 18:12:14,711 pid=5654] From /home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
[2019-06-04 18:12:14,854 pid=5654] From /home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1624: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
[2019-06-04 18:12:14,939 pid=5654] From /home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/util/decorator_utils.py:145: GraphKeys.VARIABLES (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.GraphKeys.GLOBAL_VARIABLES` instead.
Traceback (most recent call last):
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1659, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 4608 and 18. Shapes are [4608] and [18].
    From merging shape 12 with other shapes. for 'concat/concat_dim' (op: 'Pack') with input shapes: [4096], [16], [16], [16], [8192], [32], [32], [32], [991232], [256], [256], [256], [4608], [18].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/marvin/code/deep-neuroevolution/es_distributed/main.py", line 90, in <module>
    cli()
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/marvin/code/deep-neuroevolution/es_distributed/main.py", line 61, in master
    algo.run_master({'unix_socket_path': master_socket_path}, log_dir, exp)
  File "/home/marvin/code/deep-neuroevolution/es_distributed/es.py", line 147, in run_master
    config, env, sess, policy = setup(exp, single_threaded=False)
  File "/home/marvin/code/deep-neuroevolution/es_distributed/es.py", line 136, in setup
    policy = getattr(policies, exp['policy']['type'])(env.observation_space, env.action_space, **exp['policy']['args'])
  File "/home/marvin/code/deep-neuroevolution/es_distributed/policies.py", line 24, in __init__
    self._getflat = U.GetFlat(self.trainable_variables)
  File "/home/marvin/code/deep-neuroevolution/es_distributed/tf_util.py", line 244, in __init__
    self.op = tf.concat(0, [tf.reshape(v, [numel(v)]) for v in var_list])
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 1253, in concat
    dtype=dtypes.int32).get_shape().assert_is_compatible_with(
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1039, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1097, in convert_to_tensor_v2
    as_ref=False)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1175, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 1102, in _autopacking_conversion_function
    return _autopacking_helper(v, dtype, name or "packed")
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 1054, in _autopacking_helper
    return gen_array_ops.pack(elems_as_tensors, name=scope)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5448, in pack
    "Pack", values=values, axis=axis, name=name)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1823, in __init__
    control_input_ops)
  File "/home/marvin/code/deep-neuroevolution/env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1662, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 4608 and 18. Shapes are [4608] and [18].
    From merging shape 12 with other shapes. for 'concat/concat_dim' (op: 'Pack') with input shapes: [4096], [16], [16], [16], [8192], [32], [32], [32], [991232], [256], [256], [256], [4608], [18].
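
For reference, here is how I read the shapes in that message. The 14 "input shapes" look like the flattened sizes of the policy's trainable variables (each one comes from tf.reshape(v, [numel(v)]) in tf_util.py). The op that fails is a Pack, which needs all inputs to have the same shape, whereas a concatenation along axis 0 would simply add the lengths up. The small sketch below only does that arithmetic in plain Python; can_pack and concat_length are hypothetical names used for illustration, not functions from the repository or from TensorFlow.

```python
# Flattened per-variable sizes, copied from the "input shapes" list
# in the error message above.
flat_sizes = [4096, 16, 16, 16, 8192, 32, 32, 32,
              991232, 256, 256, 256, 4608, 18]


def can_pack(sizes):
    """Pack (stacking) only works if every 1-D input has the same length."""
    return len(set(sizes)) <= 1


def concat_length(sizes):
    """Concatenating 1-D tensors along axis 0 just adds their lengths."""
    return sum(sizes)


print(can_pack(flat_sizes))       # False: the sizes differ, so Pack fails
print(concat_length(flat_sizes))  # 1009058 parameters in the flat vector
```

So a correct concat would yield one flat vector of 1,009,058 entries, and the failure comes from the list being packed instead of concatenated.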

How do I fix this and get the example up and running? I'd be grateful for any help!

Cheers, Marvin

artofbeinghuman commented 5 years ago

Well, it turns out this error doesn't appear in TensorFlow 0.12.1 (the version this code was written for anyway). Closed.
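
A likely explanation, for anyone hitting the same traceback: the call at tf_util.py line 244, tf.concat(0, [...]), uses the pre-1.0 argument order (axis first, then values). From TensorFlow 1.0 on, the signature is tf.concat(values, axis), so the 0 is taken as the values and the list of flattened variables as the axis, which TensorFlow then tries to pack into one tensor. That would match the 'concat/concat_dim' Pack error above. The sketch below picks the positional order from the version string; concat_args is a hypothetical helper, not part of the repository, and other parts of the code may depend on 0.12-era APIs as well, so this change alone may not be enough to run on a newer TensorFlow.

```python
def concat_args(tf_version, values, axis=0):
    """Return positional args for tf.concat in the order that
    TensorFlow version expects.

    Assumption: releases before 1.0 take (axis, values) and
    releases from 1.0 on take (values, axis).
    """
    major = int(tf_version.split(".")[0])
    if major >= 1:
        return (values, axis)
    return (axis, values)


# Intended use inside GetFlat.__init__ (needs TensorFlow, so left commented):
#   flat = [tf.reshape(v, [numel(v)]) for v in var_list]
#   self.op = tf.concat(*concat_args(tf.__version__, flat, 0))

print(concat_args("0.12.1", ["w", "b"], 0))  # (0, ['w', 'b'])
print(concat_args("1.13.1", ["w", "b"], 0))  # (['w', 'b'], 0)
```

If only TensorFlow 1.x needs to be supported, swapping the two arguments directly at that line would be the simpler edit.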