Error using parameter train_step_counter according to colab example

ideenfix commented 5 years ago

I'm using TF-Agents (nightly, 0.2.0dev2019430) on Win10 with TF 2.0 (GPU, 2.0.0a0).

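If you run this snippet, taken from the colab example:

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent

# tf.Variable(0) infers dtype int32 from the Python int.
train_step_counter = tf.Variable(0)

# train_env, net, optimizer and params are defined earlier in the script.
tf_agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=net,
    optimizer=optimizer,
    epsilon_greedy=params["epsilon_final"],
    gamma=params['gamma'],
    td_errors_loss_fn=dqn_agent.element_wise_squared_loss,
    train_step_counter=train_step_counter
)
```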

After calling DqnAgent.train, the following error is thrown:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\pydevd.py", line 1741, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/git/Deep-Reinforcement-Learning-Hands-On/Chapter07/01_dqn_basic_tf.py", line 165, in <module>
    train_loss = tf_agent.train(experience)
  File "D:\pyenv\tf2\lib\site-packages\tf_agents\agents\tf_agent.py", line 177, in train
    loss_info = self._train_fn(experience=experience, weights=weights)
  File "D:\pyenv\tf2\lib\site-packages\tf_agents\agents\dqn\dqn_agent.py", line 256, in _train
    weights=weights)
  File "D:\pyenv\tf2\lib\site-packages\tf_agents\agents\dqn\dqn_agent.py", line 353, in loss
    name='loss', data=loss, step=self.train_step_counter)
  File "D:\pyenv\tf2\lib\site-packages\tensorboard\plugins\scalar\summary_v2.py", line 65, in scalar
    metadata=summary_metadata)
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 632, in write
    _should_record_summaries_v2(), record, _nothing, name="summary_cond")
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\smart_cond.py", line 54, in smart_cond
    return true_fn()
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 627, in record
    name=scope)
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 793, in write_summary
    writer, step, tensor, tag, summary_metadata, name=name, ctx=_ctx)
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 824, in write_summary_eager_fallback
    step = _ops.convert_to_tensor(step, _dtypes.int64)
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 1050, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 1108, in convert_to_tensor_v2
    as_ref=False)
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 1186, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1420, in _dense_var_to_tensor
    return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref)  # pylint: disable=protected-access
  File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1371, in _dense_var_to_tensor
    "of type {!r}".format(dtype.name, self.dtype.name))
ValueError: Incompatible type conversion requested to type 'int64' for variable of type 'int32'

If you change the initialization of this parameter to

train_step_counter = tf.Variable(0, dtype=tf.int64)

the error no longer occurs.
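To make the dtype requirement explicit, here is a minimal sketch of a guard that could run before the counter is handed to the agent. `require_int64_step` is a hypothetical helper, not part of TF-Agents or TensorFlow; it only assumes that the object exposes `dtype.name`, which `tf.Variable` does. It simply surfaces the mismatch earlier than the summary write seen in the traceback.

```python
def require_int64_step(step_counter):
    """Raise early if the step counter is not int64.

    Hypothetical helper (not part of TF-Agents). It relies only on the
    object exposing `dtype.name`, as tf.Variable does.
    """
    dtype_name = step_counter.dtype.name
    if dtype_name != "int64":
        raise TypeError(
            "train_step_counter has dtype {!r}; expected 'int64', e.g. "
            "tf.Variable(0, dtype=tf.int64)".format(dtype_name)
        )
    return step_counter


# Intended use with TensorFlow installed (not executed here):
#   train_step_counter = require_int64_step(tf.Variable(0, dtype=tf.int64))
```

With a plain `tf.Variable(0)`, the guard would raise a `TypeError` naming `int32` instead of failing later inside the summary op.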

01_dqn_basic_tf_selfrunning.txt

egonina commented 5 years ago

The global step needs to be of type tf.int64. Could you point to the colab example you're referring to, so we can fix it if that's an issue there?

ideenfix commented 5 years ago

https://github.com/tensorflow/agents/blob/master/tf_agents/colabs/1_dqn_tutorial.ipynb

egonina commented 5 years ago

Hm, I'm unable to reproduce this by running the colab you linked; the train step runs fine and the type of the train_step_counter variable is tf.int32. Are you running this colab directly, or are you modifying anything in your code?

ideenfix commented 5 years ago

To begin with, I am using the colab example as a reference for my own RL agent.

I ran the colab example locally on my notebook after installing the jupyter package. It runs once I disable some statements that do not work on Win10 (e.g. the pyvirtualdisplay import and display = pyvirtualdisplay.Display(visible=0, size=(1400, 900)).start()).

Then I checked my own script and got new errors such as:

D:\pyenv\py36tf2\Scripts\python.exe D:/git/Deep-Reinforcement-Learning-Hands-On/Chapter07/01_dqn_basic_tf.py
Python:3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)]
Tensorflow: 1.14.1-dev20190603
TF-Agent:0.2.0
Traceback (most recent call last):
  File "D:/git/Deep-Reinforcement-Learning-Hands-On/Chapter07/01_dqn_basic_tf.py", line 62, in <module>
    writer = tf.summary.create_file_writer(log_dir)
  File "D:\pyenv\py36tf2\lib\site-packages\tensorflow\python\util\deprecation_wrapper.py", line 104, in __getattr__
    attr = getattr(self._dw_wrapped_module, name)
AttributeError: module 'tensorflow._api.v1.summary' has no attribute 'create_file_writer'

After checking my virtual environment, I found that running the colab example had installed tf-nightly. Here is an excerpt of the pip list output:

tb-nightly                        1.14.0a20190602
tensorflow-estimator-2.0-preview  1.14.0.dev2019060300
termcolor                         1.1.0
terminado                         0.8.2
testpath                          0.4.2
tf-agents-nightly                 0.2.0.dev20190528
tf-estimator-nightly              1.14.0.dev2019052901
tf-nightly                        1.14.1.dev20190603
tf-nightly-gpu-2.0-preview        2.0.0.dev20190602
tfp-nightly                       0.8.0.dev20190603

The tf-nightly (1.14) install took precedence over the tf-nightly-gpu-2.0-preview package. That caused the AttributeError above, since my script is based on TF 2.0.
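A conflict like this can be spotted before running a script. The following is a rough sketch, not an official tool: the helper names are hypothetical, and the set of distribution names that ship the `tensorflow` module is an assumption for illustration and is not exhaustive. Listing installed distributions uses `importlib.metadata`, which requires Python 3.8 or newer (the environments in this thread are 3.6 and 3.7, where `pip list` gives the same information).

```python
import re

# Distribution names that each ship the top-level `tensorflow` module.
# Assumption for illustration; the list is not exhaustive.
TENSORFLOW_DISTRIBUTIONS = {
    "tensorflow",
    "tensorflow-gpu",
    "tf-nightly",
    "tf-nightly-gpu",
    "tf-nightly-2.0-preview",
    "tf-nightly-gpu-2.0-preview",
}


def tensorflow_providers(installed_names):
    """Return the installed distributions that provide `tensorflow`, sorted."""
    normalized = {n.strip().lower().replace("_", "-") for n in installed_names}
    return sorted(normalized & TENSORFLOW_DISTRIBUTIONS)


def major_version(version_string):
    """Extract the leading major number, e.g. 1 from '1.14.1-dev20190603'."""
    match = re.match(r"\s*(\d+)", version_string)
    if match is None:
        raise ValueError("no leading version number in {!r}".format(version_string))
    return int(match.group(1))


def installed_distribution_names():
    """List distribution names in the current environment (Python 3.8+)."""
    from importlib import metadata
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    return [name for name in names if name]


# Names taken from the pip list excerpt above.
providers = tensorflow_providers([
    "tb-nightly", "tf-agents-nightly", "tf-estimator-nightly",
    "tf-nightly", "tf-nightly-gpu-2.0-preview", "tfp-nightly",
])
if len(providers) > 1:
    print("Multiple TensorFlow distributions installed:", providers)
```

More than one entry in `providers` means two packages are competing for the same `tensorflow` import, and `major_version(tf.__version__)` can then confirm which one actually got imported.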

Attached is the standalone version of my script: 01_dqn_basic_tf_standalone.txt

If you run this script in a pure TF2 environment, e.g.

Python:3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]
Tensorflow: 2.0.0-alpha0

then it runs until the first training step, where train_loss = tf_agent.train(experience) throws the error mentioned above. I could only fix it by changing the initialization to train_step_counter = tf.Variable(0, dtype=tf.int64).

PS: Sorry, last week I was on vacation at the Mediterranean Sea.

bartmaciszewski commented 4 years ago

I came across the same error when trying to write summaries to TensorBoard. Changing the step counter's dtype, as proposed by @ideenfix, fixed the issue:

train_step_counter = tf.Variable(0, dtype=tf.int64)

Thanks!

tensorflow / agents

Error using parameter train_step_counter according to colab example #121