tensorforce / tensorforce

Tensorforce: a TensorFlow library for applied reinforcement learning
Apache License 2.0
3.29k stars 533 forks

Potential problem with `max_episode_timesteps` #744

Closed mrminy closed 3 years ago

mrminy commented 3 years ago

Hi, developers.

I've got a custom environment set up with Tensorforce. After N episodes of training I get the error posted at the bottom of this issue. From what I've tried, I think it must be related to the max_episode_timesteps configuration.

I've tried different setups for both the agent configuration and the environment configuration; both lead to the same error.

Tensorforce = 0.6.2, Python = 3.7, TensorFlow = 2.3.1

Code for creating agent:

agent = Agent.create(
        agent='ppo', environment=environment,
        network="auto",
        batch_size=64, learning_rate=3e-4,
        predict_terminal_values=True,
        baseline="auto",
        max_episode_timesteps=999,
        baseline_optimizer=dict(type='adam', learning_rate=3e-4),
        parallel_interactions=1,
        summarizer=dict(directory='./data', filename=datetime.today().strftime("summary_ppo-%Y%m%d-%H%M%S"))
    )

Sample code from my environment wrapper:

class MyEnv(gym.Env):
    ....
    def max_episode_timesteps(self):
        return 999

Stack trace:

Traceback (most recent call last):
  File "C:/Users/mikke/PycharmProjects/PlayingWithBombs/src/ai/simple_tensorforce.py", line 109, in <module>
    agent = train(agent, environment, num_episodes=20000)
  File "C:/Users/mikke/PycharmProjects/PlayingWithBombs/src/ai/simple_tensorforce.py", line 64, in train
    actions = agent.act(states=states)
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorforce\agents\agent.py", line 388, in act
    deterministic=deterministic
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorforce\agents\recorder.py", line 267, in act
    num_parallel=num_parallel
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorforce\agents\agent.py", line 425, in fn_act
    states=states, auxiliaries=auxiliaries, parallel=parallel
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorforce\core\module.py", line 128, in decorated
    output_args = function_graphs[str(graph_params)](*graph_args)
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorflow\python\eager\def_function.py", line 814, in _call
    results = self._stateful_fn(*args, **kwds)
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorflow\python\eager\function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorflow\python\eager\function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorflow\python\eager\function.py", line 550, in call
    ctx=ctx)
  File "C:\Users\mikke\Anaconda3\envs\PlayingWithBombs\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  indices[0] = [0, 999] does not index into shape [1,999,1,126]
     [[{{node agent/StatefulPartitionedCall/agent/TensorScatterUpdate}}]] [Op:__inference_act_1411]

Function call stack:
act

Any clues?

mrminy commented 3 years ago

Apparently I don't get the same error when using the Runner.run() interface, so it must be something I'm missing in my own training script.

Fine by me to close this issue if you don't want to investigate further.

AlexKuhnle commented 3 years ago

Hi @mrminy, I think the reason may be that your environment does not actually terminate after 999 steps -- note that implementing max_episode_timesteps() does not make the environment obey this limit. The general idea is: if your environment has a "natural" maximum number of timesteps, implement Environment.max_episode_timesteps() (since the environment really does terminate by that point); if no such "natural" limit exists, don't implement that function and instead specify the limit via Environment.create(environment=..., max_episode_timesteps=999). In that case, Tensorforce wraps the environment and takes care of termination accordingly. Let me know if that helps.
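
To illustrate the second option: the snippet below is a minimal pure-Python sketch (not Tensorforce's actual wrapper code) of what Environment.create(..., max_episode_timesteps=...) does conceptually -- force a terminal signal once the step limit is reached, even if the underlying environment would keep going. The class and method names here are illustrative only; they loosely mirror Tensorforce's execute()-style interface.

```python
class TimestepLimitWrapper:
    """Illustrative wrapper that enforces a maximum episode length."""

    def __init__(self, env, max_episode_timesteps):
        self.env = env
        self.max_episode_timesteps = max_episode_timesteps
        self.timestep = 0

    def reset(self):
        # Start a fresh episode and reset the step counter.
        self.timestep = 0
        return self.env.reset()

    def execute(self, actions):
        states, terminal, reward = self.env.execute(actions)
        self.timestep += 1
        # Force termination at the limit. Without this, an episode can
        # exceed the length the agent's internal buffers were sized for,
        # which is what produces an index-out-of-range error like the
        # InvalidArgumentError in the stack trace above.
        if not terminal and self.timestep >= self.max_episode_timesteps:
            terminal = True
        return states, terminal, reward
```

If the environment only ever terminates via its own logic (as mrminy's gym.Env apparently does past step 999), this kind of wrapper is what keeps episode length and agent buffer size consistent.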