tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Apache License 2.0

Custom OpenAI Gym environment wrapped as a PyEnvironment time_step not matching expected time_step_spec #582

Open mjssimon opened 3 years ago

mjssimon commented 3 years ago

Greetings. This problem is similar to the one described at https://stackoverflow.com/questions/57259497/py-environment-time-step-doesnt-match-time-step-spec . However, because the documentation on how the PyEnvironment wrapper works is minimal, I'm having trouble resolving it. The custom OpenAI Gym environment itself works correctly.

I make the gym environment, wrap it, then validate. The error occurs during validation.

gym_env = gym.make('gym_leopointing:leopointing-v0', re=re, h_sc=h_sc, h_plat=h_plat, min_elev_ang=min_elev_ang, max_ang_vel=max_ang_vel, num_theta=num_theta, num_sc=num_sc, ctrl_loop_freq=ctrl_loop_freq, seed=seed, tol=tol, convg_sec=convg_sec, test=test)

py_env = suite_gym.wrap_env(gym_env)

print('Action Spec:')
print(py_env.action_spec())

print('Observation Spec:')
print(py_env.observation_spec())

from tf_agents.environments import utils
utils.validate_py_environment(py_env, episodes=5)

The specific error is:

Traceback (most recent call last):
  File ".\solve_leopointing.py", line 92, in <module>
    utils.validate_py_environment(py_env, episodes=5)
  File "C:\Users\michael.simon.conda\envs\tf\lib\site-packages\tf_agents\environments\utils.py", line 65, in validate_py_environment
    raise ValueError(
ValueError: Given time_step: TimeStep(step_type=array(0), reward=array(0., dtype=float32), discount=array(1., dtype=float32), observation=array([-2.1909594e+38, 9.8691240e-02, -1.9750980e+01], dtype=float32)) does not match expected time_step_spec: TimeStep(step_type=ArraySpec(shape=(), dtype=dtype('int32'), name='step_type'), reward=ArraySpec(shape=(), dtype=dtype('float32'), name='reward'), discount=BoundedArraySpec(shape=(), dtype=dtype('float32'), name='discount', minimum=0.0, maximum=1.0), observation=BoundedArraySpec(shape=(3,), dtype=dtype('float32'), name='observation', minimum=[-3.4028235e+38 4.3633232e-01 -3.4028235e+38], maximum=[3.4028235e+38 2.7052603e+00 3.4028235e+38]))

It seems like I need to set up a custom time_step_spec, but I can't find any documentation on how to do this.
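
Before writing a custom spec, it may be worth checking whether the observation itself falls outside the bounds declared in the spec, since a bounded spec can reject values even when shape and dtype match. The following is a minimal sketch in plain Python, using only the numbers copied from the error message above. The helper name `out_of_bounds` is made up for illustration and is not part of TF-Agents or Gym.

```python
# Values copied from the ValueError message above.
observation = [-2.1909594e+38, 9.8691240e-02, -1.9750980e+01]
minimum = [-3.4028235e+38, 4.3633232e-01, -3.4028235e+38]
maximum = [3.4028235e+38, 2.7052603e+00, 3.4028235e+38]


def out_of_bounds(obs, lo, hi):
    """Return the indices of elements that lie outside [lo, hi].

    Hypothetical helper for illustration only.
    """
    return [
        i
        for i, (value, low, high) in enumerate(zip(obs, lo, hi))
        if not (low <= value <= high)
    ]


violations = out_of_bounds(observation, minimum, maximum)
for i in violations:
    print(f"observation[{i}] = {observation[i]} is outside "
          f"[{minimum[i]}, {maximum[i]}]")
```

With these numbers, the element at index 1 (9.8691240e-02) is below the declared minimum of 4.3633232e-01. If that reading is right, the mismatch may come from the bounds of the Gym environment's observation_space, or from the state it returns on reset, rather than from a missing custom time_step_spec.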

I can provide the custom OpenAI Gym environment if necessary.

Any help is appreciated.

Thank you.

ebrevdo commented 3 years ago

@sguada ping on this one.