TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Custom OpenAI Gym environment wrapped as a PyEnvironment time_step not matching expected time_step_spec #582
Open
mjssimon opened 3 years ago
Greetings. This problem is similar to the one at https://stackoverflow.com/questions/57259497/py-environment-time-step-doesnt-match-time-step-spec. However, because there is little documentation on how the PyEnvironment wrapper works, I'm having trouble resolving it. The custom OpenAI Gym environment itself works correctly.
I create the Gym environment, wrap it, and then validate it. The error occurs during validation.
import gym
from tf_agents.environments import suite_gym, utils

# re, h_sc, h_plat, ... are the environment parameters defined earlier in the script (not shown).
gym_env = gym.make('gym_leopointing:leopointing-v0', re=re, h_sc=h_sc, h_plat=h_plat,
                   min_elev_ang=min_elev_ang, max_ang_vel=max_ang_vel, num_theta=num_theta,
                   num_sc=num_sc, ctrl_loop_freq=ctrl_loop_freq, seed=seed, tol=tol,
                   convg_sec=convg_sec, test=test)
py_env = suite_gym.wrap_env(gym_env)  # wrap the Gym env as a TF-Agents PyEnvironment

print('Action Spec:')
print(py_env.action_spec())
print('Observation Spec:')
print(py_env.observation_spec())

utils.validate_py_environment(py_env, episodes=5)  # the error is raised here
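For context, here is a dependency-free approximation of the kind of loop `validate_py_environment` runs. It is not the TF-Agents source; the assumption is that it resets the environment, checks every time step against the time_step_spec, steps with some action, and raises on the first mismatch. `ToyEnv`, `conforms` and `validate` are hypothetical names used only for illustration, and only the observation bounds are checked. Note that only `reset()` produces a FIRST (0) step type here, so a failure reported with `step_type` 0 points at the observation returned by `reset()`.

```python
from enum import IntEnum
from typing import List, NamedTuple


class StepType(IntEnum):
    # Same numeric codes that TF-Agents prints for step_type.
    FIRST = 0
    MID = 1
    LAST = 2


class TimeStep(NamedTuple):
    step_type: StepType
    reward: float
    observation: List[float]


class ToyEnv:
    """Hypothetical 1-D environment whose reset() returns first_obs."""

    minimum, maximum = [0.0], [1.0]  # stand-in for the observation bounds

    def __init__(self, first_obs, episode_len=3):
        self.first_obs = first_obs
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return TimeStep(StepType.FIRST, 0.0, list(self.first_obs))

    def step(self, action):
        self.t += 1
        step_type = StepType.LAST if self.t >= self.episode_len else StepType.MID
        return TimeStep(step_type, 1.0, [0.5])


def conforms(ts, env):
    """Element-wise bound check, loosely like a BoundedArraySpec check."""
    return all(lo <= v <= hi
               for v, lo, hi in zip(ts.observation, env.minimum, env.maximum))


def validate(env, episodes=5):
    """Check every time step against the bounds until `episodes` episodes end."""
    finished = 0
    ts = env.reset()
    while finished < episodes:
        if not conforms(ts, env):
            raise ValueError(f"Given time_step {ts} does not match the spec")
        ts = env.step(action=None)  # assumed: a random action in the real validator
        if ts.step_type == StepType.LAST:
            finished += 1
            ts = env.reset()
    return finished
```

`validate(ToyEnv([0.5]))` completes all five episodes, while `validate(ToyEnv([2.0]))` raises on the very first time step, whose `step_type` is FIRST.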
The specific error is:
Traceback (most recent call last):
  File ".\solve_leopointing.py", line 92, in <module>
    utils.validate_py_environment(py_env, episodes=5)
  File "C:\Users\michael.simon\.conda\envs\tf\lib\site-packages\tf_agents\environments\utils.py", line 65, in validate_py_environment
    raise ValueError(
ValueError: Given `time_step`: TimeStep(step_type=array(0), reward=array(0., dtype=float32), discount=array(1., dtype=float32), observation=array([-2.1909594e+38, 9.8691240e-02, -1.9750980e+01], dtype=float32)) does not match expected `time_step_spec`: TimeStep(step_type=ArraySpec(shape=(), dtype=dtype('int32'), name='step_type'), reward=ArraySpec(shape=(), dtype=dtype('float32'), name='reward'), discount=BoundedArraySpec(shape=(), dtype=dtype('float32'), name='discount', minimum=0.0, maximum=1.0), observation=BoundedArraySpec(shape=(3,), dtype=dtype('float32'), name='observation', minimum=[-3.4028235e+38 4.3633232e-01 -3.4028235e+38], maximum=[3.4028235e+38 2.7052603e+00 3.4028235e+38]))

It seems like I need to set up a custom time_step_spec, but I can't find any documentation on how to do this.
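A note on the traceback: the expected time_step_spec is not something you normally write by hand. `suite_gym.wrap_env` derives the observation spec from the Gym environment's `observation_space`, so the Box bounds become the `BoundedArraySpec` minimum/maximum shown. Comparing the logged values against those bounds element by element suggests the observation itself is out of range: `observation[1] = 9.869124e-02` is below its minimum of `4.3633232e-01` (25 degrees, if these are angles in radians; the maximum `2.7052603` is 155 degrees). The other fields appear to match in shape and dtype. A minimal pure-Python sketch of that comparison, using numbers copied from the traceback (the helper name `out_of_bounds` is hypothetical):

```python
# Values copied from the traceback (float32 values written as Python floats).
observation = [-2.1909594e+38, 9.8691240e-02, -1.9750980e+01]
minimum = [-3.4028235e+38, 4.3633232e-01, -3.4028235e+38]
maximum = [3.4028235e+38, 2.7052603e+00, 3.4028235e+38]


def out_of_bounds(values, lows, highs):
    """Return (index, value, low, high) for every element outside [low, high]."""
    return [
        (i, v, lo, hi)
        for i, (v, lo, hi) in enumerate(zip(values, lows, highs))
        if not lo <= v <= hi
    ]


for i, v, lo, hi in out_of_bounds(observation, minimum, maximum):
    print(f"observation[{i}] = {v} is outside [{lo}, {hi}]")
```

Only index 1 is reported, which points at the environment's observation (or its declared `observation_space`) rather than at a missing custom spec.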
I can provide the custom OpenAI Gym environment if necessary.
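If the observation returned by the environment does fall outside its declared Box bounds (the traceback shows element 1, `9.869124e-02`, below the minimum `4.3633232e-01`), there are two obvious options inside the Gym environment, sketched here as assumptions rather than a confirmed fix: widen the corresponding `low` of the `observation_space` Box if such values are legitimate, or clip the observation into the declared bounds before returning it from `reset()`/`step()`. Clipping only hides the discrepancy, so it suits cases where the state really is meant to be limited. In a Gym environment this would typically be `np.clip(obs, self.observation_space.low, self.observation_space.high)`; below is a dependency-free version of the same idea, using the hypothetical helper `clip_to_bounds`:

```python
def clip_to_bounds(obs, lows, highs):
    """Clamp each element of obs into [low, high] (hypothetical helper)."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(obs, lows, highs)]


# Bounds and observation copied from the traceback.
minimum = [-3.4028235e+38, 4.3633232e-01, -3.4028235e+38]
maximum = [3.4028235e+38, 2.7052603e+00, 3.4028235e+38]
observation = [-2.1909594e+38, 9.8691240e-02, -1.9750980e+01]

clipped = clip_to_bounds(observation, minimum, maximum)
print(clipped)  # element 1 is raised to its minimum; the others are unchanged
```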
Any help is appreciated.
Thank you.