hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

1D Vector of floats as an observation space #1164

Open WilliamFlinchbaugh opened 2 years ago

WilliamFlinchbaugh commented 2 years ago

Hey there, I've been working on this environment for a while but just can't seem to grasp the observation space. Essentially, I have a list of 13 float attributes that need to go in the observation space. The largest value any of them can take is 1200 (for the x and y coordinates).

Do I need to have a vector defining the low and high for each value? Some values can only go up to 7.

This is my observation space: `self.observation_space = spaces.Box(low=0, high=1200, shape=(13,), dtype=np.float32)`. In my `reset()`, I return a NumPy vector of 13 floats; however, when I run `check_env`, I get the following: `AssertionError: The observation returned by the reset() method does not match the given observation space`

Several people online mentioned using a dict instead, but I tried that and it didn't work. I also read that I'm supposed to be using values between 0 and 1, but I'm a bit confused about that. I'm just really unfamiliar with gym in general and not quite sure what I'm doing, so any help would be appreciated.

Miffyli commented 2 years ago

Yes, you should define low and high as vectors of 13 elements, each giving the low/high of the corresponding index in the observation. That should do the trick.
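
A minimal sketch of building such per-dimension bounds, assuming a hypothetical layout where entries 0-9 are x/y coordinates in [0, 1200] and entries 10-12 are values that only go up to 7 (the real layout of the 13 attributes isn't given in the issue). The arrays would then be passed as `low`/`high` to `spaces.Box(..., dtype=np.float32)`; only NumPy is used here so the snippet runs on its own.

```python
import numpy as np

N_COORDS = 10  # hypothetical: entries 0-9 are x/y coordinates, max 1200
N_SMALL = 3    # hypothetical: entries 10-12 can only go up to 7

# One low/high value per observation index, in the same dtype as the space.
low = np.zeros(N_COORDS + N_SMALL, dtype=np.float32)
high = np.concatenate([
    np.full(N_COORDS, 1200.0, dtype=np.float32),
    np.full(N_SMALL, 7.0, dtype=np.float32),
])

# With gym this would become something like:
# observation_space = spaces.Box(low=low, high=high, dtype=np.float32)
print(low.shape, high.shape, high.dtype)
```

Keeping `low` and `high` in float32 matches the `dtype` passed to the space, so the bounds themselves don't introduce a type mismatch.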

Note that you should consider normalizing your observations to the interval [-1, 1] (or so) in every dimension. See RL tips.
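
One way to do this, sketched under the assumption that the per-dimension bounds are known (the bounds below are made-up example values), is a linear min-max rescale of each entry from `[low, high]` to `[-1, 1]`. The observation space can then simply use `low=-1, high=1`. The helper name `scale_to_unit` is hypothetical.

```python
import numpy as np

def scale_to_unit(obs, low, high):
    """Linearly map each entry of obs from [low, high] to [-1, 1], returned as float32."""
    obs = np.asarray(obs, dtype=np.float64)
    scaled = 2.0 * (obs - low) / (high - low) - 1.0
    # Clip guards against values that land slightly outside the nominal bounds.
    return np.clip(scaled, -1.0, 1.0).astype(np.float32)

# Example bounds: two coordinates up to 1200 and one value up to 7.
low = np.array([0.0, 0.0, 0.0])
high = np.array([1200.0, 1200.0, 7.0])
print(scale_to_unit([600.0, 1200.0, 0.0], low, high))
```

Here 600 maps to 0, 1200 maps to 1, and 0 maps to -1. Casting the result to float32 at the end keeps it consistent with a `dtype=np.float32` space.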

WilliamFlinchbaugh commented 2 years ago

That didn't quite seem to work. I changed the observation space to the following: `self.observation_space = spaces.Box(low=np.array([-1] * 13), high=np.array([1] * 13), shape=(13,), dtype=np.float32)` and then scaled all of my observations down to be within -1 and 1.

I'm still getting the same error. This is what the array looks like: `[0.70270414 0.82211474 0.00291559 0.00989345 0.84439723 0.06853746 0.56000127 0.6862003 0.69316893 0.00258898 0.00294238 0. 1.]`

Miffyli commented 2 years ago

Hard to say without more code, but double-check what comes out of your reset function. Generally, the env checker's errors are self-explanatory.
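
A small sketch of that kind of check, using only NumPy. It assumes the space is a `Box` and mirrors the conditions that `Box.contains()` tests in at least some gym versions (array type, shape, a safe dtype cast, and the bounds); whether your installed version checks the dtype is worth verifying. Note that `np.random.rand` (like most NumPy float operations) returns float64, and float64 cannot be safely cast to float32, so such an observation can fail even when every value is inside the bounds. The helper name `diagnose_obs` is hypothetical.

```python
import numpy as np

def diagnose_obs(obs, shape, dtype, low, high):
    """Return reasons why obs would not fit a Box-like space (empty list if it fits)."""
    problems = []
    if not isinstance(obs, np.ndarray):
        problems.append(f"not a numpy array: {type(obs).__name__}")
        obs = np.asarray(obs)
    if obs.shape != shape:
        problems.append(f"shape {obs.shape} != {shape}")
    # Default casting rule is "safe": float64 -> float32 is not allowed.
    if not np.can_cast(obs.dtype, dtype):
        problems.append(f"dtype {obs.dtype} cannot be safely cast to {np.dtype(dtype)}")
    if np.any(obs < low) or np.any(obs > high):
        problems.append("values outside [low, high]")
    return problems

obs = np.random.rand(13)  # float64 by default, values in [0, 1)
print(diagnose_obs(obs, (13,), np.float32, -1.0, 1.0))
print(diagnose_obs(obs.astype(np.float32), (13,), np.float32, -1.0, 1.0))
```

If the dtype entry is the only problem reported, returning `obs.astype(np.float32)` from `reset()` (and `step()`) is one way to make the observation agree with a `dtype=np.float32` space.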