hn2 commented 5 years ago

Describe the bug: I am trying to run stable-baselines algorithms such as PPO1 and DDPG and get this error: ValueError: could not broadcast input array from shape (2) into shape (7,1,5)

Code example

The action will be the portfolio weights, from 0 to 1, for each asset:

    self.action_space = gym.spaces.Box(-1, 1, shape=(len(instruments) + 1,), dtype=np.float32)  # include cash

    # get the observation space from the data min and max
    self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(instruments), window_length, history.shape[-1]), dtype=np.float32)

I tried using obs.reshape(-1), obs.flatten() and obs.ravel(); nothing works. I also tried CnnPolicy instead of MlpPolicy and got:

ValueError: Negative dimension size caused by subtracting 8 from 7 for 'model/c1/Conv2D' (op: 'Conv2D') with input shapes: [?,7,1,5], [8,8,5,32].

System Info

Library was installed using:

    git clone https://github.com/hill-a/stable-baselines.git
    cd stable-baselines
    pip install -e .

GPU models and configuration: no gpu, cpu only
Python 3.7.2
tensorflow 1.12.0
stable-baselines 2.4.1

Additional context: tensorflow==1.13.1 (CPU)

araffin commented 5 years ago

Hello again,

as written in the previous topic (issue #239):

A simple solution consists of reshaping your observation to a 1D vector, so you can use MlpPolicy on it. Otherwise, if you want to keep your observation with that exact shape, then you have to define a custom policy, as the default CnnPolicy was made for images (of shape 64x64xn) and normalization (dividing by 255.) is automatically applied in this case.
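araffin's point that the default CnnPolicy expects image-sized inputs can be checked with a little arithmetic. The sketch below assumes the default CNN stacks three convolutions with "VALID" (unpadded) padding and (kernel, stride) of (8, 4), (4, 2) and (3, 1), as in the Nature DQN network; the [8, 8, 5, 32] kernel shape in the error is consistent with that first layer. The helper names are hypothetical, not stable-baselines functions.

```python
def valid_conv_out(size: int, kernel: int, stride: int) -> int:
    """Output length of one spatial axis after a 'VALID' (unpadded) convolution."""
    if size < kernel:
        # TensorFlow reports this case as
        # "Negative dimension size caused by subtracting <kernel> from <size>".
        raise ValueError(f"kernel {kernel} does not fit in input of size {size}")
    return (size - kernel) // stride + 1


# Assumed (kernel, stride) of the three conv layers in the default CNN.
NATURE_CNN_LAYERS = [(8, 4), (4, 2), (3, 1)]


def cnn_spatial_sizes(height: int, width: int) -> list:
    """(height, width) after each conv layer; raises if a kernel does not fit."""
    sizes = []
    for kernel, stride in NATURE_CNN_LAYERS:
        height = valid_conv_out(height, kernel, stride)
        width = valid_conv_out(width, kernel, stride)
        sizes.append((height, width))
    return sizes


print(cnn_spatial_sizes(84, 84))  # [(20, 20), (9, 9), (7, 7)]

try:
    cnn_spatial_sizes(7, 1)  # the (7, 1, 5) observation from this issue
except ValueError as err:
    print("does not fit:", err)
```

A 7 x 1 "image" cannot even fit the first 8 x 8 kernel, which is the "subtracting 8 from 7" in the error, so keeping the 3D shape would need a custom policy with smaller kernels.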

Again, do not forget to add more context for your problem next time by filling in the issue template COMPLETELY ;)

EDIT: Please note the word in uppercase ;)

hn2 commented 5 years ago

Thanks.

I tried reshaping the observation (in both step and reset methods) as obs = obs.reshape(-1)

Is this the correct way?

Still getting

ValueError: could not broadcast input array from shape (2) into shape (7,1,5)

From: Antonin RAFFIN [mailto:notifications@github.com] Sent: Sunday, 24 March 2019 11:24 To: hill-a/stable-baselines Cc: hn2; Author Subject: Re: [hill-a/stable-baselines] ValueError: could not broadcast input array from shape (2) into shape (7,3,5) (#242)


hn2 commented 5 years ago

Still not able to solve this.

I tried to use the flatten() method to return flattened observations (return obs.flatten(), info)

I still get:

    model.learn(total_timesteps=10000)
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 230, in learn
        seg = seg_gen.__next__()
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/trpo_mpi/utils.py", line 35, in traj_segment_generator
        observation = env.reset()
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 523, in reset
        return self.venv.reset()[0]
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 57, in reset
        self._save_obs(env_idx, obs)
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 75, in _save_obs
        self.buf_obs[key][env_idx] = obs
    ValueError: could not broadcast input array from shape (2) into shape (7,1,5)


hn2 commented 5 years ago

I tried CnnPolicy instead of MlpPolicy and got this error: ValueError: Negative dimension size caused by subtracting 8 from 7 for 'model/c1/Conv2D' (op: 'Conv2D') with input shapes: [?,7,1,5], [8,8,5,32].

araffin commented 5 years ago

For the fourth and last time, please fill in the issue template COMPLETELY, otherwise we cannot help you and I will have to close this issue.

araffin commented 5 years ago

Hello, the problem comes from your observation space: you should define it properly so that it is consistent with the observations your environment returns. Currently, your observation space is 3-dimensional, but you are giving a 1D vector to the agent.

hn2 commented 5 years ago

It is consistent. It should be 3D: the first dimension is the list of assets, the second dimension is the historical prices (defined by window_length = 1..50), and the third dimension is open, high, low, close, volume. Where do you see that I am feeding the agent with a 1D vector?

araffin commented 5 years ago

> It is consistent. It should be 3D: the first dimension is the list of assets, the second dimension is the historical prices (defined by window_length = 1..50), and the third dimension is open, high, low, close, volume. Where do you see that I am feeding the agent with a 1D vector?

Earlier in the discussion:

> A simple solution consists of reshaping your observation to a 1D vector, so you can use MlpPolicy on it.

hn2 commented 5 years ago

As said before, I tried that but it didn't work (with reshape, flatten, ravel). Maybe I am not doing it correctly. Please advise.

hill-a commented 5 years ago

      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 523, in reset
        return self.venv.reset()[0]
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 57, in reset
        self._save_obs(env_idx, obs)
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 75, in _save_obs
        self.buf_obs[key][env_idx] = obs
    ValueError: could not broadcast input array from shape (2) into shape (7,1,5)

This is a NumPy error, caused by buf_obs having shape=(7, 1, 5), which makes sense since you told it shape=(len(instruments), window_length, history.shape[-1]). As @araffin said, you cannot have an n-dimensional observation, except for n == 1 or (n == 3 AND shape[0:2] >= 64 AND shape[2] == 3 or 1). Here you have a 3-dimensional observation with 7, 1 and 5 as its width, height and depth, which will not work with a CNN.
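To see why the declared shape matters, here is a minimal NumPy reproduction, assuming (as the traceback suggests) that DummyVecEnv preallocates a buffer of shape observation_space.shape per environment and copies each observation into it. The variable names are illustrative.

```python
import numpy as np

# Stand-in for the buffer a DummyVecEnv-style wrapper allocates from
# observation_space.shape; here the declared shape is (7, 1, 5).
n_envs = 1
declared_shape = (7, 1, 5)
buf_obs = np.zeros((n_envs,) + declared_shape, dtype=np.float32)

# An observation with exactly the declared shape is copied without complaint.
buf_obs[0] = np.ones(declared_shape, dtype=np.float32)

# A shape NumPy cannot broadcast to (7, 1, 5) raises the same kind of error
# as in the traceback, e.g. a flattened 35-element observation.
try:
    buf_obs[0] = np.ones(35, dtype=np.float32)
except ValueError as err:
    print(err)
```

Broadcasting also lets some wrong shapes slip through silently: a (5,) array would be copied into every row of the (7, 1, 5) buffer without any error, so an explicit shape check in the environment is still worthwhile.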

The solution, then, is to drop to a one-dimensional vector:

    self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(instruments) * window_length * history.shape[-1],), dtype=np.float32)

NOTE the multiplication in shape=(len(instruments) * window_length * history.shape[-1],), not commas: this gives you shape=(35,), which is compatible with MlpPolicy, and you can now use obs.reshape(-1) in your step() AND reset() functions.
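A minimal sketch of the flattened version, using the sizes from this issue (7 instruments, window_length = 1, 5 features). The helper to_flat_obs is hypothetical, and the gym.spaces.Box call is only shown as a comment so the snippet runs with NumPy alone.

```python
import numpy as np

# Sizes from this issue: 7 instruments, window_length = 1, 5 features per bar.
n_instruments, window_length, n_features = 7, 1, 5

# Flat shape to declare in the environment, e.g. (requires gym):
# gym.spaces.Box(low=-np.inf, high=np.inf, shape=flat_shape, dtype=np.float32)
flat_shape = (n_instruments * window_length * n_features,)


def to_flat_obs(obs_3d: np.ndarray) -> np.ndarray:
    """Hypothetical helper: flatten an (instruments, window, features) array
    for MlpPolicy and check it against the declared flat shape."""
    flat = np.asarray(obs_3d, dtype=np.float32).reshape(-1)
    if flat.shape != flat_shape:
        raise ValueError(f"observation has shape {flat.shape}, space declares {flat_shape}")
    return flat


obs_3d = np.arange(35, dtype=np.float32).reshape(n_instruments, window_length, n_features)
print(to_flat_obs(obs_3d).shape)  # (35,)
```

Calling the same helper in both step() and reset() keeps the two consistent, and the explicit check turns a silent shape drift (for example, adding a cash row) into an immediate, readable error.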

If you need more help, we are going to need your entire environment code, as I am incapable of deducing NumPy broadcasting errors without knowing how the numpy array was formed AND how it was used.

hn2 commented 5 years ago

OK, I changed it as suggested. I now get:

    model.learn(total_timesteps=10000)
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 230, in learn
        seg = seg_gen.__next__()
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/trpo_mpi/utils.py", line 35, in traj_segment_generator
        observation = env.reset()
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 523, in reset
        return self.venv.reset()[0]
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 57, in reset
        self._save_obs(env_idx, obs)
      File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 75, in _save_obs
        self.buf_obs[key][env_idx] = obs
    ValueError: cannot copy sequence with size 2 to array axis with dimension 35

hill-a commented 5 years ago

> If you need more help, we are going to need your entire environment code, as I am incapable of deducing NumPy broadcasting errors without knowing how the numpy array was formed AND how it was used.

I have no idea what your environment is sending to the DummyVecEnv, so I can't help you without the original code. As far as I can tell, you are simply sending the wrong information in reset(), and I can't tell why without further information.

Other than that, this does not seem like a stable-baselines issue but rather an implementation issue. Make sure you followed the guide for custom environments https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html and that your reset() implementation correctly returns the observation.

araffin commented 5 years ago

I agree with @hill-a: this is apparently not an error due to stable-baselines but to a custom environment. I think we have already given you enough information to debug your environment.

We do not do personal debugging and focus on SB issues, so I will close this one.

hn2 commented 5 years ago

It looks like it did flatten the array, as it now has dimension 35. I just did exactly what you said and changed the observation space.

araffin commented 5 years ago

You may have the same issue as here: https://github.com/hill-a/stable-baselines/issues/214

hn2 commented 5 years ago

Here is the code of my custom env reset:

def reset(self):
    self.infos = []
    self.old_portfolio_value = self.capital_base
    self.new_portfolio_value = self.capital_base
    self.current_step = 0

    # get data for this episode, each episode might be different.
    if self.start_date is None:
        self.idx = np.random.randint(low=self.window_length, high=self.history.shape[1] - self.steps)
    else:
        # compute index corresponding to start_date for repeatable sequence
        self.idx = date_to_index(self.start_date) - self.start_idx
        assert self.idx >= self.window_length and self.idx <= self.history.shape[1] - self.steps, \
            'Invalid start date, must be window_length day after start date and simulation steps day before end date'
    # print('Start date: {}'.format(index_to_date(self.idx)))
    data = self.history[:, self.idx - self.window_length:self.idx + self.steps + 1, :4]
    # apply augmentation?
    obs = data[:, self.current_step:self.current_step + self.window_length, :].copy()
    ground_truth_obs = data[:, self.current_step + self.window_length:self.current_step + self.window_length + 1, :].copy()

    cash_obs = np.ones((1, self.window_length, obs.shape[2]))
    obs = np.concatenate((cash_obs, obs), axis=0)
    cash_ground_truth = np.ones((1, 1, ground_truth_obs.shape[2]))
    ground_truth_obs = np.concatenate((cash_ground_truth, ground_truth_obs), axis=0)
    info = {}
    info['next_obs'] = ground_truth_obs
    return obs.reshape(-1), info

araffin commented 5 years ago

You are returning a tuple from the reset method; it should return only the observation. @hill-a was right.
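The "(2)" in the earlier error messages comes from this tuple: NumPy tries to store a 2-element sequence (the observation and the info dict) where it expects a single array. A minimal reproduction, using a plain NumPy array as a stand-in for the DummyVecEnv buffer and hypothetical reset_buggy/reset_fixed functions:

```python
import numpy as np

obs_size = 35
# Stand-in for the per-env observation buffer (one env, flat observations).
buf_obs = np.zeros((1, obs_size), dtype=np.float32)


def reset_buggy():
    obs = np.ones(obs_size, dtype=np.float32)
    return obs, {}  # a 2-tuple (observation, info), not an observation


def reset_fixed():
    obs = np.ones(obs_size, dtype=np.float32)
    return obs  # only the observation


try:
    buf_obs[0] = reset_buggy()
except (ValueError, TypeError) as err:
    # NumPy treats the tuple as a sequence of length 2, which does not fit 35 slots.
    print(type(err).__name__, err)

buf_obs[0] = reset_fixed()
```

With the Gym API that stable-baselines 2.x relies on, reset() returns only the observation; extra data such as next_obs could instead travel through the info dict returned by step().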

hn2 commented 5 years ago

I finally succeeded in running the model. Do you have documentation on how to interpret the output?

(verbosity=1)
********** Iteration 0 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00148 |      -0.11349 |       0.03412 |       0.01007 |      11.34949
     -0.01255 |      -0.11346 |       0.02966 |       0.02644 |      11.34579
     -0.02320 |      -0.11343 |       0.02311 |       0.02290 |      11.34317
     -0.02648 |      -0.11338 |       0.01937 |       0.01414 |      11.33826
Evaluating losses...
     -0.03064 |      -0.11335 |       0.01735 |       0.01275 |      11.33532
----------------------------------
| EpThisIter      | 0            |
| EpisodesSoFar   | 0            |
| TimeElapsed     | 15.9         |
| TimestepsSoFar  | 256          |
| ev_tdlam_before | -3.03        |
| loss_ent        | 11.335321    |
| loss_kl         | 0.01274954   |
| loss_pol_entpen | -0.113353215 |
| loss_pol_surr   | -0.030639375 |
| loss_vf_loss    | 0.017349362  |
----------------------------------
********** Iteration 1 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00121 |      -0.11333 |       0.02765 |       0.00017 |      11.33332
     -0.00925 |      -0.11329 |       0.02208 |       0.00160 |      11.32853
     -0.01676 |      -0.11323 |       0.02083 |       0.00704 |      11.32298
     -0.02069 |      -0.11318 |       0.01745 |       0.01167 |      11.31769
Evaluating losses...
     -0.02851 |      -0.11316 |       0.01342 |       0.01097 |      11.31555
----------------------------------
| EpThisIter      | 0            |
| EpisodesSoFar   | 0            |
| TimeElapsed     | 20.9         |
| TimestepsSoFar  | 512          |
| ev_tdlam_before | -17.4        |
| loss_ent        | 11.315554    |
| loss_kl         | 0.010965543  |
| loss_pol_entpen | -0.11315554  |
| loss_pol_surr   | -0.028507909 |
| loss_vf_loss    | 0.013420929  |
----------------------------------

hill-a commented 5 years ago

There is unfortunately no coordination between the logging of the different algorithms. Judging from the losses in the log, I'm guessing you are using PPO1 here, so here are the descriptions:

EpThisIter: number of episodes that occurred during this iteration
EpisodesSoFar: number of episodes that occurred so far
TimeElapsed: the elapsed time in seconds
TimestepsSoFar: the number of timesteps so far
ev_tdlam_before: explained variance between the predicted value function and the TD(lambda) estimator
loss_ent: mean entropy of the policy
loss_kl: Kullback-Leibler divergence between the old and new policy
loss_pol_entpen: entropy penalty (the entropy multiplied by -entropy coefficient)
loss_pol_surr: pessimistic (clipped) surrogate loss
loss_vf_loss: value function loss
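For the two less obvious entries, here is a small sketch. The explained_variance function follows the usual definition 1 - Var[y_true - y_pred] / Var[y_true]; it is a hypothetical standalone version, not imported from stable-baselines. It shows why ev_tdlam_before can be negative: the value function currently predicts the returns worse than a constant would. The last lines check the logged numbers against loss_pol_entpen = -entcoeff * entropy.

```python
import numpy as np


def explained_variance(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """1 - Var[y_true - y_pred] / Var[y_true]; nan if y_true is constant.

    1.0 = perfect prediction, 0.0 = no better than predicting the mean,
    negative = worse than predicting the mean.
    """
    var_y = np.var(y_true)
    return float("nan") if var_y == 0 else float(1.0 - np.var(y_true - y_pred) / var_y)


returns = np.array([1.0, 2.0, 3.0, 4.0])            # hypothetical TD(lambda) targets
print(explained_variance(returns, returns))          # 1.0
print(explained_variance(np.full(4, 2.5), returns))  # 0.0 (constant mean prediction)
print(explained_variance(-returns, returns))         # -3.0 (worse than the mean)

# Logged values from iteration 0 above: loss_pol_entpen = -entcoeff * loss_ent,
# so their ratio recovers the entropy coefficient (about 0.01 here).
loss_ent, loss_pol_entpen = 11.335321, -0.113353215
print(-loss_pol_entpen / loss_ent)
```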

EDIT: if you wish to measure the performance of the method, please have a look at TensorBoard and at the example code given in the docs for validating the method after learning.

hn2 commented 5 years ago

I am using the TensorBoard integration. Is it possible to print additional info returned from the step on the TensorBoard web page?

hn2 commented 5 years ago

When I changed to:

    self.action_space = gym.spaces.Box(-1., 1., shape=(len(self.src.asset_names) + 1,), dtype=np.float32)
    self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(self.src.asset_names) + 1, window_length, history.shape[-1],), dtype=np.float32)

and I do:

    def step(self, action):
        print(action)

I always get: [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan ..

Is it a problem with my env or with the agent?

hn2 commented 5 years ago

Another error that I get on Windows when trying to import stable_baselines:

    Exception has occurred: AssertionError
    expected file or str, got <_pydevd_bundle.pydevd_io.IORedirector object at 0x000001EE6A3464E0>

in logger.py, line 56:

    assert hasattr(filename_or_file, 'read'), 'expected file or str, got %s' % filename_or_file

The same code works on Ubuntu. I guess that something is wrong with my Windows Python environment, but what?

hill-a commented 5 years ago

> I am using the TensorBoard integration. Is it possible to print additional info returned from the step on the TensorBoard web page?

No, that would be quite difficult to do. It would be better to use the validation part of the example code given in the doc:

    obs = env.reset()
    for i in range(1000):
        action, _states = model.predict(obs)
        obs, rewards, dones, info = env.step(action)
        print(info)
        env.render()

> When I changed to:
> self.action_space = gym.spaces.Box(-1., 1., shape=(len(self.src.asset_names) + 1,), dtype=np.float32)
> self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(self.src.asset_names) + 1, window_length, history.shape[-1],), dtype=np.float32)
> and I do:
> def step(self, action):
>     print(action)
> I always get: [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan ..
>
> Is it a problem with my env or with the agent?

As discussed earlier, this will not work: the observation space needs to be one-dimensional:

    self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(instruments) * window_length * history.shape[-1],), dtype=np.float32)
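Independently of the observation-space shape, one generic way to narrow down "env or agent?" is to verify that the environment never emits NaN or infinite values, since those propagate straight into the network. A minimal sketch with a hypothetical check_finite helper, meant to be called at the end of step() and reset():

```python
import numpy as np


def check_finite(name: str, value) -> None:
    """Hypothetical helper: raise if an observation or reward contains NaN or +/-inf."""
    arr = np.atleast_1d(np.asarray(value, dtype=np.float64))
    bad = ~np.isfinite(arr)
    if bad.any():
        raise ValueError(f"{name} has non-finite values at indices {np.argwhere(bad).tolist()}")


# Intended use at the end of step() and reset(), e.g.:
#     check_finite("obs", obs)
#     check_finite("reward", reward)
check_finite("obs", np.ones(35))
check_finite("reward", 0.5)

try:
    check_finite("obs", np.array([1.0, np.nan, np.inf]))
except ValueError as err:
    print(err)  # obs has non-finite values at indices [[1], [2]]
```

If this never fires but the actions are still NaN, the environment's values are at least not the direct cause, which points back to the agent setup (for instance the observation-space declaration hill-a mentions).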

> Another error that I get on Windows when trying to import stable_baselines:
> Exception has occurred: AssertionError
>     expected file or str, got <_pydevd_bundle.pydevd_io.IORedirector object at 0x000001EE6A3464E0>
> in logger.py line 56:
>     assert hasattr(filename_or_file, 'read'), 'expected file or str, got %s' % filename_or_file
> The same code works on Ubuntu. I guess that something is wrong with my Windows Python environment, but what?

Next time, tell me what you are using; it helps avoid guessing. I'm assuming you are using PyCharm, since it is a _pydevd_bundle object. This error is caused by the logger taking stdout and asserting that it is either a string or a readable IO object; here, with PyCharm under Windows, that check apparently fails. A quick fix would be to comment out the assertion on line 56 of logger.py.
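The failing check can be reproduced without an IDE. The class below is a hypothetical stand-in for the debugger's IORedirector (that the real object lacks read() for this reason is an assumption): it can be written to, which is what a logger does with stdout, but hasattr(..., 'read') is False, so the assertion quoted above would reject it.

```python
import io


class WriteOnlyRedirector:
    """Hypothetical stand-in for a debugger's stdout redirector:
    it supports write() and flush(), but has no read() method."""

    def __init__(self, target):
        self._target = target

    def write(self, text):
        return self._target.write(text)

    def flush(self):
        self._target.flush()


buffer = io.StringIO()
stream = WriteOnlyRedirector(buffer)

print(hasattr(stream, "read"))   # False: an assertion on 'read' rejects it
print(hasattr(stream, "write"))  # True: yet writing to it works fine
stream.write("logging still works\n")
print(buffer.getvalue(), end="")
```

Assuming the logger only writes to and flushes the stream, this is also why commenting out the assertion is harmless in this case.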

hn2 commented 5 years ago

I am using VS code.

hill-a commented 5 years ago

> I am using VS code.

OK, this is what stable-baselines is crashing on: for some reason the object does not have a read attribute. I'm not sure why, as this code should work and dates from the initial commit of VS... Again, comment out the assertion on line 56 of logger.py. It is only there as a safeguard for those who are playing with the logger, which I am assuming you are not.

hn2 commented 5 years ago

What is the solution then? Move to another IDE?

hill-a commented 5 years ago

For the last time

> Again, comment out the assertion on line 56 of logger.py

In the stable-baselines folder there is the file logger.py; comment out line 56. Will do a hotfix later. Locking thread as we do not do tech support.

hill-a / stable-baselines

ValueError: could not broadcast input array from shape (2) into shape (7,3,5) #242
