NaN in evolution-strategy agent

soulaw-mkii commented 5 years ago

Hi Husein,

In the class Deep_Evolution_Strategy, this line standardises the rewards: rewards = (rewards - np.mean(rewards)) / np.std(rewards)

If every reward in the population has the same value, np.std(rewards) is zero, so the division is 0/0 and the whole array becomes NaN.

My debug log:

before rewards <class 'numpy.ndarray'> (15,) : [37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028]

normalized rewards <class 'numpy.ndarray'> (15,) : [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
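A minimal sketch of the same failure outside the agent, assuming only NumPy. The reward value 40.0 and the population size of 15 are illustrative choices, not values taken from the notebook.

```python
import numpy as np

# Hypothetical rewards: every population member scored exactly the same.
rewards = np.full(15, 40.0)

std = np.std(rewards)  # 0.0 for a constant array

# Silence the RuntimeWarning so the example runs quietly; the division is still 0/0.
with np.errstate(invalid="ignore", divide="ignore"):
    normalized = (rewards - np.mean(rewards)) / std

print(std)                          # standard deviation of the constant rewards
print(np.isnan(normalized).all())   # whether every standardised reward is NaN
```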

We probably need to amend this to: rewards = (rewards - np.mean(rewards)) / np.std(rewards) if np.std(rewards) > 0 else rewards
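The proposed guard could be wrapped in a small helper, sketched below. The function name standardise_guarded is hypothetical and does not exist in the repository; the logic mirrors the one-liner above.

```python
import numpy as np

def standardise_guarded(rewards):
    """Standardise rewards, leaving them unchanged when they are all equal.

    Hypothetical helper: it mirrors the proposed one-line guard.
    """
    rewards = np.asarray(rewards, dtype=float)
    std = np.std(rewards)
    if std > 0:
        return (rewards - np.mean(rewards)) / std
    # Zero variance: skip the division instead of producing NaN.
    return rewards

constant = standardise_guarded([40.0, 40.0])    # stays [40., 40.]
varied = standardise_guarded([1.0, 2.0, 3.0])   # zero mean, unit std
```

One consequence of this variant is that, in the zero-variance case, the raw rewards pass through unscaled, so whatever consumes them sees values on a different scale than usual.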

Regards, Steve

huseinzol05 commented 5 years ago

The reward function / system should not return NaN, and we should not edit the code inside Deep_Evolution_Strategy. You might want to check your data to see why it returned NaN.

soulaw-mkii commented 5 years ago

Hi Husein,

I understand your concern, but the problem is in fact occasional. When rewards[k] is constant across the population, e.g. [40, 40], the standard deviation is zero and the standardisation in Deep_Evolution_Strategy.train fails.

If you use your full version and reduce Agent.POPULATION_SIZE from 15 to 2, or even to 1, the issue shows up much more readily because the population is small. This setting is only for illustration, of course. You can try it out.

Here is my log with Agent.POPULATION_SIZE = 2:

    iter 10. reward: -5.367704
    iter 20. reward: 2.963402
    :
    :
    iter 350. reward: -6.884399
    /Users/steveytc/anaconda3/envs/pyfinance/lib/python3.6/site-packages/ipykernel_launcher.py:39: RuntimeWarning: invalid value encountered in true_divide

    ValueError Traceback (most recent call last)

    in
          1 model = Model(window_size, 500, 3)
          2 agent = Agent(model, 10000, 5, 5)
    ----> 3 agent.fit(500, 10)

    in fit(self, iterations, checkpoint)
         56
         57     def fit(self, iterations, checkpoint):
    ---> 58         self.es.train(iterations, print_every = checkpoint)
         59
         60     def buy(self):

    in train(self, epoch, print_every)
         36                     self.weights, population[k]
         37                 )
    ---> 38                 rewards[k] = self.reward_function(weights_population)
         39             rewards = (rewards - np.mean(rewards)) / np.std(rewards)
         40             for index, w in enumerate(self.weights):

    in get_reward(self, weights)
         30         quantity = 0
         31         for t in range(0, l, skip):
    ---> 32             action, buy = self.act(state)
         33             next_state = get_state(close, t + 1, window_size + 1)
         34             if action == 1 and initial_money >= close[t]:

    in act(self, sequence)
         20     def act(self, sequence):
         21         decision, buy = self.model.predict(np.array(sequence))
    ---> 22         return np.argmax(decision[0]), int(buy[0])
         23
         24     def get_reward(self, weights):

    ValueError: cannot convert float NaN to integer

Regards, Steve
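The final ValueError is raised by int(buy[0]), well away from the division that emitted the RuntimeWarning. A plausible reading is that the NaN rewards reached the weights, so a later prediction was NaN as well. The sketch below only demonstrates the last step, using a stand-in array named nan_output (hypothetical, not the model's real output).

```python
import numpy as np

# Stand-in for a model output computed from NaN weights (hypothetical value).
nan_output = np.array([np.nan])

# NaN survives ordinary arithmetic, so it spreads through later computations.
still_nan = np.isnan(nan_output * 2.0 + 1.0).all()

# Converting NaN to an integer is where Python finally raises.
try:
    int(nan_output[0])
    message = None
except ValueError as err:
    message = str(err)

print(still_nan)
print(message)
```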

soulaw-mkii commented 5 years ago

Hi Husein, I think we can close this issue, as I found that you added a tiny number to the denominator, i.e. np.std(rewards) + 1e-7, in another notebook.
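A minimal sketch of that epsilon variant, assuming the same standardisation formula as in the original line. The helper name standardise_eps is hypothetical; only the 1e-7 constant comes from the other notebook.

```python
import numpy as np

EPS = 1e-7  # small constant added to the denominator, as in the other notebook

def standardise_eps(rewards, eps=EPS):
    """Standardise rewards with an epsilon so the denominator is never zero.

    Hypothetical helper, not a function from the repository.
    """
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - np.mean(rewards)) / (np.std(rewards) + eps)

constant = standardise_eps([40.0, 40.0])    # all zeros instead of NaN
varied = standardise_eps([1.0, 2.0, 3.0])   # practically the usual z-scores
```

With constant rewards the numerator is zero, so the result is an array of zeros rather than NaN. With varied rewards the epsilon changes the result only negligibly.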

huseinzol05 / Stock-Prediction-Models

NaN in evolution-strategy agent #43