Closed soulaw-mkii closed 5 years ago
The reward function / system should not return NaN, and we should not edit the code inside Deep_Evolution_Strategy. You might want to check your data to see why it returned NaN.
Hi Husein,
I understand your concern, but the problem is in fact occasional. When reward[k] is constant across the population, e.g. [ 40, 40 ], the standard deviation is zero and the standardisation in Deep_Evolution_Strategy.train fails.
With your full version, setting Agent.POPULATION_SIZE from 15 to 2, or even to 1, reveals the issue more easily because the population is small. Of course, this setting is only for illustration. You can try it out.
Here is my log with Agent.POPULATION_SIZE = 2:
iter 10. reward: -5.367704
iter 20. reward: 2.963402
:
:
iter 350. reward: -6.884399
/Users/steveytc/anaconda3/envs/pyfinance/lib/python3.6/site-packages/ipykernel_launcher.py:39: RuntimeWarning: invalid value encountered in true_divide
ValueError Traceback (most recent call last)
Hi Husein, I think we can close this issue, as I found that you added a tiny number to the denominator, i.e. np.std(rewards) + 1e-7, in another notebook.
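For anyone landing here later, a minimal sketch of that epsilon approach. Only the expression np.std(rewards) + 1e-7 comes from the notebook; the function name standardize_rewards and the eps parameter are hypothetical names used for illustration.

```python
import numpy as np

def standardize_rewards(rewards, eps=1e-7):
    """Hypothetical helper: standardise rewards with a small eps in the denominator."""
    rewards = np.asarray(rewards, dtype=float)
    # eps keeps the denominator non-zero when every reward is identical
    return (rewards - np.mean(rewards)) / (np.std(rewards) + eps)

constant = standardize_rewards([40.0, 40.0])    # identical rewards, std is zero
varied = standardize_rewards([1.0, 2.0, 3.0])   # ordinary case, std is non-zero
```

With identical rewards the numerator is zero, so the result is an array of zeros rather than NaN. In the ordinary case the epsilon only changes the result by a negligible amount.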
Hi Husein,
In the class Deep_Evolution_Strategy, this line standardises the rewards:
rewards = (rewards - np.mean(rewards)) / np.std(rewards)
If all the rewards have the same value, np.std(rewards) is zero, and the resulting 0 / 0 division turns the whole array into NaN.
My debug log:
before rewards <class 'numpy.ndarray'> (15,) : [37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028 37.1118028]
normalized rewards <class 'numpy.ndarray'> (15,) : [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
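The behaviour in the log can probably be reproduced outside the class with a few lines. This is only a sketch: it uses the [ 40, 40 ] style of constant array rather than my actual rewards, because a round value guarantees the mean equals each element exactly, and the variable names are illustrative.

```python
import numpy as np

rewards = np.array([40.0, 40.0])  # same reward for every member of the population

# Suppress the RuntimeWarning here so the demo output stays clean
with np.errstate(invalid="ignore", divide="ignore"):
    centred = rewards - np.mean(rewards)      # all zeros
    normalized = centred / np.std(rewards)    # 0 / 0 for every element

all_nan = bool(np.isnan(normalized).all())
print(normalized, all_nan)
```

Both numerator and denominator are zero, which is why the result is NaN rather than inf.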
We might need to amend it to:
rewards = (rewards - np.mean(rewards)) / np.std(rewards) if np.std(rewards) > 0 else rewards
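The same guard written out as a small function, as a sketch only. The name safe_standardize is hypothetical and not part of Deep_Evolution_Strategy; the behaviour is meant to match the one-liner above, except that np.std is computed once.

```python
import numpy as np

def safe_standardize(rewards):
    """Hypothetical helper: standardise only when the spread is non-zero."""
    rewards = np.asarray(rewards, dtype=float)
    std = np.std(rewards)  # computed once instead of twice
    if std > 0:
        return (rewards - np.mean(rewards)) / std
    # Constant rewards are passed through unchanged, as in the one-liner
    return rewards

unchanged = safe_standardize([40.0, 40.0])  # std is zero, returned as-is
scaled = safe_standardize([1.0, 3.0])       # mean 2, std 1
```

One caveat with this choice: when the rewards are constant, the returned values are not centred, so the update step sees the raw rewards. Returning an array of zeros instead would be an alternative worth considering.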
Regards, Steve