huseinzol05 / Stock-Prediction-Models

Gathers machine learning and deep learning models for Stock forecasting including trading bots and simulations
Apache License 2.0

Run Time Warning in evolution_strategy_bayesian_agent #23

Closed. thanuj11 closed this issue 5 years ago.

thanuj11 commented 5 years ago

When I run the code against intraday data at a 1-minute interval, fit() fails roughly 75% of the time with the error below, and the rewards show as 0.00:

RuntimeWarning: invalid value encountered in true_divide
rewards = (rewards - np.mean(rewards)) / np.std(rewards)

iter 100. reward: 0.000000
iter 200. reward: 0.000000
iter 300. reward: 0.000000

With the Bayesian-optimized parameters the same error appears about 90% of the time. With custom hard-coded parameters, model = Model(input_size = 32, layer_size = 500, output_size = 3) and agent = Agent(population_size = 15, sigma = 0.1, learning_rate = 0.03, model = model, money = 10000, max_buy = 100, max_sell = 100, skip = 1, window_size = 32), the same error shows up about 75% of the time.

Can you help me fix this issue? Any suggestions are welcome.

AlconDivino commented 5 years ago

The problem is with np.std(rewards).

When this error happens, the division returns 0 or nan. Because of that there is no usable reward signal, and that breaks the training.
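For illustration, here is a minimal numpy snippet (my own sketch, not the notebook's code) showing why this happens: when every reward in the population is identical, np.std(rewards) is 0 and the normalization becomes 0/0.

```python
import numpy as np

rewards = np.zeros(15)  # e.g. no trade in the population ever changed the balance

# the same normalization the notebook applies before the weight update
normalized = (rewards - np.mean(rewards)) / np.std(rewards)
# -> RuntimeWarning: invalid value encountered in true_divide
print(normalized)  # all nan, so the evolution-strategy update carries no signal
```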

thanuj11 commented 5 years ago

> The problem is with np.std(rewards).
>
> When this error happens, the division returns 0 or nan. Because of that there is no usable reward signal, and that breaks the training.

When I run the code against the same dataset 20 times, only 1 or 2 runs show results in the rewards without any errors:

iter 100. reward: 3.656967
iter 200. reward: 4.959833
iter 300. reward: 5.214833

The rest of the runs are all zeros, as I mentioned earlier. I just want to understand why the rewards list is 0 in most runs; since the list is all 0, np.std(rewards) is 0, which is causing the issue. Is there any modification we can add to the get_reward or _get_weight_from_population functions to fix this? Any other suggestions are highly appreciated.

AlconDivino commented 5 years ago

I can't come up with a fix right now. As for why it happens: the reward function returns rewards with no values, those are handed to the population, and then the cycle repeats.

huseinzol05 commented 5 years ago

Let me know which notebook; I want to add an alpha to prevent NaN during the division:

rewards = (rewards - np.mean(rewards)) / (np.std(rewards) + 1e-7)
thanuj11 commented 5 years ago

> Let me know which notebook; I want to add an alpha to prevent NaN during the division:
>
> rewards = (rewards - np.mean(rewards)) / (np.std(rewards) + 1e-7)

The error is in the Stock-Prediction-Models/free-agent/evolution-strategy-bayesian-agent.ipynb notebook. Actually, the rewards list is always zero. In rewards = (rewards - np.mean(rewards)) / (np.std(rewards)), since the rewards list is zero, (rewards - np.mean(rewards)) = 0 and np.std(rewards) = 0 all the time, which is what causes the error. If we can modify the get_reward or _get_weight_from_population functions, which produce the rewards list, I guess that would solve the issue.
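As a minimal sketch of one possible guard (my own illustration, not the notebook's code): fall back to zero rewards when the population has essentially no reward variance, so the evolution-strategy weight update becomes a no-op instead of producing nan.

```python
import numpy as np

def normalize_rewards(rewards, eps=1e-7):
    """Center and scale population rewards; return zeros when there is no variance."""
    rewards = np.asarray(rewards, dtype=np.float64)
    std = np.std(rewards)
    if std < eps:
        # every candidate earned the same (e.g. zero) reward: no gradient signal,
        # so skip the update instead of dividing by zero
        return np.zeros_like(rewards)
    return (rewards - np.mean(rewards)) / std
```

Note this only silences the symptom; if every candidate really earns 0 reward, the agent never makes a trade that changes the balance, which is what the later comments address.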

thanuj11 commented 5 years ago

> Let me know which notebook; I want to add an alpha to prevent NaN during the division:
>
> rewards = (rewards - np.mean(rewards)) / (np.std(rewards) + 1e-7)

@huseinzol05 @AlconDivino do you have any comments or suggestions for fixing the above issue? I also noticed that every time I run evolution_strategy_bayesian_agent.ipynb the rewards are zero most of the time; when I keep running it again and again on the same dataset, the rewards eventually change. Can you explain why that happens and why I see rewards of 0 about 90% of the time? I tried both

rewards = (rewards - np.mean(rewards)) / (np.std(rewards) + 1e-7)
rewards = (rewards - np.mean(rewards)) / (np.std(rewards))

but it always gives me zeros:

iter 100. reward: 0.000000
iter 200. reward: 0.000000
iter 300. reward: 0.000000

Could modifying the get_reward or _get_weight_from_population functions, which return the rewards list as zeros, solve the issue, or do you have any other suggestions for handling this?

huseinzol05 commented 5 years ago

Sometimes your trend values are very small and the initial money is 10k, so the reward is almost always near zero. To solve this,

  1. Augment your trend values, e.g. multiply them by 100.
  2. Reduce the initial money, maybe to max(trends). (A rough sketch of both follows.)
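A rough sketch of those two adjustments (my own illustration; the file name and the Close column are assumptions, and the notebook's Agent is only referenced in a comment):

```python
import numpy as np
import pandas as pd

df = pd.read_csv('your-intraday-data.csv')       # hypothetical file name
trend = df['Close'].values * 100                 # 1. augment the trend values
initial_money = float(np.max(trend))             # 2. reduce the starting capital

# then build the notebook's Agent with this trend and money=initial_money
print(len(trend), initial_money)
```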
thanuj11 commented 5 years ago

> Sometimes your trend values are very small and the initial money is 10k, so the reward is almost always near zero. To solve this,
>
>   1. Augment your trend values, e.g. multiply them by 100.
>   2. Reduce the initial money, maybe to max(trends).

Thanks @huseinzol05, multiplying by 100 worked for a couple of stocks; the rest are still showing zeros. I will try augmenting more, but on the whole I understand this will only work with data that has a trend.

@huseinzol05 I also want to ask: I noticed the agent/model uses only the "Close" column of the stock data to make predictions. To make it more effective, can we feed in more data columns (e.g. "Volume") or add technical indicators (moving averages, etc.), or not? Please let me know whether that is possible; if you have an example of feeding in more inputs and indicators, that would be a great help for everybody. A rough sketch of what I mean follows.
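To make the question concrete, here is a small sketch of the kind of extra inputs I mean (my own illustration; the file name and column names are assumptions, and the notebook currently consumes only the Close series):

```python
import pandas as pd

df = pd.read_csv('your-stock-data.csv')          # hypothetical file name

# candidate feature matrix instead of the single Close series
features = pd.DataFrame({
    'close': df['Close'],
    'volume': df['Volume'],
    'ma_10': df['Close'].rolling(10).mean(),     # 10-bar moving average
    'ma_30': df['Close'].rolling(30).mean(),     # 30-bar moving average
}).dropna()

# the agent's window / input_size handling would then need to accept
# several columns per timestep instead of one
print(features.head())
```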

huseinzol05 commented 5 years ago

Of course it is possible; why not open a pull request for that? :)

thanuj11 commented 5 years ago

> Of course it is possible; why not open a pull request for that? :)

@huseinzol05 I need your help to understand the evolution_strategy_bayesian_agent strategy. From the code I see you train the model/agent (agent.fit()) and then buy (agent.buy()) on the same dataset.

  1. Why do you train and then buy on the same dataset? I believe doing this is over-fitting. How can we overcome that?
  2. Can we take this model/agent trained on past data and use it for real-time predictions, and will it show the same performance?

Correct me if I have misunderstood your code. Any comments or suggestions would be very helpful.

marvin-hansen commented 5 years ago

@thanuj11 Yes, that's overfitting. However, you can split your data into train/test and modify the code to train on the train data and then trade on the test data. To do so, some refactoring is needed, especially to encapsulate the agent configuration. The code in the notebook allows easy extraction of classes, so once you've done that you only have to tweak init & parameters. There are still some issues to fix along the way, but overall you get a good starting point. For production you need to do a lot more than that, for instance better data pre-processing, benchmarking the agent on F1 score, and some paper trading, just to name a few. For more details, take a look at the following book (a minimal sketch of the split follows the reference):

Advances in Financial Machine Learning 1st Edition, by Marcos Lopez de Prado https://www.amazon.com/Advances-Financial-Machine-Learning-Marcos/dp/1119482089
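As a starting point for that refactoring, a minimal sketch of the chronological split (my own illustration; the file name and column are assumptions, and agent.fit() / agent.buy() refer to the notebook's methods mentioned above):

```python
import pandas as pd

df = pd.read_csv('your-stock-data.csv')          # hypothetical file name
close = df['Close'].values

# chronological split: train on the first 80%, evaluate on the last 20%
split = int(len(close) * 0.8)
train_trend, test_trend = close[:split], close[split:]
print(len(train_trend), len(test_trend))

# idea: build the notebook's Agent on train_trend, run agent.fit() there, and
# only afterwards run agent.buy() against test_trend; this needs the refactoring
# described above so the trend the agent trades on can be swapped out
```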

huseinzol05 commented 5 years ago

There is a reason we built an overfitting agent: overfitting is really easy to train and deploy. Finding a good agent that is able to trade in a new environment is pretty tough; you may never find one at all. A reinforcement learning agent tends to over-estimate everything because it only learns in the same environment or space. If you give it a new space (new trends / future trends), there is a high chance it will not be able to perform. You cannot use a single agent to trade every trend / pair you have; you might need some software engineering knowledge to help you do batch training and batch prediction, and you can use Apache Airflow for that. My company uses Airflow, and the steps are easy (a toy sketch follows the list):

  1. predict future impact based on sentiment data (another deep learning model)
  2. predict future trend based on that impact (another deep learning model)
  3. feed that future trend to the trading agent
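A toy sketch of that three-step shape (my own illustration; every function here is a hypothetical stand-in, not the author's private models):

```python
import numpy as np

def predict_impact(sentiment_scores):
    # step 1: sentiment data -> expected market impact (stand-in: mean sentiment)
    return float(np.mean(sentiment_scores))

def predict_future_trend(recent_prices, impact, horizon=30):
    # step 2: impact -> predicted future trend (stand-in: drift the last price)
    return recent_prices[-1] * (1 + impact * np.linspace(0, 1, horizon))

def run_pipeline(sentiment_scores, recent_prices):
    impact = predict_impact(sentiment_scores)
    future_trend = predict_future_trend(np.asarray(recent_prices), impact)
    # step 3: hand future_trend to the trading agent (the notebook's agent would
    # need the refactoring discussed earlier to consume it)
    return future_trend

print(run_pipeline([0.2, -0.1, 0.4], [101.0, 102.5, 103.2])[:5])
```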

Again, my solution is theoretical testing. It is possible to serve the model in real time, but how we penalized the agent for not overfitting is a secret between my colleagues and me.

marvin-hansen commented 5 years ago

I have a question:

Are any of these RL agents actually used in paper trading or production?

I ran an experiment with an agent from your repo in two different versions. V-1 was the original code and it performed about the same as reported. V-2, however, trained and tested on two separate sets from exactly the same stock; while training yielded slightly less reward than V-1, the agent completely bombed on the held-out test set with -85.67%, which is a far cry from the overfitted results presented in the readme.

Correct me if I am wrong, but the only secret I can think of to make that kind of agent work comes down to nailing the trend and sentiment prediction, so that the agent can make buy/sell decisions with information ahead of time.

huseinzol05 commented 5 years ago

My company uses these agents in production.

[Screenshot attached: 2019-04-18, 4:05 PM]