notadamking / RLTrader

A cryptocurrency trading environment using deep reinforcement learning and OpenAI's gym
https://discord.gg/ZZ7BGWh
GNU General Public License v3.0
1.71k stars · 537 forks

New Data Stream with Machine learning and RL process #13

Open PromediaB opened 5 years ago

PromediaB commented 5 years ago

I think we should add a data stream to continuously receive newly updated data, train the model on the new data, check whether model performance improves, and if so, update the model.

I found an interesting post about it: https://medium.com/analytics-vidhya/data-streams-and-online-machine-learning-in-python-a382e9e8d06a
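The update rule described above (re-train, compare, promote only on improvement) is a champion/challenger pattern. A minimal sketch, where `train` and `evaluate` are hypothetical placeholders rather than actual RLTrader functions:

```python
# Hypothetical champion/challenger sketch: re-train a candidate on the
# updated data, score both models on a held-out window, and only promote
# the candidate if it beats the current champion.
def maybe_update_model(champion, new_data, holdout, train, evaluate):
    candidate = train(new_data)
    if evaluate(candidate, holdout) > evaluate(champion, holdout):
        return candidate  # performance improved: promote the new model
    return champion       # otherwise keep the current model
```

The key design choice is evaluating both models on the *same* held-out window, so the comparison isn't confounded by the candidate having seen more data.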

notadamking commented 5 years ago

This is a great idea, and is actually what I planned to do for the next article. Since we will be using these algorithms to trade on Coinbase, I will be streaming the data from Coinbase and incrementally training on data as it passes (as well as "making" trades).

However, we still need a starting point before we start trading/training on live data, and I believe the current method of optimize -> train -> test will be our best bet for getting to that starting point.

PromediaB commented 5 years ago

Thank you for your answer.

For now, I'm thinking of adding a cron job that will automatically update the data from https://www.cryptodatadownload.com/cdd/Coinbase_BTCUSD_1h.csv every hour.

Is there anything in the code that will take only the new data, to avoid learning from previous data again, or should we add that ourselves?

If so, where is the best place to add it: train.py or optimize.py?
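A minimal sketch of the "only new data" part, assuming the CSV's timestamp column is named `Date` (cryptodatadownload files also start with a one-line disclaimer, hence the `skiprows=1` note below):

```python
import pandas as pd

DATA_URL = "https://www.cryptodatadownload.com/cdd/Coinbase_BTCUSD_1h.csv"

def merge_new_rows(old_df: pd.DataFrame, new_df: pd.DataFrame) -> pd.DataFrame:
    """Append only rows whose Date is not already present in old_df."""
    merged = pd.concat([old_df, new_df], ignore_index=True)
    # Keep the first occurrence of each timestamp, so existing rows win
    # and the model never trains on a duplicated candle.
    return merged.drop_duplicates(subset="Date", keep="first").reset_index(drop=True)

# An hourly cron job could then do roughly (paths are hypothetical):
#   new_df = pd.read_csv(DATA_URL, skiprows=1)   # first line is a disclaimer
#   merged = merge_new_rows(pd.read_csv("local.csv"), new_df)
#   merged.to_csv("local.csv", index=False)
```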

notadamking commented 5 years ago

This is currently not set up in the code, though the best space to add it would be in train.py.

JohnAllen commented 5 years ago

Could this be over-optimization? Adding an extra hour of data to a set that is 4-5 years old is a tiny improvement. And you're going to re-train every hour? If I were doing this myself, I would re-train at most every few days, probably weekly actually.

The counter-argument is that the most recent data is perhaps much more valuable than data from 5 years ago.

notadamking commented 5 years ago

@JohnAllen is correct. Re-training on each new data point would be over-zealous, and isn't likely to have much more benefit than re-training each day/few days. A live algorithm should therefore store data as it passes, and only re-train every n time steps. Training should also be done in a background thread to allow the agent to continue trading while training.
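A sketch of that pattern, with `train_model` and `trade` as assumed placeholders for whatever train.py and the live agent actually do (not real RLTrader APIs):

```python
import threading

def train_model(data):
    pass  # stand-in for the actual call into train.py (assumed, not real API)

class LiveAgent:
    """Buffer incoming candles; every `retrain_every` steps, re-train in a
    background thread so trading is never blocked by training."""

    def __init__(self, retrain_every: int = 48):
        self.retrain_every = retrain_every  # e.g. 48 hourly candles ~= 2 days
        self.buffer = []
        self.train_count = 0
        self._trainer = None
        self._lock = threading.Lock()

    def on_new_candle(self, candle):
        with self._lock:
            self.buffer.append(candle)
            due = len(self.buffer) % self.retrain_every == 0
        self.trade(candle)  # trading continues regardless of training state
        # Only launch a new training thread if the previous one has finished.
        if due and (self._trainer is None or not self._trainer.is_alive()):
            self._trainer = threading.Thread(target=self._retrain, daemon=True)
            self._trainer.start()

    def trade(self, candle):
        pass  # placeholder: act with the current policy

    def _retrain(self):
        with self._lock:
            snapshot = list(self.buffer)  # copy so trading can keep appending
        train_model(snapshot)
        self.train_count += 1
```

Snapshotting the buffer under the lock before training avoids mutating the dataset mid-training while the live feed keeps appending candles.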

PromediaB commented 5 years ago

Perfect. Then we will add something to the code to run the training in the background every 2 days, while letting the agent trade on BitMEX at the same time.

amebamcare commented 5 years ago

> Perfect. Then we will add something to the code to run the training in the background every 2 days, while letting the agent trade on BitMEX at the same time.

Did you try this on BitMEX? Any success?

amebamcare commented 5 years ago

What about implementing ARIMA model forecasting in the current project? Is it possible, and might it be better than the idea of deploying a new live data stream with ML over the current project? Thoughts?
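For context on what such a baseline would look like: statsmodels provides a full ARIMA implementation, but as a minimal self-contained illustration, here is the AR(1) special case (ARIMA(1,0,0)) fit by ordinary least squares with numpy. This is a sketch for comparison purposes, not a proposal to replace the RL agent:

```python
import numpy as np

def fit_ar1(series: np.ndarray):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares."""
    x_prev, x_next = series[:-1], series[1:]
    A = np.column_stack([np.ones_like(x_prev), x_prev])
    (c, phi), *_ = np.linalg.lstsq(A, x_next, rcond=None)
    return c, phi

def forecast(series: np.ndarray, steps: int, c: float, phi: float) -> np.ndarray:
    """Roll the fitted recurrence forward from the last observed value."""
    preds, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return np.array(preds)
```

A fair comparison against the RL agent would evaluate both on the same held-out price window, since ARIMA-style baselines are often hard to beat on short horizons.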

ghost commented 5 years ago

I would only extract the box transform from the ARIMA project. The rest isn't promising. Do they show the performance of their approach? I just found correlation and price curves in the description.