werner-duvaud / muzero-general

MuZero
https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation
MIT License

How to adapt Muzero to financial trading? #158

Closed. Ray-0403 closed this issue 3 years ago.

Ray-0403 commented 3 years ago

Hi, I want to know whether it is possible to directly train an agent to trade the futures/forex market with this MuZero algorithm, or whether we should modify MuZero to make it suitable for financial trading problems. Any suggestions?

goshawk22 commented 3 years ago

I think this could be really interesting, as financial trading is essentially a game. I think the biggest problem is how to represent all the data. There would have to be a lot of input data: past stock prices, current assets, etc. Then the agent would simply earn a bigger reward the more money it has made by the end. You would also have to come up with a way of training the agent: unlike in games such as Connect-4, the stock market isn't really going to change depending on what the agent does, so you would have to put it into a scenario based on past data.
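A minimal sketch of the replay-past-data idea. The agent's orders never move the historical prices. The observation is a window of past prices plus the agent's current assets. Because trades execute at the current price, each step's reward is the change in portfolio value, and an episode's rewards add up to the money earned by the end. The class name, the two-action space (0 = flat, 1 = fully long) and the all-in position sizing are hypothetical choices. This is not muzero-general's game interface.

```python
from typing import List, Tuple


class HistoricalTradingEnv:
    """Replays a fixed historical price series; the agent's orders never move prices.

    Hypothetical sketch: action 0 = be flat (all cash), action 1 = be long
    (all cash converted to units). No fees or slippage are modelled here.
    """

    def __init__(self, prices: List[float], window: int = 4, initial_cash: float = 1000.0) -> None:
        if len(prices) <= window:
            raise ValueError("price series must be longer than the window")
        self.prices = prices
        self.window = window
        self.initial_cash = initial_cash
        self.reset()

    def reset(self) -> List[float]:
        self.t = self.window - 1  # first index with a full window of past prices
        self.cash = self.initial_cash
        self.units = 0.0
        return self._observation()

    def portfolio_value(self) -> float:
        return self.cash + self.units * self.prices[self.t]

    def _observation(self) -> List[float]:
        price = self.prices[self.t]
        # Past prices relative to the current price, then the agent's own assets.
        history = [p / price for p in self.prices[self.t - self.window + 1 : self.t + 1]]
        return history + [self.cash / self.initial_cash, self.units * price / self.initial_cash]

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        if action not in (0, 1):
            raise ValueError("action must be 0 (flat) or 1 (long)")
        price = self.prices[self.t]
        if action == 1 and self.units == 0.0:      # buy with all cash
            self.units, self.cash = self.cash / price, 0.0
        elif action == 0 and self.units > 0.0:     # sell everything
            self.cash, self.units = self.units * price, 0.0
        value_before = self.portfolio_value()
        self.t += 1                                 # next historical bar, unaffected by the trade
        reward = self.portfolio_value() - value_before
        done = self.t == len(self.prices) - 1
        return self._observation(), reward, done
```

A real setup would add transaction costs and sample many historical windows as separate episodes.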

Ray-0403 commented 3 years ago

I agree with you. Maybe we should add an LSTM to MuZero?

fbondeaux commented 3 years ago

The biggest problem is neither the data nor modifying the environment, but the computing power. Essentially you would need to train a trading brain that outperforms a human. Just think about the encoding size you would need.

Regarding the scenario environment that @goshawk22 proposes, nobody should get their hopes up. Any game has specific rules that define its space. Trading in the financial world, by contrast, is a stochastic environment, so there are hardly any rules that MuZero can learn. We've seen this before with MuZero in Go: when the game moves very quickly into unknown territory, or it encounters really unexpected moves, it makes mistakes. When a crash happens, or an index suddenly drops 50 or 100 points, you will start to lose money rather quickly. One could, however, teach the model to stop trading and clear the portfolio in such situations; but then again, you need a lot of computing power to achieve such a model.

Ray-0403 commented 3 years ago

Yes, you are right. The financial market itself is too stochastic to model directly. One approach I have in mind is to transform the raw tick data into a less stochastic representation and derive more meaningful features from that new dataset, so that MuZero may be able to model it. As for risk, if we are going to apply RL in the real world, we should always use risk-control methods outside the RL algorithm.
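Risk control that stays outside the RL algorithm can be sketched as a wrapper between the agent and the broker. The agent's action passes through unless equity has fallen too far below its running peak. Past that point, the wrapper forces a flat position and keeps trading halted until it is reset manually. This is a minimal sketch under assumptions. `DrawdownGuard`, the 10% limit and the convention that action 0 means "go flat" are all hypothetical choices.

```python
from typing import Optional


class DrawdownGuard:
    """Risk control applied outside the RL policy (hypothetical helper).

    Passes the agent's action through unless equity has fallen more than
    `max_drawdown` below its running peak; from then on it returns the flat
    action until `reset()` is called.
    """

    FLAT = 0  # assumed action id meaning "close all positions"

    def __init__(self, max_drawdown: float = 0.10) -> None:
        if not 0.0 < max_drawdown < 1.0:
            raise ValueError("max_drawdown must be between 0 and 1")
        self.max_drawdown = max_drawdown
        self.reset()

    def reset(self) -> None:
        self.peak_equity: Optional[float] = None
        self.halted = False

    def filter(self, agent_action: int, equity: float) -> int:
        if self.peak_equity is None or equity > self.peak_equity:
            self.peak_equity = equity
        drawdown = 1.0 - equity / self.peak_equity
        if drawdown > self.max_drawdown:
            self.halted = True  # latch: stay flat even if equity recovers
        return self.FLAT if self.halted else agent_action
```

Keeping this rule outside the learned model means it still holds when the model meets a regime it never saw in training.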

goshawk22 commented 3 years ago

It might also make sense to use data for a small number of companies, perhaps companies in the same sector (e.g. AMD, Intel, Nvidia). Then it might be easier for the agent to find patterns in it.
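An observation built from a few related tickers could stack each ticker's recent log returns, one row per ticker. Log returns rather than raw prices put tickers with very different price levels on a comparable scale. This is a minimal sketch assuming closing prices only. `sector_observation` is a hypothetical name, and any ticker data passed to it in experiments would be your own.

```python
import math
from typing import Dict, List


def sector_observation(closes: Dict[str, List[float]], window: int) -> List[List[float]]:
    """Stack the last `window` log returns of several related tickers.

    Returns one row per ticker, sorted by symbol so the row order is stable
    from one observation to the next.
    """
    rows = []
    for symbol in sorted(closes):
        prices = closes[symbol]
        if len(prices) < window + 1:
            raise ValueError(f"{symbol}: need at least {window + 1} closes")
        recent = prices[-(window + 1):]
        # log(p_t / p_{t-1}) for each consecutive pair in the window
        rows.append([math.log(b / a) for a, b in zip(recent, recent[1:])])
    return rows
```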

Ray-0403 commented 3 years ago

Do you think we could replace the fully connected network in the code with an LSTM and train it on some trading environments? I'd really like to give it a try.

goshawk22 commented 3 years ago

I wonder whether a combination of an LSTM and the ResNet would work. Definitely give it a go!

Ray-0403 commented 3 years ago

How about we exchange email addresses so that we can discuss and run experiments together in the future?

goshawk22 commented 3 years ago

Yeah, that would be useful. I've sent you my email and Telegram details.

ipsec commented 3 years ago

IMHO it is very difficult (maybe impossible) to capture the real state of the market. I'm trying this right now. Rather than time-based data, I build my state from price ranges (using the renko chart idea) plus the depth of market, which shows the traders' perspectives in this specific case. That way I can place orders and get rewards at specific prices in my gym game. The problem with this approach is the depth-of-market data: brokers don't store it because of its huge volume, so I'm gathering the data day by day.
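Building the state from price ranges instead of time steps starts with renko bricks. A new brick forms only when price moves a full brick size beyond the last brick, so a reversal needs a two-brick move. The sketch below covers only the price side, not the depth-of-market input. It assumes a fixed, hand-picked brick size, and `renko_bricks` is a hypothetical name.

```python
from typing import List, Tuple


def renko_bricks(prices: List[float], brick_size: float) -> List[Tuple[float, float, int]]:
    """Convert a price series into classic renko bricks (open, close, direction).

    Time is ignored. A new up brick forms when price reaches the top of the
    last brick plus `brick_size`; a new down brick forms when price reaches
    the bottom of the last brick minus `brick_size`. Reversals therefore need
    a two-brick move, as in traditional renko charts.
    """
    if brick_size <= 0:
        raise ValueError("brick_size must be positive")
    if not prices:
        return []
    low = high = prices[0]  # bounds of the last brick (a point before the first brick)
    bricks: List[Tuple[float, float, int]] = []
    for price in prices[1:]:
        while price >= high + brick_size:   # one or more up bricks
            bricks.append((high, high + brick_size, 1))
            low, high = high, high + brick_size
        while price <= low - brick_size:    # one or more down bricks
            bricks.append((low, low - brick_size, -1))
            low, high = low - brick_size, low
    return bricks
```

Each brick is one environment step, so quiet periods produce no steps at all. The brick size then acts as a noise filter.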

After some training and testing (using MuZero) I'm seeing an interesting behavior. When the market moves very fast in one direction, the predictions are more accurate, and so the rewards are higher too. The problem occurs during stagnation, where the accuracy drops and errors recur.

Now I'm trying to create an action called "hold" that places no orders once a stagnation becomes clear. But so far without much success.

Ray-0403 commented 3 years ago

This is really cool! I am also thinking about using renko charts and market-depth information as inputs to the model.

Ray-0403 commented 3 years ago

Maybe you could add some filters to the model to help it identify low-volatility periods.
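One simple filter of this kind is a rolling realised-volatility flag. It marks bars where the sample standard deviation of recent log returns falls below a threshold. The flag could be given to the model as an extra input, or used to force the "hold" action during stagnation. This is a minimal sketch assuming log returns and a hand-picked threshold. `low_volatility_mask` is a hypothetical name.

```python
import math
import statistics
from typing import List


def low_volatility_mask(prices: List[float], window: int, threshold: float) -> List[bool]:
    """Flag bars whose rolling realised volatility is below `threshold`.

    Volatility at bar i is the sample standard deviation of the `window`
    log returns ending at bar i. Bars without a full window are flagged False.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample std")
    # returns[j] is the log return from bar j to bar j + 1
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mask = [False] * len(prices)
    for i in range(window, len(prices)):
        vol = statistics.stdev(returns[i - window:i])
        mask[i] = vol < threshold
    return mask
```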

uduse commented 3 years ago

I think the biggest problem is that we can't reliably evaluate an algorithm in this context. Research has shown that trading algorithms that perform well on historical data don't necessarily perform well on future data. In fact, the algorithms that look good on past data are "good" mostly because of the survivorship effect, and in the long term they only deliver average performance. This means that even if you have an awesome algorithm for predicting the stock market, you can't tell it is actually good by looking at the good numbers it recently produced, because those could be pure luck. Worse, if your awesome algorithm is in a pool with ten other, average algorithms you designed, there's no way for you to confidently tell, in a reasonably short time frame, that your awesome algorithm is actually awesome.
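The luck part of this argument can be illustrated with a toy simulation. It generates several strategies with no edge at all (i.i.d. zero-mean Gaussian returns), picks the one with the best in-sample Sharpe ratio, and then scores that same strategy on fresh draws standing in for "the future". This is a swapped-in illustration, not a model of real markets. The function names, the 1% return volatility and the 250-period horizon are arbitrary assumptions.

```python
import random
import statistics
from typing import List, Tuple


def sharpe(returns: List[float]) -> float:
    """Per-period Sharpe ratio: mean / sample std (no annualisation, zero risk-free rate)."""
    return statistics.mean(returns) / statistics.stdev(returns)


def best_of_random_strategies(n_strategies: int, n_periods: int, seed: int = 0) -> Tuple[float, float, float]:
    """Pick the best of several zero-skill strategies by in-sample Sharpe.

    Returns (best in-sample Sharpe, that strategy's out-of-sample Sharpe,
    average in-sample Sharpe over all strategies).
    """
    rng = random.Random(seed)

    def simulate() -> List[float]:
        # i.i.d. zero-mean returns: by construction no strategy has a real edge.
        return [rng.gauss(0.0, 0.01) for _ in range(n_periods)]

    in_sample = [simulate() for _ in range(n_strategies)]
    out_of_sample = [simulate() for _ in range(n_strategies)]  # fresh draws = "the future"
    scores = [sharpe(r) for r in in_sample]
    best = max(range(n_strategies), key=scores.__getitem__)
    return scores[best], sharpe(out_of_sample[best]), statistics.mean(scores)
```

Calling `best_of_random_strategies(11, 250)` mirrors "your algorithm plus ten others". With enough candidates, the winner's in-sample Sharpe is positive almost by construction. Its out-of-sample Sharpe is just another draw centred on zero.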

vineetvermait commented 2 years ago

Hi,

Reviving this discussion a bit...

Intraday trading, or very short-span trading such as a 30-minute span, may be more useful in a trading context: the long term may be influenced by a lot of factors, whereas short-duration trades could well be more pattern-based than emotion-based.

Data could be at one-minute granularity, and OHLCV could be sufficient to derive more indicators if needed.
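A minimal sketch of the data side: aggregate one-minute OHLCV bars into 30-minute bars, then derive one indicator (a simple moving average of closes) from them. It assumes gap-free input that starts on a bucket boundary. The `Bar` type and both helpers are hypothetical names, and in practice a dataframe library would usually handle the resampling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float


def resample(bars: List[Bar], factor: int) -> List[Bar]:
    """Aggregate consecutive bars into bars `factor` times longer.

    Assumes no gaps and an aligned start; an incomplete trailing bucket is dropped.
    """
    out = []
    for start in range(0, len(bars) - factor + 1, factor):
        chunk = bars[start:start + factor]
        out.append(Bar(
            open=chunk[0].open,                 # first open in the bucket
            high=max(b.high for b in chunk),
            low=min(b.low for b in chunk),
            close=chunk[-1].close,              # last close in the bucket
            volume=sum(b.volume for b in chunk),
        ))
    return out


def sma(values: List[float], period: int) -> List[float]:
    """Simple moving average; the first `period - 1` positions have no output."""
    return [sum(values[i - period + 1:i + 1]) / period for i in range(period - 1, len(values))]
```

For example, `sma([b.close for b in resample(minute_bars, 30)], 10)` would give a 10-bar average on 30-minute bars, where `minute_bars` is your own one-minute data.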

Having said this, how would we map MuZero to trading?

uduse commented 2 years ago

It's definitely doable. The main challenge would be the problem formulation, i.e., how to construct the observation, how to define the reward, how to define the action space... Once we set up the environment correctly, MuZero can start searching right away. However, vanilla MuZero is not good enough in stochastic environments, so we might want to use something like VQ-VAE MuZero (Vector Quantized Models for Planning).

ipsec commented 2 years ago

Stochastic MuZero is here: https://openreview.net/forum?id=X6D9bAHhBQ1

liushaohuai5 commented 1 year ago

Another question is how to define a simulator for financial applications like trading. It seems that there is no way to simulate the market.

uduse commented 1 year ago

Usually you just simulate based on historical data. For simulating the future, there's no hope.

liushaohuai5 commented 1 year ago

So this is an offline RL problem? That seems totally different from the problem MuZero is trying to solve.

uduse commented 1 year ago

@liushaohuai5 Trading should not be an offline problem, because real-time behavior is too important, but right now the best formulation we have is to use historical data and make it an offline problem.

MuZero can learn a really good policy in offline environments too.

loafthecomputerphile commented 1 year ago

I have been working on a similar problem, but with PPO instead; I believe the environment could be used with MuZero, though. From what I have seen, simulating slippage and spread, along with any fees, is essential to get a more realistic model that may work offline.
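Spread, slippage and fees can be folded into a simulated fill as follows. A buy pays half the spread plus slippage above the mid price, a sell receives the same amount below it, and every trade pays a fixed fee. This is a minimal sketch. It assumes spread and slippage are absolute price amounts and the fee is fixed per trade, and the function names are hypothetical.

```python
def fill_price(mid: float, side: int, spread: float, slippage: float) -> float:
    """Price paid for a buy (side=+1) or received for a sell (side=-1).

    The order crosses half the quoted spread and then suffers extra adverse
    slippage; both are absolute price amounts in this sketch.
    """
    if side not in (1, -1):
        raise ValueError("side must be +1 (buy) or -1 (sell)")
    return mid + side * (spread / 2 + slippage)


def trade_cash_flow(mid: float, side: int, quantity: float,
                    spread: float, slippage: float, fee: float) -> float:
    """Cash change of one trade: buys spend cash, sells receive it, and a fixed fee is always paid."""
    return -side * quantity * fill_price(mid, side, spread, slippage) - fee
```

A round trip at an unchanged mid price therefore loses `quantity * (spread + 2 * slippage)` plus two fees. A frictionless backtest hides exactly this cost.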

We also need to understand how to train it on multiple stocks. In my project I usually load 20 tickers of historical data along with a fixed fee cost that is reasonable for most of the tickers. Each episode then lasts num_of_tickers * num_of_bars steps; in this way it treats trading on 20 tickers as one single portfolio to get the maximum reward. Alternatively, you can treat every ticker as its own episode, and I have seen better rewards with the latter.
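The two episode layouts can be sketched as follows, assuming each ticker's data is just a list of closes. `build_episodes` is a hypothetical helper. A real environment may also need to decide what happens to an open position when the single-portfolio episode moves from one ticker to the next.

```python
from typing import Dict, List, Tuple


def build_episodes(data: Dict[str, List[float]], one_portfolio: bool) -> List[List[Tuple[str, float]]]:
    """Arrange per-ticker bar series into training episodes.

    one_portfolio=True: a single episode that walks every ticker's bars in turn,
    so it lasts num_tickers * num_bars steps.
    one_portfolio=False: one episode per ticker.
    Each step is a (ticker, close) pair; tickers are sorted for reproducibility.
    """
    per_ticker = [[(symbol, close) for close in data[symbol]] for symbol in sorted(data)]
    if one_portfolio:
        return [[step for episode in per_ticker for step in episode]]
    return per_ticker
```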

Additionally, the more bars used in the feature input window, the more the trading equity curves start to distinguish themselves from the buy-and-hold strategy, which is a good sign that the model may do better than buy-and-hold regardless of the trend. Feeding in previous model signals may also help with decision making. Below is a picture of one of the outputs of my PPO trading model on non-training data. At the moment I am focusing on hourly bars, unlike the picture, which uses 5-minute bars, since that lets me train on more tickers over more years without episodes lasting 400k+ steps.
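An observation made of a feature window plus previous model signals can be sketched with two fixed-length queues. One holds the last `n_bars` bar returns and the other holds the last `n_signals` actions, both zero-padded until they fill up. This is a minimal sketch assuming bar returns as the features and integer action ids as the signals. `ObservationBuilder` is a hypothetical helper, not code from the PPO project described here.

```python
from collections import deque
from typing import Deque, List


class ObservationBuilder:
    """Last `n_bars` bar returns followed by the model's last `n_signals` actions."""

    def __init__(self, n_bars: int, n_signals: int) -> None:
        # deque(maxlen=...) silently drops the oldest item when a new one is appended.
        self.returns: Deque[float] = deque([0.0] * n_bars, maxlen=n_bars)
        self.signals: Deque[float] = deque([0.0] * n_signals, maxlen=n_signals)

    def push_bar(self, bar_return: float) -> None:
        self.returns.append(bar_return)

    def push_signal(self, action: int) -> None:
        self.signals.append(float(action))

    def observation(self) -> List[float]:
        # Oldest values first; zero padding until each history fills up.
        return list(self.returns) + list(self.signals)
```

Raising `n_bars` corresponds to the longer feature window described above, at the cost of a larger input to the network.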

[Image: RL_STRAT_F, output of the PPO trading model on non-training data (5-minute bars)]