junyoung-sim / quant

Generalized Deep Reinforcement Learning for Trading
https://doi.org/10.47611/jsrhs.v12i1.4316

Is the time for 'build' and 'test' almost the same? Does the time of 'run' also overlap? #1

Closed yigaza closed 1 year ago

junyoung-sim commented 1 year ago

Build time depends on the number of tickers, the time frame of the historical data of each ticker, and your computer specs. When training on the top 100 holdings of the S&P 500, there are approximately 400,000 states in the entire dataset (about 4,000 from each stock) and thus the replay memory would have a capacity near 40,000 experiences (10% of entire dataset). Once the replay memory reaches its capacity, 10 experiences are randomly sampled to update the agent network of the model via stochastic gradient descent after every new experience is added and the oldest experience is deleted from the replay memory. This will take a considerable amount of time (about 32-36 hours on my Dell XPS w/ i7-12700H and 32GB RAM).
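The replay-memory mechanics described above can be sketched in a few lines of Python. This is only an illustration of the behavior as described in this comment (fixed capacity at 10% of the dataset, oldest experience evicted, minibatch of 10 sampled per update); the class and names are not taken from the repository itself:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity experience buffer (illustrative sketch of the
    behavior described above, not the repo's actual implementation)."""

    def __init__(self, capacity, batch_size):
        self.buffer = deque(maxlen=capacity)   # oldest entry auto-evicted when full
        self.batch_size = batch_size

    def add(self, experience):
        """Store one experience; once the memory is full, return a random
        minibatch for one SGD update of the agent network."""
        self.buffer.append(experience)
        if len(self.buffer) == self.buffer.maxlen:
            return random.sample(list(self.buffer), self.batch_size)
        return None

# Scale from the numbers above: ~400,000 states in the dataset,
# so a 40,000-experience capacity with 10 samples per update.
memory = ReplayMemory(capacity=40_000, batch_size=10)
```

Because an update (and therefore a minibatch sample) happens after every single new experience once the memory is full, the total number of SGD steps grows with the dataset size, which is where most of the 32-36 hours goes.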

Test time also depends on the same factors, but it takes only about 30 minutes to go through the top 100 holdings of the S&P 500 from 2006 to present.

Run time also depends on the same factors, but it takes only 5 minutes to go through the top 100 holdings of the S&P 500 from 2006 to present.

yigaza commented 1 year ago

Thank you very much for your reply. Sorry, my wording was not clear enough. I meant that these three time periods overlap, which may lead to overfitting and thus results that look too good. In my small-scale testing, when the three periods do not overlap, the Sharpe ratio drops sharply.

junyoung-sim commented 1 year ago

In the context of this application, the model overfits if it becomes specialized to a limited number of stocks and cannot generalize to similar stocks it was not trained on. Thus, time overlaps can be tolerated as long as the stocks used to train and test the model are different (for instance, the research behind this work trained a model on the top 50 holdings of the S&P 500 and tested it on the top 100 holdings, showing similar performance for both trained and untrained stocks).

This is because (1) deep reinforcement learning is a trial-and-error process in which the agent is tested and trained at the same time after every experience, and (2) the market dynamics represented by the multivariate state space in this algorithm (some stock X, SPY, IEF, EUR=X, GSG) are relatively constant compared to the price data of one particular security. Simply put, the relationships between the overall stock market, bonds, currency exchange rates, and commodities vary less across time periods than the price history of a single security does (which is a major advantage of this model's state space). You may find previous works that partition their dataset by historical time period; that is highly necessary when the state space consists only of the price data of the asset the model intends to trade.
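To make the multivariate state space concrete, here is a minimal sketch of how a state combining stock X with the four market-wide series (SPY, IEF, EUR=X, GSG) could be assembled. The window length, normalization, and layout are assumptions for illustration; the repository's actual preprocessing may differ:

```python
import numpy as np

WINDOW = 50  # look-back length (hypothetical; the actual value may differ)

def build_state(stock_prices, market_prices, t, window=WINDOW):
    """Stack look-back windows of the traded stock X and the four
    market-wide series (SPY, IEF, EUR=X, GSG) into one state at time t."""
    rows = [np.asarray(stock_prices[t - window:t], dtype=float)]
    for series in market_prices:              # SPY, IEF, EUR=X, GSG
        rows.append(np.asarray(series[t - window:t], dtype=float))
    state = np.stack(rows)                    # shape: (5, window)
    return state / state[:, :1]               # normalize each row to its start
```

After normalization every row starts at 1.0, so the agent sees relative movements of the stock against the broader market, bonds, currencies, and commodities rather than raw price levels, which is why the representation transfers across stocks and time periods.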

In addition, based on my experience testing this model, small-scale testing may not yield the results you hope for. A small dataset means a smaller replay memory capacity along with a relatively quick decay of the learning rate used for SGD, which may result in a weak model for both trained and untrained stocks.

I hope this answers your question!

yigaza commented 1 year ago

Yes, sir. I will test with more stocks.