Using a deep actor-critic model to learn the best strategies in pairs trading
Pairs trading, formulated as a partially observed Markov decision process (POMDP), is a challenging problem in algorithmic trading. In this work, we tackle it with a deep reinforcement learning algorithm, advantage actor-critic (A2C). A2C extends the policy network with a critic network so that training combines the stochastic policy gradient with the value gradient. We also use a recurrent neural network with long short-term memory (LSTM) units to retain information from stock-market time series. A memory buffer for experience replay and a target network are employed to reduce the variance caused by the noisy, correlated environment. Our results demonstrate success in learning a well-performing, profitable model directly from publicly available data, and they suggest possible extensions to other time-sensitive applications.
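The abstract combines three standard pieces: an advantage estimate from the critic, a replay memory, and a target network. The sketch below shows how these pieces might fit together. It is a minimal, framework-free Python illustration and not the code in this repository. A hypothetical linear critic (`linear_value`) stands in for the LSTM networks. All names (`ReplayBuffer`, `advantages`, `a2c_losses`, `sync_target`) and the defaults `gamma=0.99` and `tau=1.0` are illustrative assumptions. The sketch also assumes, as is common, that the target network supplies the bootstrapped value of the next state.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# One transition: (state, action, reward, next_state, done).
# States are plain tuples of floats here; in the real model they would be
# features produced by the LSTM (assumption).
Transition = Tuple[Tuple[float, ...], int, float, Tuple[float, ...], bool]


class ReplayBuffer:
    """Fixed-size FIFO memory for experience replay (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest transition when full.
        self.memory: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform random sampling breaks the temporal correlation
        # between consecutive market observations.
        return random.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


def linear_value(weights: List[float], state: Tuple[float, ...]) -> float:
    """Hypothetical linear critic V(s) = w . s, standing in for the LSTM critic."""
    return sum(w * x for w, x in zip(weights, state))


def advantages(batch: List[Transition], critic_w: List[float],
               target_w: List[float], gamma: float = 0.99) -> List[float]:
    """A(s, a) = r + gamma * V_target(s') * (1 - done) - V(s)."""
    out = []
    for state, _action, reward, next_state, done in batch:
        # The target network provides the bootstrapped next-state value.
        bootstrap = 0.0 if done else gamma * linear_value(target_w, next_state)
        out.append(reward + bootstrap - linear_value(critic_w, state))
    return out


def a2c_losses(log_probs: List[float], advs: List[float]) -> Tuple[float, float]:
    """Actor loss = -mean(log pi(a|s) * A); critic loss = mean(A^2).

    The advantage is treated as a constant in the actor term.
    """
    n = len(advs)
    actor = -sum(lp * a for lp, a in zip(log_probs, advs)) / n
    critic = sum(a * a for a in advs) / n
    return actor, critic


def sync_target(critic_w: List[float], target_w: List[float],
                tau: float = 1.0) -> List[float]:
    """Copy (tau=1) or softly blend (tau<1) critic weights into the target."""
    return [tau * c + (1.0 - tau) * t for c, t in zip(critic_w, target_w)]


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=1000)
    buf.push(((1.0, 0.0), 1, 0.5, (0.0, 1.0), False))
    buf.push(((0.0, 1.0), 0, -0.2, (1.0, 1.0), True))
    batch = buf.sample(2)
    advs = advantages(batch, critic_w=[0.5, 0.5], target_w=[1.0, 2.0], gamma=0.9)
    print(advs)
```

The sketch leaves out gradient computation. In a full A2C implementation, the actor loss is backpropagated through log π(a|s) only, and the critic loss through V(s). The target network is synchronized only periodically, so the bootstrapped targets drift more slowly than the critic itself.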
Customize the stock pair and simulation period in runner.py.
Run: python RLMDP/runner.py
Authors: Yichen Shen, Yiding Zhao, Su Hang, Zhaoming Wu, Sam Norris