Open musiclicn opened 6 years ago
Milestone:
reward function:
Apply the latest advanced RL algorithms (ACKTR, A2C): https://blog.openai.com/baselines-acktr-a2c/
Reward function = average Sharpe ratio (or volatility) over the next N periods; reward is +1 when the Sharpe ratio increases, -1 when the Sharpe ratio decreases
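A rough sketch of the +1/-1 reward, in case it helps pin the idea down. The notes don't say what the Sharpe ratio is compared against, so this assumes the next-N-period window is compared with the trailing window before it. It also assumes a zero risk-free rate and no annualization. The function names are made up for illustration.

```python
from statistics import mean, stdev

def sharpe_ratio(returns):
    """Per-period Sharpe ratio (assumption: risk-free rate 0, not annualized)."""
    if len(returns) < 2:
        return 0.0
    sd = stdev(returns)
    if sd == 0:
        # Flat returns: ratio is undefined, treat as 0 (assumption).
        return 0.0
    return mean(returns) / sd

def sharpe_change_reward(prev_returns, next_returns):
    """+1 if the Sharpe ratio of the next window is higher than the
    previous window's, -1 if lower, 0 if unchanged."""
    prev = sharpe_ratio(prev_returns)
    nxt = sharpe_ratio(next_returns)
    if nxt > prev:
        return 1
    if nxt < prev:
        return -1
    return 0
```

The volatility variant would swap `sharpe_ratio` for `stdev` and flip the sign, since lower volatility is presumably the better outcome.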
Observation contains the position level, the past N prices and the indicators
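One possible shape for that observation, as a sketch only. It assumes "indicators" means simple moving averages of the close and that an indicator is `None` until enough history exists. `build_observation` and its parameters are hypothetical names.

```python
def moving_average(closes, window):
    """Simple moving average of the last `window` closes, or None if
    there is not enough history yet (assumption)."""
    if len(closes) < window:
        return None
    return sum(closes[-window:]) / window

def build_observation(closes, position, n=5, ma_windows=(5, 10, 20, 50, 200)):
    """Bundle the position level, past n prices and MA indicators."""
    return {
        "position": position,
        "prices": list(closes[-n:]),
        "indicators": {w: moving_average(closes, w) for w in ma_windows},
    }
```

A real environment would probably flatten this into a fixed-size array and normalize the prices, but the notes leave that open.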
input:
state: past N days of prices, including OHLC and the 5, 10, 20, 50 and 200 MA, plus the current position (dollar amount)
action: NA (no action), buy 1/3, buy 2/3, buy all, sell 1/3, sell 2/3, sell all
reward: discounted future 5-day PNL
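To make the action and reward definitions concrete, here is a minimal sketch. It assumes "buy 1/3" means spending one third of available cash and "sell 1/3" means selling one third of current holdings, which the notes don't spell out. The discount factor `gamma=0.99` is also an assumption, and all names are hypothetical.

```python
# Signed fraction per action: positive = share of cash to spend,
# negative = share of holdings to sell (interpretation is an assumption).
ACTIONS = {
    "NA": 0.0,
    "buy 1/3": 1 / 3,
    "buy 2/3": 2 / 3,
    "buy all": 1.0,
    "sell 1/3": -1 / 3,
    "sell 2/3": -2 / 3,
    "sell all": -1.0,
}

def apply_action(cash, shares, price, action):
    """Return (cash, shares) after executing `action` at `price`,
    ignoring fees and slippage."""
    fraction = ACTIONS[action]
    if fraction > 0:
        spend = cash * fraction
        return cash - spend, shares + spend / price
    if fraction < 0:
        sold = shares * -fraction
        return cash + sold * price, shares - sold
    return cash, shares

def discounted_pnl(daily_pnl, gamma=0.99, horizon=5):
    """Sum of gamma**k * PNL_k over the next `horizon` days."""
    return sum(gamma ** k * p for k, p in enumerate(daily_pnl[:horizon]))
```

For example, `discounted_pnl([1, 1, 1], gamma=0.5)` gives `1 + 0.5 + 0.25 = 1.75`.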
Notes: use a CNN without pooling for the policy function/network