mjuchli / ctc-executioner

Master Thesis: Limit order placement with Reinforcement Learning

[RL] Improve reward function #11

Open mjuchli opened 6 years ago

mjuchli commented 6 years ago

Instead of using (p_0 - vwap_t) as the reward, compare against the midpoint of the price range: p_0 - (max([p_0; p_t]) + min([p_0; p_t])) / 2, normalized to the range -1 to 1. That way the reward stays stable for any kind of price fluctuation.
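A rough sketch of what this could look like, not taken from the repository. It assumes `[p_0; p_t]` means the sequence of prices observed from step 0 to step t, and that the normalization divides by half the observed price range. The issue specifies neither, so both are assumptions, and the function name `midpoint_reward` is hypothetical.

```python
from typing import Sequence


def midpoint_reward(prices: Sequence[float]) -> float:
    """Reward of p_0 relative to the midpoint of the observed price range.

    `prices` is assumed to hold the prices from p_0 (first) to p_t (last).
    Dividing by half the range is an assumed normalization, not one
    confirmed by the issue; it keeps the result within [-1, 1] because
    p_0 is itself part of the range.
    """
    if not prices:
        raise ValueError("prices must contain at least p_0")

    p_0 = prices[0]
    highest, lowest = max(prices), min(prices)
    half_range = (highest - lowest) / 2.0

    # Flat market: no fluctuation, so the reward is neutral.
    if half_range == 0.0:
        return 0.0

    midpoint = (highest + lowest) / 2.0
    return (p_0 - midpoint) / half_range
```

With this normalization, a window in which the price only rises from p_0 yields -1, one in which it only falls yields 1, and a window that moves symmetrically around p_0 yields 0. The sign follows the expression above as written.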

[Image: Evernote snapshot 20180310 225733]