mjuchli opened 6 years ago
Instead of `(p_0 - vwap_t)`, compare against `p_0 - (max([p_0; p_t]) + min([p_0; p_t])) / 2`, normalized to lie between -1 and 1. This keeps the reward stable (bounded) for any kind of price fluctuation.
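A minimal Python sketch of both rewards, for comparison. It assumes `p_t` is the sequence of prices observed after `p_0` up to step t, so `[p_0; p_t]` is the whole window including the start price. The issue does not say how the normalization to [-1, 1] is done; dividing by half the price range is an assumption here, and the function names `vwap`, `vwap_reward` and `midrange_reward` are hypothetical.

```python
from typing import Sequence


def vwap(prices: Sequence[float], volumes: Sequence[float]) -> float:
    """Volume-weighted average price (the current baseline)."""
    total_volume = sum(volumes)
    if total_volume == 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def vwap_reward(p_0: float, prices: Sequence[float], volumes: Sequence[float]) -> float:
    """Current reward: p_0 - vwap_t (unbounded, scales with the price level)."""
    return p_0 - vwap(prices, volumes)


def midrange_reward(p_0: float, p_t: Sequence[float]) -> float:
    """Proposed reward (hypothetical sketch): distance of p_0 from the
    midpoint of the price range, divided by half the range (assumed
    normalization) so the result lies in [-1, 1]."""
    window = [p_0, *p_t]  # corresponds to [p_0; p_t]
    hi, lo = max(window), min(window)
    half_range = (hi - lo) / 2
    if half_range == 0:
        return 0.0  # no fluctuation at all: p_0 is the midpoint
    return (p_0 - (hi + lo) / 2) / half_range


print(midrange_reward(100.0, [101.0, 102.0]))     # p_0 is the window minimum
print(midrange_reward(1000.0, [1010.0, 1020.0]))  # same move at 10x the price level
```

Because `p_0` is always part of the window, the result is -1 when `p_0` is the window minimum, +1 when it is the maximum, and it does not change when all prices are scaled by the same factor, unlike `p_0 - vwap_t`.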