notadamking / RLTrader

A cryptocurrency trading environment using deep reinforcement learning and OpenAI's gym
https://discord.gg/ZZ7BGWh
GNU General Public License v3.0

Debugging the sortino optimizer, sortino rate not always calculated. #59

Closed robinvanleeuwen closed 5 years ago

robinvanleeuwen commented 5 years ago

I am trying to get a grasp on how the optimizer works and which values are given or calculated. I added some debugging and see that the Sortino ratio is not always calculated. This happens at least when the first couple of steps are non-trades. See the log below.

I also have a question about the ratio: the code specifies that only finite values are used, which excludes both NaN and infinity. For NaN, I assume the ratio cannot be calculated. But an infinite value is also set to zero. Is this correct? What does a calculated "infinite" Sortino score mean?
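For readers with the same question, here is a minimal sketch of where the two non-finite cases can come from. It is not the RLTrader implementation: `raw_sortino` and `finite_or_zero` are hypothetical names, and the target return of 0.0 is an assumption. With no returns below the target, the downside deviation is zero, so a positive mean return divides to infinity and a zero mean return gives 0/0, which is NaN. A finiteness check such as `np.isfinite` rejects both.

```python
import numpy as np


def raw_sortino(returns, target=0.0):
    """Sortino ratio with no guard on the denominator (hypothetical helper)."""
    returns = np.asarray(returns, dtype=float)
    excess = returns - target
    # Only returns below the target count toward downside deviation.
    downside = np.minimum(excess, 0.0)
    downside_dev = np.sqrt(np.mean(np.square(downside)))
    # Silence NumPy's divide-by-zero warnings so inf/nan come back quietly.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.mean(excess) / downside_dev


def finite_or_zero(value):
    """Map NaN and +/-inf to 0.0, mirroring a 'finite values only' rule."""
    return float(value) if np.isfinite(value) else 0.0


only_gains = raw_sortino([0.01, 0.02])   # no downside, positive mean: inf
flat = raw_sortino([0.0, 0.0])           # no downside, zero mean: nan
mixed = raw_sortino([0.01, -0.02])       # has downside: a finite value

print(only_gains, flat, mixed)
print(finite_or_zero(only_gains), finite_or_zero(flat), finite_or_zero(mixed))
```

Under these assumptions, an "infinite" score just means a window with gains and no losses at all, so zeroing it discards what is arguably the best possible outcome for that window.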

legend:
--- = sale
+++ = buy
b = balance
nw = net worth
CS = calculating Sortino Rate
reward = sortino rate, total_reward, (length = min(self.current_step, self.forecast_len))

DEBUG:main:Trainng length
INFO:main: Reward strategy : sortino
INFO:main: Input data file : data/small_corrected_hourly.csv
INFO:main: Parames db file : sqlite:///params.db
INFO:main: n_jobs = 1
INFO:main: n_trials = 30
INFO:main: n_test_episodes = 3
INFO:main: n_evaluations = 4
INFO:main: Total number of records : 2612
INFO:main: Number of training recs : 2612
INFO:main: Num. of evaluation recs : 2089
INFO:main:forecast_length = 2
INFO:main:convidence_intv = 0.9407857951645567
INFO:main:n_steps = 82
INFO:main:gamma = 0.9910742592048303
INFO:main:learning_rate = 0.0001463509003510531
INFO:main:ent_coef = 8.4767762718177e-08
INFO:main:cliprange = 0.9950675592874643
INFO:main:noptepochs = 1
INFO:main:lam = 0.9304853592231145

DEBUG:env.BitcoinTradingEnv: b = 10000.00 nw = 10000.00
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv: b = 10000.00 nw = 10000.00
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv:--- $$$ 0.00
DEBUG:env.BitcoinTradingEnv: b = 10000.00 nw = 10000.00
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv: b = 10000.00 nw = 10000.00
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv:--- $$$ 0.00
DEBUG:env.BitcoinTradingEnv: b = 10000.00 nw = 10000.00
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv:--- $$$ 0.00
DEBUG:env.BitcoinTradingEnv: b = 10000.00 nw = 10000.00
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv: b = 10000.00 nw = 10000.00
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv:+++ $$$ 3333.33
DEBUG:env.BitcoinTradingEnv: b = 6666.67 nw = 6666.67
DEBUG:env.BitcoinTradingEnv:CS
DEBUG:env.BitcoinTradingEnv: reward = 0 0 (2)
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv:+++ $$$ 2222.22
DEBUG:env.BitcoinTradingEnv: b = 4444.44 nw = 4444.44
DEBUG:env.BitcoinTradingEnv:CS
DEBUG:env.BitcoinTradingEnv: reward = 0 0 (2)
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv:--- $$$ 5496.85
DEBUG:env.BitcoinTradingEnv: b = 9941.29 nw = 9941.29
DEBUG:env.BitcoinTradingEnv:CS
DEBUG:env.BitcoinTradingEnv: reward = 0 0 (2)
DEBUG:env.BitcoinTradingEnv:
DEBUG:env.BitcoinTradingEnv:+++ $$$ 2485.32
DEBUG:env.BitcoinTradingEnv: b = 7455.97 nw = 7455.97
DEBUG:env.BitcoinTradingEnv:CS
DEBUG:env.BitcoinTradingEnv: reward = 0 0 (2)
DEBUG:env.BitcoinTradingEnv:

evanatyourservice commented 5 years ago

Just add an "epsilon" to the denominator...

def sortino(self, returns):
    target = etc... # target returns

    numerator = np.mean(returns) - target

    returns = returns - target
    returns[returns > 0.0] = 0.0
    denominator = np.sqrt(np.mean(np.square(returns))) + 1.0e-8  # no divide by zero

    return numerator / denominator

Then it would be wise to normalize it somehow in the step function before it is sent to the algo, like taking the cube root or something: first reward = self.sortino(returns), then reward = np.cbrt(reward)... ayyy
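A self-contained sketch of the suggestion above, for anyone who wants to try it outside the environment class. The function names, the default `target=0.0`, and the `eps` parameter are assumptions made for illustration (the snippet above leaves the target unspecified); only the epsilon in the denominator and the cube root come from the comment itself.

```python
import numpy as np


def sortino_with_epsilon(returns, target=0.0, eps=1.0e-8):
    """Sortino ratio with a small epsilon added to the denominator.

    `target` and `eps` defaults are illustrative assumptions.
    """
    returns = np.asarray(returns, dtype=float)
    numerator = np.mean(returns) - target
    # Keep only the shortfall below the target; gains contribute zero.
    downside = np.minimum(returns - target, 0.0)
    # eps keeps the denominator non-zero when there is no downside at all.
    denominator = np.sqrt(np.mean(np.square(downside))) + eps
    return numerator / denominator


def squashed_reward(returns, target=0.0):
    """Cube root of the ratio: sign-preserving and much smaller in magnitude."""
    return float(np.cbrt(sortino_with_epsilon(returns, target)))


gains_only = [0.01, 0.02]
print(sortino_with_epsilon(gains_only))  # finite, but very large
print(squashed_reward(gains_only))       # compressed by the cube root
print(squashed_reward([0.0, 0.0]))       # flat returns give 0.0
```

With the epsilon in place, a window with no losses no longer produces infinity, but the ratio is still huge (a mean return of 0.015 divided by 1e-8 is about 1.5 million), which is why some squashing such as the cube root is worth having before the value is used as a reward.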