lefnire / tforce_btc_trader

TensorForce Bitcoin Trading Bot
http://ocdevel.com/podcasts/machine-learning/26
GNU Affero General Public License v3.0

Actions exploration #38

Open methenol opened 6 years ago

methenol commented 6 years ago

I'm working outside of hypersearch right now, so these are probably not ideal parameters. With actions exploration defined, the model seems to become a little more tolerant of less-than-perfect parameters (and of the randomness in the model's initial state).

https://reinforce.io/blog/introduction-to-tensorforce/

    actions_exploration=dict(
        type='ornstein_uhlenbeck',
        sigma=0.1,
        mu=0.0,
        theta=0.1
    ),

These parameters are from the example in the above link and are not optimized.

Any benefit to adding parameters for actions exploration to hypersearch?

methenol commented 6 years ago

Testing this modification to hypersearch.py. I had to clear the runs database, so it's going to be a bit before I can tell if it affected anything.

hypers['agent'] = {
    # 'states_preprocessing': None,
    # 'actions_exploration': None,

    'actions_exploration.type': 'ornstein_uhlenbeck',
    'actions_exploration.sigma': {
        'type': 'bounded',
        'vals': [0., 1.],
        'guess': .2,
        'hydrate': min_threshold(.05, None)
    },
    'actions_exploration.mu': {
        'type': 'bounded',
        'vals': [0., 1.],
        'guess': .2,
        'hydrate': min_threshold(.05, None)
    },
    'actions_exploration.theta': {
        'type': 'bounded',
        'vals': [0., 1.],
        'guess': .2,
        'hydrate': min_threshold(.05, None)
    },
    # 'reward_preprocessing': None,

    # I'm pretty sure we don't want to experiment any less than .99 for non-terminal reward-types (which are 1.0).
    # .99^500 ~= .6%, so it loses value sooner than makes sense for our trading horizon. A trade now could affect
    # something 2-5k steps later. So .999 is more like it (5k steps ~= .6%)
    'discount': 1.,  # {
    #     'type': 'bounded',
    #     'vals': [.9, .99],
    #     'guess': .97
    # },
}

First time tweaking the hypers, if there's a better way let me know.
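The discount arithmetic in the comments above can be checked with a few lines of Python. This is only an illustrative sketch; `remaining_weight` is a hypothetical helper name, not something from the repo or from TensorForce.

```python
def remaining_weight(discount, steps):
    """Weight given to a reward `steps` in the future under geometric discounting."""
    return discount ** steps


# Compare the two horizons mentioned in the code comments.
for discount, steps in [(0.99, 500), (0.999, 5000)]:
    weight = remaining_weight(discount, steps)
    print(f"{discount}^{steps} = {weight:.2%}")
```

Both cases come out well under 1%, which is the point of the comment: with .99 a reward is almost fully discounted after about 500 steps, while .999 stretches the same decay over about 5k steps.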

UPDATE 08/14/18: The above code is not compatible with v0.2 as-is. The ranges to be searched are valid but the syntax is not compatible with the hyperopt implementation in v0.2.

methenol commented 6 years ago

This runs on v0.2. I'd like it to toggle on/off like the baseline section; working on that.

    'actions_exploration': {
        'type': 'ornstein_uhlenbeck',
        'sigma': hp.quniform('exploration.sigma', 0, 1, 0.05),
        'mu': hp.quniform('exploration.mu', 0, 1, 0.05),
        'theta': hp.quniform('exploration.theta', 0, 1, 0.05)
    },

Updated 08/19/18 to use quniform.

A brief explanation of the parameters from here: https://www.maplesoft.com/support/help/maple/view.aspx?path=Finance%2FOrnsteinUhlenbeckProcess

The parameter theta is the speed of mean-reversion. The parameter mu is the long-running mean. The parameter sigma is the volatility.
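To see what the three parameters do, here is a minimal sketch of an Ornstein-Uhlenbeck process using a simple Euler-Maruyama step. It is an assumption-laden illustration, not TensorForce's exploration code: `simulate_ou`, `x0`, `dt`, `steps`, and `seed` are hypothetical names chosen for this example.

```python
import math
import random


def simulate_ou(theta, mu, sigma, x0=0.0, dt=1.0, steps=100, seed=None):
    """Euler-Maruyama discretization of dx = theta * (mu - x) * dt + sigma * dW."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        # theta controls how fast x is pulled back toward the long-running mean mu.
        drift = theta * (mu - x) * dt
        # sigma scales the Gaussian noise added at each step (the volatility).
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += drift + noise
        path.append(x)
    return path


if __name__ == "__main__":
    # Values from the TensorForce blog example quoted earlier in this thread.
    path = simulate_ou(theta=0.1, mu=0.0, sigma=0.1, x0=1.0, steps=50, seed=0)
    print(path[-1])
```

With sigma set to 0 the path decays deterministically toward mu, which makes the role of theta easy to see; raising sigma adds more noise around that mean.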

lefnire commented 6 years ago

Feel free to add in a pull request, or even just commit to master if you feel confident about it

methenol commented 6 years ago

Going to try to get the values a little more realistic first before submitting a PR for it. Letting the hypersearch run for a bit so it does its thing.