This repo contains a trading environment and a dueling DQN agent built on keras-rl (https://github.com/keras-rl/keras-rl). The agent is expected to learn useful action sequences that maximize profit in a given environment.
At each step, the environment limits the agent to one of three actions: buy, sell, or hold the stock (coin).
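For reference, `env.action_space.n` in the script below implies a discrete, three-way action space; a minimal gym-style sketch (hypothetical names, the repo's `OhlcvEnv` defines the real encoding):

```python
from gym import spaces

# hypothetical 3-action encoding; OhlcvEnv defines the actual mapping
ACTIONS = ('buy', 'sell', 'hold')
action_space = spaces.Discrete(len(ACTIONS))  # env.action_space.n == 3
```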
If the agent decides to take a long position, it initiates an action sequence such as `buy - hold - hold - sell`; for a short position, the reverse: `sell - hold - hold - buy`. Only a single position can be opened per trade, so an invalid sequence such as `buy - buy` is treated as `buy - hold`.
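The coercion rule might look like the following sketch (hypothetical helper; the actual check lives inside the environment's step logic):

```python
# a second buy (or sell) while a position is already open degrades to hold
def effective_action(action, position):
    if action == 'buy' and position == 'long':
        return 'hold'   # buy - buy   -> buy - hold
    if action == 'sell' and position == 'short':
        return 'hold'   # sell - sell -> sell - hold
    return action
```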
Reward is granted only when a position is closed or when an episode finishes. This kind of sparse reward scheme takes longer to train, but is most successful at learning long-term dependencies.
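The reward rule itself is not shown in this README; a minimal sketch of the idea, assuming reward is the realized PnL granted only when a position closes (hypothetical helper, the real logic lives in `OhlcvEnv`):

```python
def step_reward(direction, entry_price, exit_price, closed):
    """direction: +1 for a long position, -1 for a short."""
    if not closed:
        return 0.0  # no intermediate reward -> sparse signal
    return direction * (exit_price - entry_price) / entry_price  # realized PnL
```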
The agent decides on the optimal action by observing its environment: recent market data is stacked and given as an observation of shape `(window_size, n_features)`. With some modification, it can easily be applied to stocks, futures, or foreign exchange as well.
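As a sketch of how such an observation could be stacked (assuming the `n_features` columns are the OHLCV values; the real stacking happens inside `OhlcvEnv`):

```python
import numpy as np

def observe(ohlcv, t, window_size):
    # slice the last `window_size` rows of an (n_steps, n_features) array
    # into one observation of shape (window_size, n_features)
    return np.asarray(ohlcv[t - window_size + 1:t + 1], dtype=np.float32)

# e.g. with window_size=30 and 5 features (open, high, low, close, volume):
# observe(data, t, 30).shape == (30, 5)
```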
Key components: Visualization / Main / Environment

The sample data provided is 5-minute OHLCV candles fetched from BitMEX:

- train: `./data/train/` (~70000 candles)
- test: `./data/test/` (~16000 candles)

### Prerequisites

keras-rl, numpy, tensorflow ... etc.

```
pip install -r requirements.txt
```
# change "keras-rl/core.py" to "./modified/core.py"
# create environment
# OPTIONS
ENV_NAME = 'OHLCV-v0'
TIME_STEP = 30
PATH_TRAIN = "./data/train/"
PATH_TEST = "./data/test/"
env = OhlcvEnv(TIME_STEP, path=PATH_TRAIN)
env_test = OhlcvEnv(TIME_STEP, path=PATH_TEST)
# random seed
np.random.seed(123)
env.seed(123)
# create_model
nb_actions = env.action_space.n
model = create_model(shape=env.shape, nb_actions=nb_actions)
print(model.summary())
# create memory
memory = SequentialMemory(limit=50000, window_length=TIME_STEP)
# create policy
policy = EpsGreedyQPolicy()# policy = BoltzmannQPolicy()
# create agent
# you can specify the dueling_type to one of {'avg','max','naive'}
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=200,
enable_dueling_network=True, dueling_type='avg', target_model_update=1e-2, policy=policy,
processor=NormalizerProcessor())
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
# now train and test agent
while True:
# train
dqn.fit(env, nb_steps=5500, nb_max_episode_steps=10000, visualize=False, verbose=2)
try:
# validate
info = dqn.test(env_test, nb_episodes=1, visualize=False)
n_long, n_short, total_reward, portfolio = info['n_trades']['long'], info['n_trades']['short'], info[
'total_reward'], int(info['portfolio'])
np.array([info]).dump(
'./info/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.info'.format(ENV_NAME, portfolio, n_long, n_short,
total_reward))
dqn.save_weights(
'./model/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.h5f'.format(ENV_NAME, portfolio, n_long, n_short,
total_reward),
overwrite=True)
except KeyboardInterrupt:
continue
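To evaluate a saved checkpoint later, the weights written by the loop above can be loaded back into an identically built agent; a sketch (the glob pattern is illustrative):

```python
import glob

# rebuild env_test, model, and dqn exactly as above, then:
weights = sorted(glob.glob('./model/*.h5f'))[-1]  # pick a saved checkpoint
dqn.load_weights(weights)
dqn.test(env_test, nb_episodes=1, visualize=True)
```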
```python
## simply plug in any keras model :)
from keras.models import Sequential
from keras.layers import Activation, Dense, CuDNNLSTM

def create_model(shape, nb_actions):
    model = Sequential()
    model.add(CuDNNLSTM(64, input_shape=shape, return_sequences=True))
    model.add(CuDNNLSTM(64))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model  # was missing: the agent needs the built model back
```
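Note that `CuDNNLSTM` only runs on a CUDA-enabled GPU. On a CPU-only machine, a drop-in variant with plain `LSTM` layers should work (a sketch, not part of the repo):

```python
from keras.models import Sequential
from keras.layers import Activation, Dense, LSTM

def create_model_cpu(shape, nb_actions):
    # same architecture with the GPU-only CuDNNLSTM swapped for LSTM
    model = Sequential()
    model.add(LSTM(64, input_shape=shape, return_sequences=True))
    model.add(LSTM(64))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model
```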
### Results

[Verbose] While training or testing, per-step progress and the metrics below are printed.

[Portfolio] Final portfolio value for the episode (`info['portfolio']` in the loop above).

[Reward] Total reward accumulated over the episode (`info['total_reward']`).
Wow! A 29-fold return and a 3.67 reward!

Disclaimer: it may have overfitted :(
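The `.info` files dumped by the training loop are pickled numpy object arrays; they can be read back with `numpy.load` (the glob pattern is illustrative):

```python
import glob
import numpy as np

path = sorted(glob.glob('./info/*.info'))[-1]  # latest dump
info = np.load(path, allow_pickle=True)[0]     # dumps are 1-element arrays
print(info['portfolio'], info['total_reward'], info['n_trades'])
```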
This project is licensed under the MIT License - see the LICENSE.md file for details