Closed greg2paris closed 5 years ago
Hey,
I would recommend reading up on the Gym environments, as they will be the interface between the RL models of stable-baselines and your real live data.
The idea is to apply the action in the step() function, calculate the reward of that action (how good was the investment), and return the current observation (in your case the forex data). Also, don't forget that your reset() also gets the latest observation and returns it.
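As a dependency-free sketch of that step()/reset() contract: the class below only illustrates the interface stable-baselines expects, and the price list, action semantics (0 = hold, 1 = buy, 2 = sell) and one-step-PnL reward are assumptions for illustration, not code from this thread.

```python
class SketchTradingEnv:
    """Minimal stand-in for gym.Env: reset() -> obs, step(a) -> (obs, reward, done, info)."""

    def __init__(self, prices):
        self.prices = list(prices)
        self.t = 0
        self.position = 0  # -1 short, 0 flat, +1 long

    def reset(self):
        # Return the first observation so the agent starts from fresh data.
        self.t = 0
        self.position = 0
        return self._obs()

    def _obs(self):
        return [self.prices[self.t], float(self.position)]

    def step(self, action):
        # Illustrative action semantics: 0 = hold, 1 = buy, 2 = sell.
        if action == 1:
            self.position = 1
        elif action == 2:
            self.position = -1
        prev = self.prices[self.t]
        self.t += 1
        # Reward = PnL of the held position over one step
        # ("how good was the investment").
        reward = self.position * (self.prices[self.t] - prev)
        done = self.t >= len(self.prices) - 1
        return self._obs(), reward, done, {}
```

For example, after `reset()`, calling `step(1)` on `SketchTradingEnv([1.10, 1.12, 1.11])` gives a reward of about +0.02 for buying before the price rose. A real version would subclass `gym.Env` and declare `action_space`/`observation_space`.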
Actually, I created a custom environment; I would like to interact with the Interactive Brokers API. I installed ib_insync and it works very well. I more or less succeeded in getting the code to run, but at the same time I don't see any training. It should print the action, but I don't even see that. Now I am having a headache trying to get this simple code working.
This is the current code I am working with:
```python
from ib_insync import *
from decimal import *
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2
import time
from IPython.display import clear_output
import numpy as np
import pandas as pd
import gym
from gym import spaces
from gym.utils import seeding
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [3, 3]
desired_width = 180
pd.set_option('display.width', desired_width)
pd.set_option('display.max_columns', 100)


def import_data(duration, barsize):
    df = util.df(ib.reqHistoricalData(
        contract,
        endDateTime='',
        durationStr=duration,
        barSizeSetting=barsize,
        whatToShow='MIDPOINT',
        useRTH=True,
        formatDate=1,
        keepUpToDate=False))
    df = df[['date', 'open', 'high', 'low', 'close']]
    return df


def place_order_func(type_of_trade, number_of_share):
    order = MarketOrder(type_of_trade, number_of_share)
    trade = ib.placeOrder(contract, order)
    return trade


def onTickerUpdate(ticker):
    global model  # keep the model across calls; otherwise it is recreated on every tick
    print("++++++++++++++++++++++++++++++++++++++")
    try:
        unrealized_pnl = ib.portfolio()[0][5]  # PortfolioItem.unrealizedPNL
        realized_pnl = ib.portfolio()[0][6]    # PortfolioItem.realizedPNL (was undefined before)
        print(ib.portfolio())  # show full portfolio
        print("unrealized pnl: ", unrealized_pnl, "realized pnl: ", realized_pnl)
    except IndexError:
        unrealized_pnl = 0
        print("no position opened")
    print(ib.pnl())
    try:
        print(ib.positions())
    except Exception:
        print("no positions")
    bids = ticker.domBids
    for i in range(number_of_lines):
        df.iloc[i, 0] = bids[i].size if i < len(bids) else 0
        df.iloc[i, 1] = bids[i].price if i < len(bids) else 0
    asks = ticker.domAsks
    for i in range(number_of_lines):
        df.iloc[i, 2] = asks[i].price if i < len(asks) else 0
        df.iloc[i, 3] = asks[i].size if i < len(asks) else 0
    clear_output(wait=True)
    df_array = np.array(df).flatten()
    df_array = np.append(df_array, unrealized_pnl)
    # Create vector env
    env = DummyVecEnv([lambda: Trading_Env(df_array)])
    policy = 'MlpPolicy'
    try:
        model
    except NameError:
        model_exists = False
    else:
        model_exists = True
    if not model_exists:
        model = PPO2(policy, env, verbose=0)
    if model_exists:
        print("Same Model")
    print("____________________________________________Training Phase____________________________________________")
    print("")
    print(model)
    print("Learning Rate: ", model.learning_rate)
    start_time = time.time()
    # Train the agent for one timestep
    model.learn(total_timesteps=1)
    print("environment_times_steps", 1)
    print("Training phase ran for --- %s seconds ---" % (time.time() - start_time))
    print("Average time for one timestep:", ((time.time() - start_time) / 1))
    model.save("model_ppo2")


class Trading_Env(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self, df_array):
        super(Trading_Env, self).__init__()
        self.state = df_array
        self.df_array_shape = self.state.shape
        self.reward = 0.0  # placeholder; was never set before, so step() raised AttributeError
        # define the action space
        self.action_space = spaces.Discrete(3)  # trade, tp, sl (step() treats 1=BUY, 2=SELL)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=self.df_array_shape, dtype=np.float32)

    def render(self, mode='human', verbose=False):
        return None

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def step(self, action):
        print("action: ", action)
        if action == 1:
            trade = place_order_func('BUY', 20000)
        elif action == 2:
            trade = place_order_func('SELL', 20000)
        self.done = False
        info = {}
        return self.state, self.reward, self.done, info

    def reset(self):
        return self.state


# create the Interactive Brokers connection
ib = IB()
ib.connect('127.0.0.1', 7497, clientId=16)
l = ib.reqMktDepthExchanges()
l[:5]
contract = Forex('EURUSD')
ib.qualifyContracts(contract)
ticker = ib.reqMktDepth(contract)
number_of_lines = 10
df = pd.DataFrame(index=range(number_of_lines),
                  columns='bidSize bidPrice askPrice askSize'.split())
ticker.updateEvent += onTickerUpdate  # register the handler once, not once per iteration
for ii in range(100):
    IB.sleep(15)
```
Do you have any advice to make it actually work?
Ah, I'm sorry, I don't think I can help with that. I don't know anything about trading/broker APIs.
If you manage to get your environment working, it should be relatively simple to interface with stable-baselines however: https://stable-baselines.readthedocs.io/en/master/guide/quickstart.html
OK, actually, I have my data updated with this line:

```python
ticker.updateEvent += onTickerUpdate
```

Every time there is a new market event, the function onTickerUpdate is called. I then put the data into a NumPy array and create the env from that new array:

```python
# Create vector env
env = DummyVecEnv([lambda: Trading_Env(df_array)])
policy = 'MlpPolicy'
```

Then I train for one timestep, since it's in real time.
Am I going about this the right way?
Or do you think I should update the data directly inside the environment itself?
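The second option can be sketched with the stdlib only: one long-lived environment whose observations are pushed in by the market-data callback, instead of a new DummyVecEnv being built on every tick. The FeedEnv class, its push() method, and the zero placeholder reward are assumptions for illustration, not code from this thread.

```python
from collections import deque


class FeedEnv:
    """Long-lived env whose observations come from a queue filled by a data callback."""

    def __init__(self):
        self.feed = deque()   # the ticker callback appends fresh observations here
        self.last_obs = None

    def push(self, obs):
        # What a handler like onTickerUpdate would call with new market data.
        self.feed.append(obs)

    def reset(self):
        if self.feed:
            self.last_obs = self.feed.popleft()
        return self.last_obs

    def step(self, action):
        # Non-blocking: reuse the last observation if no new tick has arrived yet.
        if self.feed:
            self.last_obs = self.feed.popleft()
        reward = 0.0          # placeholder; a real PnL-based reward would go here
        return self.last_obs, reward, False, {}
```

With this pattern the env is created once, the callback only pushes new data, and the model wrapped around the env can stay alive between ticks.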
I don't understand what you are doing or want to do; it is not my field of expertise. We also do not do tech support. I'm afraid you are going to have to solve the environment yourself.
If you have any issues with the RL models themselves, don't hesitate to ask questions, or submit an issue if you think you have found a bug.
Is it possible to include real live data to train the agent? I would like to use real-time forex data to train an agent in real time. How can I do this? Should I create a dummy environment every timestep? `DummyVecEnv([lambda: custom_env()])`