hansbuehler / deephedging

Implementation of the vanilla Deep Hedging engine
GNU General Public License v3.0

Deep Hedging

Reinforcement Learning for Hedging Derivatives under Market Frictions

This archive contains a sample implementation of the Deep Hedging framework. The purpose of the code base is to illustrate the concepts behind Deep Hedging; it is not optimized for speed, and any production use will require additional safeguards.

The notebook directory contains a number of examples of how to use the code.

Latest major updates:



Beta version. Please report any issues, and see the installation notes below.

Deep Hedging

The Deep Hedging problem for a horizon $T$ hedged over $M$ time steps with $N$ hedging instruments is finding an optimal action function $a$ as a function of feature states $s_0,\ldots,s_{T-1}$ which solves

$$ \sup_a:\ \mathrm{U}\left[\ Z_T + \sum_{t=0}^{T-1} a(s_t) \cdot DH_t - \gamma_t \cdot | a(s_t) H_t |\ \right] \ . $$

Here $Z_T$ is the terminal payoff of the portfolio of derivatives to be hedged, $H_t$ is the vector of prices of the $N$ hedging instruments at step $t$, $DH_t := H_T - H_t$ is the return to the horizon of a unit position entered at $t$, $\gamma_t$ are proportional transaction costs, and $\mathrm{U}$ is a monetary utility such as CVaR.
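To make the objective concrete, here is a minimal NumPy sketch of the per-path hedging gains and a CVaR monetary utility. This is purely illustrative and not the repo's API; the names payoff, hedges, actions and cost, and the flat cost structure, are hypothetical:

    import numpy as np

    def hedging_pnl(payoff, hedges, actions, cost):
        """Per-path gains Z_T + sum_t a(s_t).DH_t - gamma_t.|a(s_t) H_t|.
        Shapes: payoff[samples], hedges[samples, steps+1, instruments],
        actions[samples, steps, instruments]; cost is a flat proportional cost."""
        H_T = hedges[:, -1, :]                    # terminal instrument values H_T
        pnl = payoff.astype(float).copy()
        for t in range(actions.shape[1]):
            DH_t = H_T - hedges[:, t, :]          # return to horizon of a unit entered at t
            pnl += np.sum(actions[:, t, :] * DH_t, axis=-1)
            pnl -= cost * np.sum(np.abs(actions[:, t, :] * hedges[:, t, :]), axis=-1)
        return pnl

    def cvar(pnl, alpha=0.5):
        """Monetary utility: expected P&L over the worst alpha-tail of paths."""
        q = np.quantile(pnl, alpha)
        return pnl[pnl <= q].mean()

Deep Hedging then maximizes cvar(hedging_pnl(...)) over the network parameters defining the action function $a$.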

To test the code, run notebooks/trainer.ipynb.

In order to run Deep Hedging, we require:

To provide your own world with real or simulated data, see world.py. Here are the world.tf_data entries used by gym.call():

An example world generator for simplistic model dynamics is provided, but in practice it is recommended to rely on fully machine-learned market simulators such as https://arxiv.org/abs/2112.06823.
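For orientation, below is a minimal sketch of the kind of simplistic dynamics an example world might produce: plain Black-Scholes spot paths plus the default short ATM call payoff. This is not world.py; the function and its parameters are illustrative, with values matching the config report further below:

    import numpy as np

    def simulate_spots(samples=10000, steps=20, dt=0.02, drift=0.1, rvol=0.2, seed=2312414312):
        """Geometric Brownian motion spot paths of shape [samples, steps+1], starting at 1."""
        rng = np.random.default_rng(seed)
        dW   = rng.standard_normal((samples, steps)) * np.sqrt(dt)
        dX   = (drift - 0.5 * rvol ** 2) * dt + rvol * dW      # log-increments
        logS = np.concatenate([np.zeros((samples, 1)), np.cumsum(dX, axis=1)], axis=1)
        return np.exp(logS)

    spots  = simulate_spots()
    payoff = -np.maximum(spots[:, -1] - 1.0, 0.0)   # default 'atmcall': short call, strike 1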

Installation

See requirements.txt for the latest version requirements. At the time of writing this markdown, these are:

Anaconda

In your local conda environment use the following:

    conda install "cdxbasics>=0.2.11" -c hansbuehler
    conda install -c conda-forge tensorflow>=2.10
    conda install -c conda-forge tensorflow-probability==0.14
    conda install cvxpy

At the time of writing, this gives you TensorFlow 2.10 and the correct tensorflow-probability version 0.14. Then check that the following works:

    import tensorflow as tf
    import tensorflow_probability as tfp  # ensure this import does not fail
    print("TF version %s. Num GPUs Available: %d" % (tf.__version__, len(tf.config.list_physical_devices('GPU'))))  # prints the TensorFlow version and whether any GPUs were found

AWS SageMaker

(29/1/2023) AWS SageMaker finally supports TensorFlow 2.10, with and without GPU, via the conda environment conda_tensorflow2_p310. It is still pretty buggy (e.g. conda is inconsistent out of the box) but seems to work. AWS tends to change its available conda packages, so check which one is available when you try this.

In order to run Deep Hedging, launch a reasonably powerful AWS SageMaker instance such as ml.c5.2xlarge. Open a terminal and run:

    bash    # switch to bash; conda activate may not work in the terminal's default shell
    conda activate tensorflow2_p310
    python -m pip install --upgrade pip
    pip install cdxbasics tensorflow_probability==0.14 cvxpy

The reason we are using pip here and not conda is that conda_tensorflow2_p310 is inconsistent, so using conda is unreliable and very slow. Either way, the above should give you an environment with TensorFlow 2.10, including GPU support if your selected instance has GPUs.

If you have cloned the Deep Hedging git directory via SageMaker, then the deephedging directory is not in your include path, even if it shows up in your Jupyter file list. Don't ask... That is why I've added some magic code at the top of the various notebooks:

    import os
    import sys

    # If this notebook lives somewhere inside a .../deephedging/... checkout,
    # add the checkout's parent directory to the Python path so that
    # 'import deephedging' works.
    p   = os.getcwd()
    dhn = "/deephedging"
    i   = p.find(dhn)
    if i != -1:
        p = p[:i]
        sys.path.append(p)
        print("SageMaker: added python path %s" % p)

GPU support

In order to run on GPU you must have installed the correct CUDA and cuDNN drivers; see the TensorFlow installation documentation. On AWS this appears to have been done already. Once you have identified the correct drivers, use:

    conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1

The notebooks in the git directory will print the number of available CPUs and GPUs.

The latest version of Deep Hedging benefits considerably from a GPU when large batch sizes are used. For example, notebooks/trainer-recurrent-fwdstart.ipynb speeds up from just over an hour on a c5.4xlarge to under 30 minutes on a p3.2xlarge.

Industrial Machine Learning Code Philosophy

We attempted to provide a base for industrial code development.

Key Objects and Functions

Interpreting Progress Graphs

Here is an example of progress information printed by NotebookMonitor:

(figure: progress example)

The graphs show:

Text information:

Running Deep Hedging

Copied from notebooks/trainer.ipynb:

print("Deep Hedging AI says hello  ... ", end='')
from cdxbasics.config import Config
from deephedging.trainer import train
from deephedging.gym import VanillaDeepHedgingGym
from deephedging.world import SimpleWorld_Spot_ATM

from IPython.display import display, Markdown

# see print of the config below for numerous options
config = Config()
# world
config.world.samples = 10000
config.world.steps = 20
config.world.black_scholes = True
# gym
config.gym.objective.utility = "cvar"
config.gym.objective.lmbda = 1.
config.gym.agent.network.depth = 3
config.gym.agent.network.activation = "softplus"
# trainer
config.trainer.train.optimizer.name = "adam"
config.trainer.train.optimizer.learning_rate = 0.001
config.trainer.train.optimizer.clipvalue = 1.
config.trainer.train.optimizer.global_clipnorm = 1.
config.trainer.train.batch_size = None
config.trainer.train.epochs = 400
config.trainer.visual.epoch_refresh = 1
config.trainer.visual.time_refresh = 10
config.trainer.visual.confidence_pcnt_lo = 0.25
config.trainer.visual.confidence_pcnt_hi = 0.75

display(Markdown("## Deep Hedging in a Black \& Scholes World"))

# create world
world      = SimpleWorld_Spot_ATM( config.world )              # training set
val_world  = world.clone(samples=config.world("samples")//2)   # validation set

# create training environment
gym = VanillaDeepHedgingGym( config.gym )

# train the model
train( gym=gym, world=world, val_world=val_world, config=config.trainer )

# read result of trained model
r = gym(world.tf_data)
print("Keys of the dictionary returned by the gym: ", r.keys())

print("=========================================")
print("Config usage report")
print("=========================================")
print( config.usage_report() )
config.done()

Below is an example output of print( config.usage_report() ). It provides a summary of all available config values, their defaults, and which values were used. The actual output may differ if you run a later version of the Deep Hedging code base.

=========================================
Config usage report
=========================================
config.gym.agent.init_delta.network['activation'] = relu # Network activation function; default: relu
config.gym.agent.init_delta.network['depth'] = 1 # Network depth; default: 1
config.gym.agent.init_delta.network['final_activation'] = linear # Network activation function for the last layer; default: linear
config.gym.agent.init_delta.network['width'] = 1 # Network width; default: 1
config.gym.agent.init_delta.network['zero_model'] = False # Create a model with zero initial value, but randomized initial gradients; default: False
config.gym.agent.init_delta['active'] = True # Whether or not to train in addition a delta layer for the first step; default: True
config.gym.agent.init_delta['features'] = [] # Named features for the agent to use for the initial delta network; default: []
config.gym.agent.network['activation'] = softplus # Network activation function; default: relu
config.gym.agent.network['depth'] = 3 # Network depth; default: 3
config.gym.agent.network['final_activation'] = linear # Network activation function for the last layer; default: linear
config.gym.agent.network['width'] = 20 # Network width; default: 20
config.gym.agent.network['zero_model'] = False # Create a model with zero initial value, but randomized initial gradients; default: False
config.gym.agent.state['features'] = [] # Named features for the agent to use for the initial state network; default: []
config.gym.agent['agent_type'] = feed_forward # Which network agent type to use; default: feed_forward
config.gym.agent['features'] = ['price', 'delta', 'time_left'] # Named features for the agent to use; default: ['price', 'delta', 'time_left']
config.gym.agent['recurrence'] = 0 # Number of real recurrent states. Set to zero to turn off recurrence; default: 0
config.gym.agent['recurrence01'] = 0 # Number of digital recurrent states. Set to zero to turn off recurrence; default: 0
config.gym.environment['hard_clip'] = False # Use min/max instread of soft clip for limiting actions by their bounds; default: False
config.gym.environment['outer_clip'] = True # Apply a hard clip 'outer_clip_cut_off' times the boundaries; default: True
config.gym.environment['outer_clip_cut_off'] = 10.0 # Multiplier on bounds for outer_clip; default: 10.0
config.gym.environment['softclip_hinge_softness'] = 1.0 # Specifies softness of bounding actions between lbnd_a and ubnd_a; default: 1.0
config.gym.objective.y.network['activation'] = relu # Network activation function; default: relu
config.gym.objective.y.network['depth'] = 3 # Network depth; default: 3
config.gym.objective.y.network['final_activation'] = linear # Network activation function for the last layer; default: linear
config.gym.objective.y.network['width'] = 20 # Network width; default: 20
config.gym.objective.y.network['zero_model'] = False # Create a model with zero initial value, but randomized initial gradients; default: False
config.gym.objective.y['features'] = [] # Path-wise features used to define 'y'. If left empty, then 'y' becomes a simple variable; default: []
config.gym.objective['lmbda'] = 1.0 # Risk aversion; default: 1.0
config.gym.objective['utility'] = cvar # Type of monetary utility; default: exp2
config.gym.tensorflow['seed'] = 423423423 # Set tensor random seed. Leave to None if not desired; default: 423423423
config.trainer.caching['debug_file_name'] = None # Allows overwriting the filename for debugging an explicit cached state; default: None
config.trainer.caching['directory'] = ./.deephedging_cache # If specified, will use the directory to store a persistence file for the model; default: ./.deephedging_cache
config.trainer.caching['epoch_freq'] = 10 # How often to cache results, in number of epochs; default: 10
config.trainer.caching['mode'] = on # Caching strategy: 'on' for standard caching; 'off' to turn off; 'update' to overwrite any existing cache; 'clear' to clear existing caches; 'readonly' to read existing caches but not write new ones; default: on
config.trainer.debug['check_numerics'] = False # Whether to check numerics; default: False
config.trainer.train.optimizer['amsgrad'] = False # Parameter amsgrad for <class 'keras.optimizers.optimizer_v2.adam.Adam'>; default: False
config.trainer.train.optimizer['beta_1'] = 0.9 # Parameter beta_1 for <class 'keras.optimizers.optimizer_v2.adam.Adam'>; default: 0.9
config.trainer.train.optimizer['beta_2'] = 0.999 # Parameter beta_2 for <class 'keras.optimizers.optimizer_v2.adam.Adam'>; default: 0.999
config.trainer.train.optimizer['clipnorm'] = None # Parameter clipnorm for keras optimizers; default: None
config.trainer.train.optimizer['clipvalue'] = None # Parameter clipvalue for keras optimizers; default: None
config.trainer.train.optimizer['epsilon'] = 1e-07 # Parameter epsilon for <class 'keras.optimizers.optimizer_v2.adam.Adam'>; default: 1e-07
config.trainer.train.optimizer['global_clipnorm'] = None # Parameter global_clipnorm for keras optimizers; default: None
config.trainer.train.optimizer['learning_rate'] = 0.001 # Parameter learning_rate for <class 'keras.optimizers.optimizer_v2.adam.Adam'>; default: 0.001
config.trainer.train.optimizer['name'] = adam # Optimizer name. See https://www.tensorflow.org/api_docs/python/tf/keras/optimizers; default: adam
config.trainer.train.tensor_board['hist_freq'] = 1 # Specify tensor board log frequency; default: 1
config.trainer.train.tensor_board['log_dir'] =  # Specify tensor board log directory
config.trainer.train.tensor_board['profile_batch'] = 0 # Batch used for profiling. Set to non-zero to activate profiling; default: 0
config.trainer.train['batch_size'] = None # Batch size; default: None
config.trainer.train['epochs'] = 800 # Epochs; default: 100
config.trainer.train['learing_rate'] = None # Manually set the learning rate of the optimizer; default: None
config.trainer.train['run_eagerly'] = False # Keras model run_eagerly. Turn to True for debugging. This slows down training. Use None for default; default: False
config.trainer.train['tf_verbose'] = 0 # Verbosity for TensorFlow fit(); default: 0
config.trainer.visual.fig['col_nums'] = 6 # Number of columbs; default: 6
config.trainer.visual.fig['col_size'] = 5 # Plot size of a column; default: 5
config.trainer.visual.fig['row_size'] = 5 # Plot size of a row; default: 5
config.trainer.visual['bins'] = 100 # How many x to plot; default: 100
config.trainer.visual['confidence_pcnt_hi'] = 0.75 # Upper percentile for confidence intervals; default: 0.5
config.trainer.visual['confidence_pcnt_lo'] = 0.25 # Lower percentile for confidence intervals; default: 0.5
config.trainer.visual['epoch_refresh'] = 5 # Epoch fefresh frequency for visualizations; default: 10
config.trainer.visual['err_dev'] = 1.0 # How many standard errors to add to loss to assess best performance; default: 1.0
config.trainer.visual['lookback_window'] = 200 # Lookback window for determining y min/max in graphs; default: 200
config.trainer.visual['show_epochs'] = 100 # Maximum epochs displayed; default: 100
config.trainer.visual['time_slices'] = 10 # How many slice of spot action and delta to print; default: 10
config.trainer['output_level'] = all # What to print during training; default: all
config.world['black_scholes'] = True # Hard overwrite to use a black & scholes model with vol 'rvol' and drift 'drift'. Also turns off the option as a tradable instrument by setting strike = 0; default: False
config.world['corr_ms'] = 0.5 # Correlation between the asset and its mean; default: 0.5
config.world['corr_vi'] = 0.8 # Correlation between the implied vol and the asset volatility; default: 0.8
config.world['corr_vs'] = -0.7 # Correlation between the asset and its volatility; default: -0.7
config.world['cost_p'] = 0.0005 # Trading cost for the option on top of delta and vega cost; default: 0.0005
config.world['cost_s'] = 0.0002 # Trading cost spot; default: 0.0002
config.world['cost_v'] = 0.02 # Trading cost vega; default: 0.02
config.world['drift'] = 0.1 # Mean drift of the asset. This is the total drift; default: 0.1
config.world['drift_vol'] = 0.1 # Vol of the drift; default: 0.1
config.world['dt'] = 0.02 # Time per timestep; default: One week (1/50)
config.world['invar_steps'] = 5 # Number of steps ahead to sample from invariant distribution; default: 5
config.world['ivol'] = 0.2 # Initial implied volatility; default: Same as realized vol
config.world['lbnd_as'] = -5.0 # Lower bound for the number of shares traded at each time step; default: -5.0
config.world['lbnd_av'] = -5.0 # Lower bound for the number of options traded at each time step; default: -5.0
config.world['meanrev_drift'] = 1.0 # Mean reversion of the drift of the asset; default: 1.0
config.world['meanrev_ivol'] = 0.1 # Mean reversion for implied vol vol vs initial level; default: 0.1
config.world['meanrev_rvol'] = 2.0 # Mean reversion for realized vol vs implied vol; default: 2.0
config.world['no_stoch_drift'] = False # If true, turns off the stochastic drift of the asset, by setting meanrev_drift = 0. and drift_vol = 0; default: False
config.world['no_stoch_vol'] = False # If true, turns off stochastic realized and implied vol, by setting meanrev_*vol = 0 and volvol_*vol = 0; default: False
config.world['payoff'] = atmcall # Payoff function with parameter spots[samples,steps+1]. Can be a function which must return a vector [samples]. Can also be short 'atmcall' or short 'atmput', or a fixed numnber. The default is 'atmcall' which is a short call with strike 1: '- np.maximum( spots[:,-1] - 1, 0. )'. A short forward starting ATM call is given as '- np.maximum( spots[:,-1] - spots[:,0], 0. )'; default: atmcall
config.world['rcorr_vs'] = -0.5 # Residual correlation between the asset and its implied volatility; default: -0.5
config.world['rvol'] = 0.2 # Initial realized volatility; default: 0.2
config.world['samples'] = 10000 # Number of samples; default: 1000
config.world['seed'] = 2312414312 # Random seed; default: 2312414312
config.world['steps'] = 20 # Number of time steps; default: 10
config.world['strike'] = 1.0 # Relative strike. Set to zero to turn off option; default: 1.0
config.world['ttm_steps'] = 4 # Time to maturity of the option; in steps; default: 4
config.world['ubnd_as'] = 5.0 # Upper bound for the number of shares traded at each time step; default: 5.0
config.world['ubnd_av'] = 5.0 # Upper bound for the number of options traded at each time step; default: 5.0
config.world['volvol_ivol'] = 0.5 # Vol of Vol for implied vol; default: 0.5
config.world['volvol_rvol'] = 0.5 # Vol of Vol for realized vol; default: 0.5
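The report above is produced by the cdxbasics Config pattern used throughout the code base: each component reads its parameters with a default and a help string, and config.usage_report() / config.done() summarize and validate what was actually used. Below is a minimal sketch of that pattern; the four-argument read signature is an assumption based on how the code base uses Config, so consult cdxbasics for the actual API:

    from cdxbasics.config import Config

    config = Config()
    config.world.samples = 10000   # the user sets values via attribute access

    # A component reads each parameter with a default, a cast type, and help text
    # (assumed signature config(name, default, cast, help); see cdxbasics).
    samples = config.world("samples", 1000, int, "Number of samples")
    steps   = config.world("steps",     10, int, "Number of time steps")

    print( config.usage_report() )  # lists values, defaults, and help, as above
    config.done()                   # flags any user-provided values that were never read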

Misc Code Overview

Core training:

World generator

Networks

Tools