# Alpha-gfn: Mining formulaic alpha factors with generative flow networks

In this repo, we build an application that leverages a deep reinforcement learning framework to mine formulaic alpha factors using generative flow networks (GFlowNets). Due to an industrial NDA, this repo serves demonstration purposes only; hence, it includes just a simple example trained on a small amount of data.
We give a brief introduction to the fundamental components of the project by answering the following questions.
In quantitative investment, formulaic alpha factors are mathematical expressions or formulas used to identify and potentially exploit patterns or signals in financial data (US stocks, in this demo). These factors are typically derived from historical market data and are used to generate investment signals for trading strategies.
In this project, we hope to develop an algorithm that generates assorted alpha factors that perform well in predicting stock trends. In the ideal case, they would provide inspiration to alpha factor researchers or even be used directly as part of trading strategies.
We define an alpha factor $f$ as a function mapping the feature vectors of all stocks on trading day $t$, denoted $X_t$, into alpha values $z_t = f(X_t)$. Examples of formulaic alpha factors include moving averages, momentum indicators, relative strength indexes, price-to-earnings ratios, and various other technical and fundamental indicators. However, in this demo we only consider technical indicators, and our search space consists of daily-frequency market data only, such as open and close prices (see details in Methodology).
A common measure of the effectiveness of an alpha factor is the (absolute) information coefficient (IC) between the stock trend it aims to predict, $y_t$, and the factor values $f(X_t)$, usually defined as the Spearman correlation of $y_t$ and $f(X_t)$.
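As a concrete illustration, here is a minimal sketch of the daily IC computation in pandas; it assumes `factor` and `fwd_ret` are date-by-stock tables, and the function name is ours rather than the repo's actual API:

```python
import pandas as pd

def daily_ic(factor: pd.DataFrame, fwd_ret: pd.DataFrame) -> pd.Series:
    """Cross-sectional Spearman IC for each trading day (row)."""
    return factor.corrwith(fwd_ret, axis=1, method="spearman")
```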
The following is a non-exhaustive list of important concepts in reinforcement learning, which are used in the project and will be mentioned in the rest of this introduction.
A GFlowNet is a stochastic policy or generative model trained such that it samples objects $x$ through a sequence of constructive steps (i.e., actions), with probability proportional to a reward function $R(x)$, where $R$ is a non-negative integrable function. After proper training, a GFlowNet is expected to sample a diverse set of solutions $x$ that have a high value of $R(x)$. [1]
The word ‘flow’ in GFlowNet actually refers to unnormalized probabilities of an action given a state. The proposed approach views the probability assigned to an action given a state as the flow associated with a network whose nodes are states, and outgoing edges from that node are deterministic transitions driven by an action. The total flow into the network is the sum of the rewards in the terminal states (i.e., a partition function) and can be shown to be the flow at the root node (or start state). The proposed algorithm is inspired by Bellman updates and converges when the incoming and outgoing flow into and out of each state match. [4]
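Formally, this is the flow-matching condition of [4]: for every non-terminal state $s$, the incoming and outgoing flows balance,

$$\sum_{s':\, s' \rightarrow s} F(s' \rightarrow s) = \sum_{s'':\, s \rightarrow s''} F(s \rightarrow s''),$$

and the flow at the root equals the partition function, $F(s_0) = \sum_x R(x) = Z$.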
A neural net can be used to sample each of these forward-going constructive actions, one at a time. An object $x$ is done being constructed when a special "exit" action or a deterministic criterion of the state (e.g., $x$ has exactly $n$ elements) is triggered; at that point we have reached a terminal state $x$, after which we can collect a reward $R(x)$.
PS: The fact that reward can only be gained after reaching the terminal state of a trajectory is the so-called episodic setting of RL.
GFlowNet is motivated by cases where the diversity of generated candidates is particularly important, especially when the oracle is itself uncertain. An "oracle" typically refers to an idealized agent or system that has perfect knowledge of the environment and can provide optimal actions or solutions. Traditional reinforcement learning algorithms try to generate the single highest-reward sequence of actions, while GFlowNet samples a distribution of trajectories whose probability is proportional to a given positive return or reward function. [4,5]
In the task of searching for formulaic alpha factors, not only is the predictive performance of the alpha factors important; we also value the exploration ability of the search algorithm, owing to the massive amount of stochasticity in stock markets. We may use IC as a metric of predictive performance, but blindly pursuing high absolute IC might cause overfitting and spurious correlation between the factors. Hence, GFlowNet is advantageous for our research problem, where we want the sampled alpha factors to be diverse.
To alleviate the computational burden, we use historical market data of stocks in the S&P 500 index from 2018 to 2019 for training in this demo. Features include the daily open, close, high, and low prices and trading volume. Data sources and pre-processing steps can be found in `notebooks/preprocess.ipynb`.
The methodology is inspired by the framework proposed by [6]. The implementation can be found in the `src/` folder. An example training session and some simple analyses can be found in `notebooks/train.ipynb`.
An action is defined as generating a token, which can be either an operator or a feature. For demonstration purposes, we only include a small set of unary and binary operators here. This is a list of all tokens used:
Category | Examples |
---|---|
Unary operators | ops_abs($\cdot$), ops_log($\cdot$), ops_roll_std($\cdot$) |
Binary operators | ops_add($\cdot$, $\cdot$), ops_subtract($\cdot$, $\cdot$), ops_multiply($\cdot$, $\cdot$), ops_divide($\cdot$, $\cdot$), ops_roll_corr($\cdot$, $\cdot$) |
Features | $open, $close, $high, $low, $volume |
Detailed definitions of the operators can be found in the `step` function of `src/alphaclass`.
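As a rough illustration of what such operators can look like on date-by-stock pandas tables (a sketch, not the repo's exact code; the 10-day window is an assumed default):

```python
import pandas as pd

def ops_roll_std(x: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    """Rolling standard deviation over the past `window` days, per stock."""
    return x.rolling(window, min_periods=window).std()

def ops_roll_corr(x: pd.DataFrame, y: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    """Rolling correlation between two feature tables, column by column."""
    return x.rolling(window, min_periods=window).corr(y)
```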
A state is defined as a sequence of tokens generated by the actions performed by the model. Similar to [6], we adopt reverse Polish notation to convert the mathematical expression of an alpha factor into a linear token sequence. For example, the expression `ops_roll_std(ops_add(ops_abs($high), $volume))` is represented by the list `['BEG', '$high', 'ops_abs', '$volume', 'ops_add', 'ops_roll_std', 'SEP']`. This is implemented with an operand stack data structure.
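To make the stack mechanics concrete, here is a minimal sketch of evaluating such a token sequence; the operator arities mirror the table above, while the `features` and `operators` mappings are illustrative:

```python
UNARY = {"ops_abs", "ops_log", "ops_roll_std"}
BINARY = {"ops_add", "ops_subtract", "ops_multiply", "ops_divide", "ops_roll_corr"}

def evaluate_rpn(tokens, features, operators):
    """Evaluate a ['BEG', ..., 'SEP'] token list with an operand stack."""
    stack = []
    for tok in tokens:
        if tok in ("BEG", "SEP"):        # sequence delimiters carry no value
            continue
        if tok.startswith("$"):          # feature token, e.g. '$close'
            stack.append(features[tok])
        elif tok in UNARY:
            stack.append(operators[tok](stack.pop()))
        elif tok in BINARY:
            b, a = stack.pop(), stack.pop()
            stack.append(operators[tok](a, b))
        else:
            raise ValueError(f"unknown token: {tok}")
    assert len(stack) == 1, "a valid sequence leaves exactly one operand"
    return stack[0]
```

Running it on the example above pushes `$high`, applies `ops_abs`, pushes `$volume`, combines the two with `ops_add`, and finally applies `ops_roll_std`, leaving exactly one operand on the stack.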
The reward is defined as the squared information coefficient (IC) between the next-day forward return and the alpha values, penalized by the proportion of missingness in the alpha factor table and averaged across all days in the time period (i.e., 2018-2019). Specifically, given the current-day alpha value, the next-day forward return $y_t$ is defined as the percentage gain or loss in stock price between the next trading day and the trading day after it.
To reiterate, the IC between the stock trend it aims to predict, $y_t$, and the factor values $f(X_t)$ is defined as the Spearman correlation of $y_t$ and $f(X_t)$. Therefore, the reward for a generated alpha factor is:
$$R(X) = \frac{1}{T}\sum_{t=1}^{T} \mathrm{Corr}_{\mathrm{spearman}}(f(X_t), y_t)^2 \times (1-\text{NaN}\%)$$
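A direct translation of this reward into pandas might look like the following sketch, where `alpha` and `fwd_ret` are assumed to be date-by-stock tables:

```python
import pandas as pd

def reward(alpha: pd.DataFrame, fwd_ret: pd.DataFrame) -> float:
    """Mean squared daily Spearman IC, penalized by missingness."""
    ic = alpha.corrwith(fwd_ret, axis=1, method="spearman")  # one IC per day
    nan_frac = float(alpha.isna().to_numpy().mean())         # overall NaN share
    return float((ic ** 2).mean() * (1.0 - nan_frac))
```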
Several training objectives have already been proposed for GFlowNets, all aiming at constructing a neural net that outputs the flows and/or the transition probabilities of the forward policy such that they define a proper flow matching the desired flow through the terminal states $x$, i.e., $F(x)=R(x)$ [1,4,5]. We adopt the trajectory balance loss, which seems to yield more efficient credit assignment and faster convergence compared to the others [7].
The trajectory-level constraint should be satisfied by any complete trajectory starting in $s_0$ and ending in a terminal state $s_n=x$:
$$F(s_0) \prod_{t=1}^n P_F(s_t|s_{t-1}) = R(s_n) \prod_{t=1}^n P_B(s_{t-1}|s_t).$$
The motivation for the corresponding loss (again, trying to match the logarithms of these products within a squared-error form) is that it provides an immediate gradient to all the states visited in a complete trajectory (whether it was sampled forward from $s_0$ or backward from an $x$). The trajectory balance loss is hence formally defined as:

$$\mathcal{L}_{TB}(\tau) = \left(\log \frac{F(s_0)\prod_{t=1}^{n} P_F(s_t|s_{t-1})}{R(x)\prod_{t=1}^{n} P_B(s_{t-1}|s_t)}\right)^2.$$
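In PyTorch, the loss for a single sampled trajectory can be written compactly; this is a sketch, with `log_Z`, `log_pf`, `log_pb`, and `log_reward` assumed to be computed elsewhere:

```python
import torch

def tb_loss(log_Z: torch.Tensor, log_pf: torch.Tensor,
            log_pb: torch.Tensor, log_reward: torch.Tensor) -> torch.Tensor:
    """Squared log-ratio form of the trajectory balance constraint.

    log_pf / log_pb are the sums of log P_F / log P_B along one trajectory;
    log_Z (= log F(s_0)) is typically a learned scalar parameter.
    """
    return (log_Z + log_pf - log_reward - log_pb) ** 2
```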
Both forward and backward policies share a base two-layer LSTM feature extractor with positional encoding that converts token sequences into dense vector representations. See `src/models.py` for implementation details.
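The rough shape of such a model might be as follows (a sketch, not the actual `src/models.py`; the learned positional embedding and the layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class PolicyLSTM(nn.Module):
    """Shared token encoder with separate forward/backward policy heads."""

    def __init__(self, vocab_size: int, max_len: int = 20,
                 emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(max_len, emb_dim)     # learned positional encoding
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2, batch_first=True)
        self.pf_head = nn.Linear(hidden_dim, vocab_size)  # forward-policy logits
        self.pb_head = nn.Linear(hidden_dim, vocab_size)  # backward-policy logits

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, seq_len) integer ids of the partial token sequence
        pos = torch.arange(tokens.size(1), device=tokens.device)
        h, _ = self.lstm(self.tok_emb(tokens) + self.pos_emb(pos))
        last = h[:, -1]                                   # summary of the current state
        return self.pf_head(last), self.pb_head(last)
```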