It has become a challenge to represent the mentioned features, as all of the underlying values are derived from a continuous time series and will therefore most likely not appear in the same constellation more than once within a relatively small training set. Hence, the capabilities of value iteration are exceeded. As a result, we have to either find an appropriate approximation of the feature values, or represent states with a simple function of those features (function approximation) [1, 2]:
- Volume (a computation sketch for this and the Fluctuation feature follows after this list):
  - Input: `sum(bid.vol + ask.vol)` for the previous n states
  - Approximation: `n * sum(bid.vol + ask.vol)` over the entire training set
  - Output: `[0.1, 0.2, ..., 1.0]`
- Fluctuation:
  - Input: `bestAsk` for the previous n states
  - Approximation: `bestAsk_s - bestAsk_s-1`
  - Output: `[-1, 0, 1]`
- Bids/Asks:
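Under one plausible reading of the Volume and Fluctuation specifications above, the two features could be derived roughly as follows. This is a minimal sketch: the function names, the state layout (`bid_vol`, `ask_vol`, `best_ask`), and the use of the training-set volume as the normalizer are assumptions, not code from the repository.

```python
import numpy as np

def volume_feature(states, training_set_volume, n, buckets=10):
    # Raw input: sum of bid/ask volume over the previous n states.
    vol = sum(s["bid_vol"] + s["ask_vol"] for s in states[-n:])
    # Assumed approximation: normalize by n * total volume of the training set
    # and discretize into the buckets 0.1, 0.2, ..., 1.0.
    normalized = min(vol / (n * training_set_volume), 1.0)
    return float(np.ceil(max(normalized, 1e-9) * buckets) / buckets)

def fluctuation_feature(states):
    # Assumed reading of "bestAsk_s - bestAsk_s-1" with output [-1, 0, 1]:
    # the sign of the best-ask change between the two most recent states.
    return int(np.sign(states[-1]["best_ask"] - states[-2]["best_ask"]))
```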
[1] https://danieltakeshi.github.io/2016/10/31/going-deeper-into-reinforcement-learning-understanding-q-learning-and-linear-function-approximation/
[2] http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf
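For the function-approximation route that [1] and [2] describe, a minimal sketch of Q-learning with a linear approximator over such a feature vector could look as follows; the class name, dimensions, and hyper-parameters are placeholders and not part of the repository.

```python
import numpy as np

class LinearQ:
    """Semi-gradient Q-learning with a linear approximation Q(s, a) = w_a . phi(s)."""

    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.99):
        self.w = np.zeros((n_actions, n_features))  # one weight vector per action
        self.alpha = alpha
        self.gamma = gamma

    def q(self, phi, action):
        return self.w[action] @ phi

    def update(self, phi, action, reward, phi_next, done):
        # TD target uses the greedy value of the next state unless the episode ended.
        target = reward if done else reward + self.gamma * max(
            self.q(phi_next, a) for a in range(len(self.w)))
        td_error = target - self.q(phi, action)
        self.w[action] += self.alpha * td_error * phi  # gradient step on the weights
```

Here `phi(s)` would simply stack the discretized feature values, e.g. `phi = np.array([volume_feature(...), fluctuation_feature(...)])`, and could later be extended with the Bids/Asks feature.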
The features are described here: https://github.com/backender/ctc-executioner/commit/d3440b0bb81183b610811bc81e4d4e27d7ce771b
As a first step towards extending the features used during the learning process, incorporate:
In a subsequent step, and in combination with #5, the aim is to train on: