zcakhaa / DeepLOB-Deep-Convolutional-Neural-Networks-for-Limit-Order-Books

This Jupyter notebook demonstrates our recent work, "DeepLOB: Deep Convolutional Neural Networks for Limit Order Books", published in IEEE Transactions on Signal Processing. We use the FI-2010 dataset and show how the model architecture is constructed. FI-2010 is publicly available, and interested readers can check out the accompanying paper.

❗ Target leakage in your method #17

Open andrewsonin opened 2 years ago

andrewsonin commented 2 years ago

Problem

I was able to successfully transfer the model you proposed to the USD/RUB currency pair, which is traded on the Moscow Exchange, and to classify the behavior of the quotes over the next 10 minutes. However, I constantly failed to build a profitable trading strategy on top of your model using the smoothed target you suggested:
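(The original post embeds this as an image. For reference, the smoothed label defined in the paper, eqs. (2)-(4) there, is, up to the exact summation bounds:

$$l_t = \frac{m_+(t) - m_-(t)}{m_-(t)}, \qquad m_-(t) = \frac{1}{k}\sum_{i=0}^{k-1} p_{t-i}, \qquad m_+(t) = \frac{1}{k}\sum_{i=1}^{k} p_{t+i}\,.)$$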

It seemed strange to me, since the following two factors support the assumption that at least some kind of strategy can still be built:

  1. A high ROC-AUC score (80%) on a well-balanced binary classification task.
  2. Beautiful and meaningful coloring of the price plot: where there was an upward trend, the target was well above zero (green zones), and where there was a downward trend, it was well below zero (red zones).

Then I realized that the target you proposed contains a rather obvious information leak: the label at time t is driven largely by the difference between the current price p(t) and the trailing average m-(t), both of which are already known at time t. And this is what makes your target useless from the trading perspective, since at time t you cannot make deals at price m-(t).
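The formula images from the original post are not preserved, but the leak can be restated as follows, assuming the mid-price is close to a martingale (my reconstruction, not the author's exact formula):

$$\mathbb{E}\!\left[m_+(t)\mid\mathcal{F}_t\right] \approx p(t) \;\Longrightarrow\; \mathbb{E}\!\left[l_t\mid\mathcal{F}_t\right] \approx \frac{p(t) - m_-(t)}{m_-(t)},$$

i.e. the conditional mean of the label is a deterministic function of quantities already available at time t.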

To prove this, I trained an sklearn LogisticRegression using only ONE feature: the difference between the current price and the m- term in the proposed target. The ROC-AUC of such a trivial model was even higher, 81%.
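For concreteness, here is a self-contained sketch of that one-feature test (my illustration, not the original code): it builds the binarized smoothed label on a simulated random walk, where nothing should be predictable, and the single leaked feature still scores well above 0.5.

```python
# Sketch of the one-feature leak test: build the smoothed label on a toy
# random walk, then check how well the single feature p(t) - m_-(t),
# which is fully known at time t, predicts it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
p = 10_000.0 + np.cumsum(rng.standard_normal(100_000))  # toy mid-price path
k = 50  # smoothing horizon (hypothetical value)

# w[i] = mean(p[i : i + k]); every m_- and m_+ below is one of these windows
csum = np.concatenate(([0.0], np.cumsum(p)))
w = (csum[k:] - csum[:-k]) / k

t = np.arange(k - 1, len(p) - k)  # times where both m_-(t) and m_+(t) exist
m_minus = w[t - k + 1]            # mean of p(t-k+1) .. p(t)
m_plus = w[t + 1]                 # mean of p(t+1)   .. p(t+k)

label = ((m_plus - m_minus) / m_minus > 0).astype(int)  # binarized target
feature = ((p[t] - m_minus) / m_minus).reshape(-1, 1)   # known at time t

clf = LogisticRegression().fit(feature, label)
auc = roc_auc_score(label, clf.predict_proba(feature)[:, 1])
print(f"one-feature ROC-AUC (in-sample): {auc:.2f}")  # well above 0.5
```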

Then I refitted your model on the "noisy" alternative target, which is meaningful from the trading perspective (unlike the previous one):
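(The alternative target is also an image in the original post; judging by the later discussion of "experiments with p_t alone", it is presumably the paper's eq. (3) form, with the tradeable current price in place of m-:

$$l_t = \frac{m_+(t) - p_t}{p_t}\,.)$$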

ROC-AUC dropped to a value slightly greater than 50%, and this was not enough to beat even the bid-ask spread.

Conclusion

My experiment shows that the success you achieved with the above target is due not to the "smoothing" and reduced noise, as you claim, but only to bad target design. You simply made your target conditionally dependent on trivially extractable feature information. Moreover, it is absolutely useless from the trading perspective, since at time t we cannot make deals at price m-(t), only at something near p(t).

ericyue commented 2 years ago

I have concerns about this too. In my experiments, without using m- for labeling, the final accuracy drops a lot.

aHaidiChen commented 2 years ago

I also found this issue; the high accuracy is due to the label.

deeepwin commented 2 years ago

Could you point me to the place in the Jupyter notebook where you see the issue with the label? I do not see the issue in the code.

The Jupyter notebook uses FI-2010, where the labels have already been created according to equation 8 in https://arxiv.org/pdf/1705.03233.pdf.

Just wanna make sure to not make a mistake. Thanks.

stefan-jansen commented 2 years ago

See discussion in bottom half of left column on page 4.

deeepwin commented 2 years ago

I think there is no issue with using either equation 3 or 4 as the target. The accuracy drops to 50% with the alternative target because it is noisier; the smoothed target is less noisy, hence classification works better. You can calculate m- in a real trading system by keeping the recent values in a cache. That should work.
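A minimal sketch of that cache idea, assuming ticks arrive one at a time (the RollingMean class and its interface are my illustration, not anything from the repo):

```python
# Online m_-(t): a fixed-size window over the last k mid-prices.
from collections import deque

class RollingMean:
    def __init__(self, k: int):
        self.window = deque(maxlen=k)
        self.total = 0.0

    def update(self, price: float) -> float:
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # drop the oldest price from the sum
        self.window.append(price)         # deque evicts the oldest element
        self.total += price
        return self.total / len(self.window)  # current m_-(t)

m_minus = RollingMean(k=50)
# on each tick: m = m_minus.update(mid_price)
```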

Would you explain in more detail why this target engineering makes it useless for a trading system, and where you see the information leakage (sorry, I don't quite understand the formulas)? Thanks.

stefan-jansen commented 2 years ago

@deeepwin I don't see the leakage argument either: p_t enters m- and not m+, according to (2), and m- is the average of p_t and the k-1 earlier prices. Moreover, I'm not sure why, as the difference between p_t and that average (which includes p_t itself) grows large, future prices should be higher with certainty.

However, the smoothed target that uses m- is hard to trade on, since you can't buy at the prices of the last k-1 ticks when you get the buy signal at t, unless I'm missing something. Unfortunately, as the authors state, the experiments with p_t alone (which is the mid-price, not even the more adverse ask) appear to take us back to a coin flip.

deeepwin commented 2 years ago

@stefan-jansen thanks for the feedback.

> the smoothed target that uses m- is hard to trade on, since you can't buy at the prices of the last k-1 ticks when you get the buy signal at t, unless I'm missing something.

Yes, I see it like that too; I guess that is what the others meant. But in my opinion that does not matter: m- represents a state of the system, and if that state triggers a buy signal through the model, the model expects an m+ state in the future, regardless of the current p(t) value. That value will never match exactly anyway (slippage). If you validate the model on truly unseen (out-of-sample) data, one would assume, based on the paper's results, returns better than a coin flip. I guess that would need to be carefully tested.

LeonardoBerti00 commented 1 year ago

Hello everyone. I know a year and a half has passed, but if possible I would like to reopen the issue. Honestly, I don't see any information leak: I understand the formulas, but I don't understand what you have demonstrated with them. The authors use m+ and m- to get a smoother labeling method and to try to capture up and down trends. Nowhere do they expect to trade at past prices.

MickyDowns commented 4 months ago

While @andrewsonin may have overstated the leakage (h/t @stefan-jansen), his conclusion that the 2020 version of DeepLOB was worthless may be true from a LOB trader's perspective. Of course, DeepLOB was not designed to be a trader's tool. I suspect it was designed to show that layered networks are rendering hand-crafted features obsolete in even the most demanding environments, and to point the way for further research.

When I test DeepLOB using LOB data streamed from IBrokers, I find: 1. it can be a source of signal for algorithms with a 2- to 8-second horizon; 2. the results are in line with @LeonardoBerti00 et al.'s findings in Table 2 here. Namely: while performance drops on new data, its F1 is ~0.58 (as is its balanced accuracy, though that will depend on your choices of alpha and k), and DeepLOB is the best overall performer in the 2017-2023 cohort.