I wanted to take advantage of the freely available historical futures order book level 2 data from Binance. By combining this with historical trade data (also available from Binance, I believe), it should be possible to obtain normalized data for `hftbacktest`. But I couldn't find this in the repo examples, so I wanted to check whether it has already been done before I waste time redoing it.
There isn't. If you provide an example file so I can look into its format, I'll add an example converter.
By the way, without a local timestamp indicating when you received the feed, accurate backtesting is not possible, as there is no feed latency information. While you can artificially generate a local timestamp by assuming feed latency, it is preferable to collect the data yourself for more reliable results.
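To illustrate the workaround mentioned above, here is a minimal sketch of generating an artificial local timestamp under an assumed constant feed latency; the function name and the latency value are assumptions for illustration, not part of hftbacktest's actual converter.

```python
import numpy as np

# Sketch only: synthesize local (receipt) timestamps by shifting the exchange
# timestamps by an assumed constant feed latency. The 10 ms value is an
# arbitrary assumption; real latency varies and should be measured if possible.
ASSUMED_FEED_LATENCY_US = 10_000  # 10 ms, in microseconds

def make_local_timestamps(exch_ts_us: np.ndarray) -> np.ndarray:
    """Artificial local timestamps = exchange timestamp + assumed latency."""
    return exch_ts_us.astype(np.int64) + ASSUMED_FEED_LATENCY_US
```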
Here is example LOB data for a single day: https://drive.google.com/file/d/1rVaDblmYJL0aPpgvdJ-fU9QFhMDga6f_/view?usp=sharing
By the way, I'd be happy to write it myself too, but I wanted to make sure I wasn't reinventing the wheel.
Yes, good point about the local timestamp. Thanks for the tip.
The artificial local timestamps are fine for my purposes at the moment.
Trade data is also required. It's still possible to backtest based only on depth data, but it's meaningless, especially in high-frequency backtesting.
Right, I was not suggesting using OB data alone. Actually, I found your repo while looking for an implementation of inventory models, which of course need trade data to fit them.
The trade data is available from the Binance Public Data:
```
wget https://data.binance.vision/data/futures/um/daily/trades/BTCUSDT/BTCUSDT-trades-2020-07-01.zip
```
Here is the trade data corresponding to the above depth data.
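For reference, a quick way to peek at one of these files in pandas; the column names below follow the Binance Public Data layout for futures trades, and older files have no header row, but treat both as assumptions and check your file first.

```python
import pandas as pd

# Peek at a Binance futures historical trades file. pandas can read the
# zipped CSV directly. Older files have no header row, so names are supplied.
cols = ['id', 'price', 'qty', 'quote_qty', 'time', 'is_buyer_maker']
trades = pd.read_csv(
    'BTCUSDT-trades-2020-07-01.zip',
    names=cols,
    header=None,
)
print(trades.head())
print(len(trades), 'trades')
```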
I added the converter: `hftbacktest/data/utils/binancehistmktdata.py` (a5d3f91).
Could you check whether it works as expected? Again, in my experience, backtest results can exhibit significant discrepancies unless precise feed latency and order latency are used.
Excellent!
My plan was to look into the inventory MM model (for which you gave an example). I will report back if anything unexpected shows up.
I think you mean significant discrepancies between backtest and live trading results, but I am not doing any live trading at the moment. If you want me to try out one of your other examples with the Binance historical data, please let me know.
I am getting an error using the following trade data, for ETHUSDT on 2022-10-03, as in your example notebook.
I think it is because the first row contains the column names, unlike the previous example. My guess is that the format has changed with newer data.
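In case it helps others hitting the same error, here is one way to sniff whether a file carries a header row; this is a standalone sketch, not how the repo's converter handles it.

```python
import csv

# Sketch: peek at the first row of the (unzipped) CSV. In older headerless
# files the second field is a numeric price; in newer files it is a column
# name, so float() fails and we treat the row as a header.
def has_header(path: str) -> bool:
    with open(path, newline='') as f:
        first_row = next(csv.reader(f))
    try:
        float(first_row[1])
        return False
    except ValueError:
        return True
```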
Thanks for the report. Please see the latest commit. 740feee413795ea2a196077926e5def9e123229b
Thanks for updating the code.
Now I can successfully run the data preparation notebook.
However, when I use the prepared data from Binance in the Guéant–Lehalle–Fernandez-Tapia Market Making Model and Grid Trading notebook, the fitted trading intensity is off from your calculated results by a factor of about 2; it's as if there are only half as many trades in the data files obtained from Binance. To be safe, I added a 10 ms feed latency, but as expected that does not affect the fitted model parameters.
Note that I had to adjust for the fact that the Binance data is timestamped in milliseconds rather than microseconds.
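Concretely, the adjustment is just a unit rescale; a sketch, assuming integer millisecond timestamps as input:

```python
import numpy as np

# Binance historical files carry millisecond timestamps; the prepared data
# here expects microseconds, so scale by 1,000.
def ms_to_us(ts_ms: np.ndarray) -> np.ndarray:
    return ts_ms.astype(np.int64) * 1_000
```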
Would it be possible to share your collected data for ETHUSDT futures on 2022-10-03 (e.g. on Google Drive)? That way people could reproduce your results, and I could directly compare the trade data with Binance's.
For your information, I used the `trade` stream instead of the `aggTrade` stream, which is the one currently officially documented but is aggregated.
I'm not sure I understand, since I also used `trade` data from Binance, rather than `aggTrade`. In fact, your converter does not even work on the Binance historical `aggTrade` data, though I don't see a need for it. Unless you are suggesting that the `trade` data from Binance is in fact still aggregated?
Anyhow, my plan is to collect my own data from the stream, and then I can compare it with the historical data from Binance.
No. But the `trade` stream functions as expected, just as described in the official spot API documentation, even though it is not outlined in the official futures API documents. So I guess Binance's historical data also came from `aggTrade`. Comparison is the most effective way of figuring things out.
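A hedged sketch of such a comparison, assuming both files cover the same day and have been loaded with price/qty columns (the file names and column names are assumptions):

```python
import pandas as pd

# Compare a self-recorded `trade` stream against the Binance historical file.
# If the historical file is aggregated, it should show fewer rows while the
# total traded quantity stays roughly equal (modulo collection gaps).
recorded = pd.read_csv('recorded_trades.csv')             # assumed file/columns
historical = pd.read_csv('ETHUSDT-trades-2022-10-03.zip')

print('rows:', len(recorded), 'recorded vs', len(historical), 'historical')
print('qty :', recorded['qty'].sum(), 'recorded vs',
      historical['qty'].sum(), 'historical')
```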
Another issue showed up: I was working with more recent data, and it has an additional undocumented field, `trans_id`. This changes the offset of the other fields and breaks the converter.
Here is an example of the recent snapshot data: https://drive.google.com/file/d/1y-9nt9V-eB_OV3uSq4-dzBe-eOsQDt4S/view?usp=sharing
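One general way to make a parser robust to such extra fields is to select columns by name rather than by position; a sketch under assumed column names, not the repo's actual converter:

```python
import pandas as pd

# Select fields by name so an extra, unknown column such as `trans_id`
# cannot shift the offsets of the fields we actually use.
WANTED = ['symbol', 'timestamp', 'side', 'price', 'qty']  # assumed names

def load_snapshot(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    return df[[c for c in WANTED if c in df.columns]]
```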
See 2b3137c3c643e9a96621e8fb0c3cd46ab0922dde and let me know if it works as expected.
Code looks much better now without hard-coded indices, and it processes the snapshot fine.
But now it fails with an exception on the `convert` function call in the validation step.
Here is the lob data and trade data to reproduce this.
See 7299d9a3968c7acc079dfffc5aad50c947e86cf2. I fixed the mingled timestamp issue, but since the data has no local timestamp, there is no way to fix the ordering other than sorting. That can cause another discrepancy; beware of that.
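For clarity, the sorting in question could look like the sketch below: a stable sort on the exchange timestamp, so that events sharing a timestamp keep their original relative order (array names are illustrative).

```python
import numpy as np

# Stable sort by exchange timestamp: ties keep their original order, which
# matters when multiple events share the same millisecond timestamp.
def sort_by_exch_ts(exch_ts: np.ndarray, rows: np.ndarray) -> np.ndarray:
    order = np.argsort(exch_ts, kind='stable')
    return rows[order]
```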
Thanks! I tested it out and there were no more errors.
I'm not sure exactly what discrepancy you mean, but perhaps it will become clearer as I continue working on it.
What I meant by that is that any difference from the live trading environment can cause a discrepancy.
Could you provide the code of Guéant–Lehalle–Fernandez-Tapia Market Making Model? :)
You can find it on the tutorials page or in the examples directory.
thanks