tarb / betfair_data

Fast Python Betfair historical data file parser
https://betfair-datascientists.github.io/tutorials/jsonToCsvRevisited/
MIT License

Only a few market objects per event ID are returned given price data files (tar or bz2) #12

Open ozonosphere opened 1 year ago

ozonosphere commented 1 year ago

When using files = bfd.Files(paths), where paths is a list of .tar file paths, only a few market objects are returned per event ID and most are missing.

I also tried:

import betfair_data as bfd

with open(path, "rb") as file:
    ff = bfd.File(path, file.read())

for market in ff:
    ...  # process each market update

The above returns the same thing. I think it has something to do with the bz2 price files provided by Betfair: they contain '\n' separators and are not widely recognized as valid JSON. Could that be causing the reader to miss most lines of data?
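For context, newline-separated JSON is usually read one line at a time, with each line parsed as its own document. The following is a minimal standard-library sketch of that approach, not betfair_data's implementation; the function name iter_stream_messages and the two-line sample payload are hypothetical and stand in for a real price file.

```python
import bz2
import json
from typing import Iterator


def iter_stream_messages(compressed: bytes) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a bz2 payload."""
    text = bz2.decompress(compressed).decode("utf-8")
    for line in text.splitlines():
        if line.strip():
            # Each line is a complete JSON document on its own.
            yield json.loads(line)


# Synthetic two-line payload (an assumption, not real Betfair data).
sample = bz2.compress(
    b'{"op":"mcm","pt":1,"mc":[]}\n'
    b'{"op":"mcm","pt":2,"mc":[]}\n'
)
messages = list(iter_stream_messages(sample))
print(len(messages))
```

If a file read this way yields hundreds of messages, the newline separators themselves are unlikely to be the reason updates go missing.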

tarb commented 1 year ago

Heya, if you run with logging on, do you get any output? It could be discarding malformed files, but I'm happy to investigate it.

import logging

logging.basicConfig(level=logging.WARN, format='%(levelname)s %(name)s %(message)s')

ozonosphere commented 1 year ago

Hi, thanks for getting back.

The logging message is:

WARNING betfair_data file: xxx\xxx\xxxx\xxxx\xxx.tar\xxx\xxxx.bz2 err: (JSON Parse Error) unknown field "batb", expected one of "id", "atb", "spn", "spf", "spb", "spl", "trd", "tv", "ltp", "hc" at line 1 columns 7

There are a lot of them.

The data is about 10,360 bz2 files packed in a tar archive. I am expecting a complete set of market or market_book objects per bz2 file (event ID), but the functions seem to return only a few.

It could be something to do with the generator and the file opening/closing process. I tried a similar structure using betfairlightweight and it returned the same result, but when I removed the generator that iterates over files, betfairlightweight did give me a complete set of market_book objects (often hundreds per bz2 file).

tarb commented 1 year ago

What happens if you try my bflw compat mode? Does that return the same data as betfairlightweight?

from betfair_data import bflw

for file in bflw.Files(paths):
    for market_books in file:
        for market_book in market_books:
            ...  # process each market_book

Alternatively, if you can upload a file that doesn't work/parse, I'd be happy to take a deeper look into it.

ozonosphere commented 1 year ago

Just tried it, and there are still only two market_books.

test_data.zip

This is sample data. Specifically, for file 1.191376781.bz2 there should be hundreds of market or market_book objects returned, and most of them would have last_price_traded. At the moment only two market_book objects are returned.

tarb commented 1 year ago

Just had a look and both those files use batb/batl instead of atb/atl. Currently this only supports atb/atl, as that's the format the official sold Betfair data comes in, and the format most people record in (it has the full ladder).
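One way to check which ladder format a given file uses is to count the ladder keys that appear in its runner changes. This is a rough standard-library sketch, assuming the usual stream layout in which each line carries market changes under "mc" and runner changes under "rc"; the helper name count_ladder_fields and the sample payload are hypothetical.

```python
import bz2
import json
from collections import Counter

# Ladder keys discussed in this thread.
LADDER_KEYS = ("atb", "atl", "batb", "batl")


def count_ladder_fields(compressed: bytes) -> Counter:
    """Count how many runner changes contain each ladder key."""
    counts = Counter()
    for line in bz2.decompress(compressed).decode("utf-8").splitlines():
        if not line.strip():
            continue
        message = json.loads(line)
        # Assumed layout: message["mc"] -> market changes -> ["rc"] runner changes.
        for market_change in message.get("mc", []):
            for runner_change in market_change.get("rc", []):
                counts.update(k for k in LADDER_KEYS if k in runner_change)
    return counts


# Synthetic payload using the batb/batl format (an assumption, not real data).
sample = bz2.compress(
    b'{"op":"mcm","mc":[{"id":"1.1","rc":[{"id":1,"batb":[[0,2.5,10.0]],"ltp":2.5}]}]}\n'
    b'{"op":"mcm","mc":[{"id":"1.1","rc":[{"id":1,"batl":[[0,2.6,5.0]]}]}]}\n'
)
counts = count_ladder_fields(sample)
print(dict(counts))
```

A file whose counts show only batb/batl would be expected to trigger the "unknown field" warning reported above.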

I'm not against adding support for the other ladder format though, if there's demand. It shouldn't be that hard to add, and I think some people might need it for recording/replaying virtual prices.

ozonosphere commented 1 year ago

I don't know; these are data files purchased and downloaded directly from the Betfair historic data section. Also, I am only interested in last_price_traded for the runners. The problem is that the functions cannot get the complete set of market_book objects when the event is in-play. Is this really related to batb/batl vs atb/atl?
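When only the last traded price matters, the raw stream lines can also be scanned directly for "ltp" values, independent of the ladder format. The sketch below uses only the standard library and is not part of betfair_data or betfairlightweight; it assumes the stream layout with "pt", "mc", "rc" and "ltp" keys, and the name iter_ltp_updates and the sample payload are hypothetical.

```python
import bz2
import json
from typing import Iterator, Tuple


def iter_ltp_updates(compressed: bytes) -> Iterator[Tuple[int, str, int, float]]:
    """Yield (publish_time, market_id, selection_id, ltp) for each ltp change."""
    for line in bz2.decompress(compressed).decode("utf-8").splitlines():
        if not line.strip():
            continue
        message = json.loads(line)
        publish_time = message.get("pt")
        for market_change in message.get("mc", []):
            market_id = market_change.get("id")
            for runner_change in market_change.get("rc", []):
                # Runner changes without a trade carry no "ltp" key and are skipped.
                if "ltp" in runner_change:
                    yield (publish_time, market_id,
                           runner_change["id"], runner_change["ltp"])


# Synthetic payload (an assumption, not real Betfair data).
sample = bz2.compress(
    b'{"op":"mcm","pt":1,"mc":[{"id":"1.1","rc":[{"id":11,"ltp":2.5}]}]}\n'
    b'{"op":"mcm","pt":2,"mc":[{"id":"1.1","rc":[{"id":11,"batb":[[0,2.4,10.0]]}]}]}\n'
    b'{"op":"mcm","pt":3,"mc":[{"id":"1.1","rc":[{"id":11,"ltp":2.6},{"id":22,"ltp":4.0}]}]}\n'
)
updates = list(iter_ltp_updates(sample))
for update in updates:
    print(update)
```

This ignores market definitions and handicaps, so it is a diagnostic aid rather than a replacement for a full market_book.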

tarb commented 1 year ago

Yeah, unfortunately that's the cause. I didn't know that Betfair has historic data files with batb in them now, so that's new! :) I reckon for now stick with betfairlightweight, and then if you still want to in the future, you can move across to my bflw compat mode with basically no code changes.

I'll make an issue to add batb/batl support and work on getting that added.

ozonosphere commented 1 year ago

OK, thanks very much for the help.