tarb / betfair_data

Fast Python Betfair historical data file parser
https://betfair-datascientists.github.io/tutorials/jsonToCsvRevisited/
MIT License

Skip bad lines #13

Open mzaja opened 1 year ago

mzaja commented 1 year ago

I tried using this module on data collected manually with betfairlightweight. The problem is that my files contain market book data as the first line (which is necessary to obtain the selections' names). I thought the parser would just skip over the unrecognized line and perhaps warn about it, but instead it crashes and burns.

Would it be possible to update the parser so that it skips over unrecognized lines instead of tapping out? If the current behaviour is desired, how about adding a skip_bad_lines option to give end users a choice? I imagine I am not the only person in this situation.

tarb commented 1 year ago

Sorry about the slow reply, I've been a bit busy. I agree that this sounds like a good idea and something I should implement. In the meantime, though, you should be able to bypass this by reading the file in as bytes, searching for the first newline and slicing there, and then passing the remainder into bfd.File(path, byte_slice_minus_first_line). Happy to offer more help if you need it.
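For anyone landing here, the slicing step described above could look roughly like the sketch below. The helper name `drop_first_line` is made up for illustration, and the `bfd.File(path, ...)` call shape is taken from the comment above rather than checked against the library.

```python
def drop_first_line(data: bytes) -> bytes:
    """Return `data` without its first line (up to and including the first newline)."""
    newline = data.find(b"\n")
    if newline == -1:
        # Only one line in the file, so nothing is left once it is dropped.
        return b""
    return data[newline + 1:]


# Hypothetical usage, assuming the call shape quoted in the comment above:
#   with open(path, "rb") as f:
#       raw = f.read()
#   file = bfd.File(path, drop_first_line(raw))
```

Slicing the bytes directly avoids decoding or re-encoding the rest of the file, so the remaining lines reach the parser unchanged.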

mzaja commented 1 year ago

Hey, no problem :). Yes, that is exactly what I did. Since I capture both at* and bdat* data, I used the same adapter to filter out bdat* lines, which are not supported.
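The adapter itself is not shown in this thread. As a rough sketch of one way such filtering might be done, the function below removes any runner-change key starting with "bdat" from a single stream line. It assumes the usual stream layout of a top-level "mc" list whose entries carry an "rc" list of runner changes; the function name is hypothetical.

```python
import json


def strip_bdat(line: bytes) -> bytes:
    """Remove bdat* keys (e.g. bdatb, bdatl) from every runner change in one line."""
    msg = json.loads(line)
    for market in msg.get("mc", []):
        for runner in market.get("rc", []):
            # Collect the keys first so the dict is not mutated while iterating.
            for key in [k for k in runner if k.startswith("bdat")]:
                del runner[key]
    # Compact separators keep the output close to the stream's own format.
    return json.dumps(msg, separators=(",", ":")).encode()
```

Note that this re-serialises the line, so key order is preserved but whitespace and number formatting may differ from the captured original.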

I also noticed that the parser bombs out on the "initialClk", "conflateMs", "heartbeatMs" and "ct" fields, which are sent together with the initial image when a subscription is made (https://docs.developer.betfair.com/display/1smk3cen4v3lu3yomq5qye0ni/Exchange+Stream+API#ExchangeStreamAPI-Subscription/SubscriptionMessage). They may also be sent on re-subscription; I do not know. In any case, you may want to prevent the parser from erroring on those fields as well, since manually captured stream data will likely contain them on the first line (unless manually removed).
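Until the parser tolerates those fields, a pre-processing pass along these lines might serve as a stopgap. This is only a sketch: the function names are invented, and it assumes the four fields appear at the top level of each JSON line and that dropping them (including "ct") does not matter for the downstream use.

```python
import json

# Top-level fields named above as causing parse errors.
SUBSCRIPTION_FIELDS = ("initialClk", "conflateMs", "heartbeatMs", "ct")


def strip_subscription_fields(line: bytes) -> bytes:
    """Drop the subscription-only fields from one JSON line, if present."""
    msg = json.loads(line)
    for field in SUBSCRIPTION_FIELDS:
        msg.pop(field, None)
    return json.dumps(msg, separators=(",", ":")).encode()


def sanitise(data: bytes) -> bytes:
    """Apply strip_subscription_fields to every non-empty line of a captured file."""
    return b"".join(
        strip_subscription_fields(line) + b"\n"
        for line in data.splitlines()
        if line.strip()
    )
```

The result of `sanitise(raw)` could then be handed to the parser in place of the raw bytes, in the same way as the first-line workaround discussed earlier in this thread.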