tarb / betfair_data

Fast Python Betfair historical data file parser
https://betfair-datascientists.github.io/tutorials/jsonToCsvRevisited/
MIT License

Parser Filter #8

liampauling commented 2 years ago

I am not sure if this will have any benefit in terms of speed for your implementation; however, in flumine we have some code that ignores / doesn't output data that doesn't meet user requirements, for example inplay and seconds_to_start.

This gets passed down to bflw and gives a huge speed improvement in the Python code, so I'm wondering if the same would apply to betfair_data? Here is the code I have added, which does the same thing but after the data has been processed by the Rust code:

    def _read_loop(self) -> list:
        # listener_kwargs filtering
        in_play = self.listener_kwargs.get("inplay")
        seconds_to_start = self.listener_kwargs.get("seconds_to_start")
        cumulative_runner_tv = self.listener_kwargs.get("cumulative_runner_tv", False)
        if in_play is None and seconds_to_start is None:
            process_all = True
        else:
            process_all = False
        # process files
        files = betfair_data.bflw.Files(
            [self.market_filter],
            cumulative_runner_tv=cumulative_runner_tv,
            streaming_unique_id=self.stream_id,
        )
        for file in files:
            for update in file:
                if process_all:
                    yield update
                else:
                    for market_book in update:
                        if market_book.status == "OPEN":
                            # inplay=True: skip books that are not yet in play
                            if in_play:
                                if not market_book.inplay:
                                    continue
                            # seconds_to_start: skip books published too long
                            # before the scheduled market start time
                            elif seconds_to_start:
                                _seconds_to_start = (
                                    market_book.market_definition.market_time
                                    - market_book.publish_time
                                ).total_seconds()
                                if _seconds_to_start > seconds_to_start:
                                    continue
                            # inplay=False: skip books that are already in play
                            if in_play is False:
                                if market_book.inplay:
                                    continue
                        yield [market_book]
tarb commented 2 years ago

The parser creates values on the fly, so the best (and I think only) way to currently speed it up is to short-circuit the `for update in file:` loop; to put it another way, to stop parsing the file when you've got what you need.
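
For example, with the same Files iteration as in the snippet above, the short circuit might look roughly like this (a sketch only; `file_paths` and the in-play cut-off are placeholders for whatever you actually need):

    import betfair_data

    # stop parsing each file as soon as the market goes in play, so the
    # remaining updates in that file are never parsed at all
    files = betfair_data.bflw.Files(file_paths)  # file_paths: list of archive paths, as above
    for file in files:
        finished = False
        for update in file:
            for market_book in update:
                if market_book.inplay:
                    finished = True
                    break
                # ... use the pre-play market_book here ...
            if finished:
                break  # short circuit: skip the rest of this file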

The one thing that I could potentially add with this info, though, is to convert to a mutable structure when I know you don't want the parsed values, then back to immutable when you do. We know from the bfd implementation that this could provide roughly a 30% speed-up for those updates (not having to allocate new memory for every changed Runner/ladder/etc). It would be a significant bit of work and an avenue for error, however, and a 30% speed-up for a few updates might not be worth the effort.
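
To sketch that idea in Python terms (the real change would live inside the Rust parser, so the names here are purely illustrative, not the library's API): keep a mutable working state that is updated in place for every message, and only allocate an immutable snapshot when the caller actually asks for the value.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class RunnerSnapshot:
        # immutable view handed out to the caller
        last_price_traded: float
        total_matched: float

    class RunnerState:
        # mutable working state: updated in place, so no new allocation per message
        def __init__(self) -> None:
            self.last_price_traded = 0.0
            self.total_matched = 0.0

        def apply(self, change: dict) -> None:
            self.last_price_traded = change.get("ltp", self.last_price_traded)
            self.total_matched = change.get("tv", self.total_matched)

        def snapshot(self) -> RunnerSnapshot:
            # only pay for an immutable copy when the parsed value is wanted
            return RunnerSnapshot(self.last_price_traded, self.total_matched)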

liampauling commented 2 years ago

I thought you would say that. How about the option to choose mutability for bflw, with a copy then made only if required, for example:

    if filter:
        yield [market_book.copy()]
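
In the context of _read_loop above, that might look roughly like this (a sketch only; passes_filter is a hypothetical stand-in for the inplay / seconds_to_start checks, and .copy() assumes the proposed mutable bflw objects would expose such a method):

    for market_book in update:
        # hypothetical helper: the inplay / seconds_to_start checks from _read_loop
        if not passes_filter(market_book):
            continue
        # pay for an immutable copy only for the books that survive the filter
        yield [market_book.copy()]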