theOehrly / Fast-F1

FastF1 is a Python package for accessing and analyzing Formula 1 results, schedules, timing data and telemetry
https://docs.fastf1.dev
MIT License
2.39k stars, 253 forks

[ENH] Add complete support for the Ergast API #277

Closed (theOehrly closed 1 year ago)

theOehrly commented 1 year ago

Proposed new feature or change:

FastF1 should implement complete support for the Ergast API (https://ergast.com/mrd/).

Requirements:

Intentions:

theOehrly commented 1 year ago

@oscr it's probably best if we continue the previous discussion from #243 here, so it is easier to find. Any discussion to directly compare different ideas should take place here.

Also, I said that I wanted to finish my proposal draft by yesterday, but I didn't manage to get that done. Specifically, responses like the ones returned when requesting qualifying results for a whole season didn't really work with my idea. The problem is that the response contains too many dimensions to be directly representable within a DataFrame. It also doesn't work well with the simpler parsing/representation that you proposed. We'd end up with so many layers of nested dictionaries and lists within the DataFrame that we might as well leave out the DataFrame entirely. I need to think about this some more.
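A minimal sketch of the nesting problem described above. The sample is hand-written and heavily abridged, with placeholder values rather than real API output; the field names (`QualifyingResults`, `Driver`, `driverId`, `Q1`) are assumed to follow the Ergast JSON layout.

```python
import pandas as pd

# Hand-written, abridged sample with placeholder values (NOT real API output).
# Assumed shape: one entry per race, each holding a list of per-driver results.
races = [
    {
        "season": "2022", "round": "1", "raceName": "Race A",
        "QualifyingResults": [
            {"position": "1", "Driver": {"driverId": "driver_x"}, "Q1": "1:31.0"},
            {"position": "2", "Driver": {"driverId": "driver_y"}, "Q1": "1:31.5"},
        ],
    },
    {
        "season": "2022", "round": "2", "raceName": "Race B",
        "QualifyingResults": [
            {"position": "1", "Driver": {"driverId": "driver_y"}, "Q1": "1:28.9"},
        ],
    },
]

# Naive conversion: one row per race. The per-driver results do not become
# columns; they stay as a raw list of dicts inside a single cell.
naive = pd.DataFrame(races)
nested_cell = naive.loc[0, "QualifyingResults"]

print(naive.columns.tolist())
print(type(nested_cell))               # a plain Python list
print(type(nested_cell[0]["Driver"]))  # and a dict one level further down
```

With the full response there are more nested levels than in this sample, which is why a single DataFrame adds little here.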

theOehrly commented 1 year ago

I've come up with a new idea for how to potentially handle more complicated API responses like the qualifying results for a full season. @oscr what's your opinion on this? You've already had some good ideas, and you've spent some time looking at this as well.


Implementation Idea

Two types of result objects:

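Something like:

import pandas as pd


class ErgastResult:

    @property
    def filters(self):
        # further filter entries are elided here
        return {'season': 2022}

    @property
    def overview(self):
        # placeholder: content of the DataFrame is elided
        return pd.DataFrame(...)

    @property
    def results(self):
        # placeholder: a list of DataFrames, contents elided
        return [pd.DataFrame(...), ...]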

Less complex responses like the driver standings would then have no description and only a single DataFrame in the results property.
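The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the eventual FastF1 implementation: `ResultSketch` and `build_result` are hypothetical names, and the sample data is hand-written with placeholder values.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class ResultSketch:
    """Hypothetical container: request filters, an overview, and one frame per entry."""
    filters: dict
    overview: pd.DataFrame
    results: list = field(default_factory=list)


def build_result(races, filters, nested_key="QualifyingResults"):
    # Overview: one row per race, without the nested per-driver data.
    overview = pd.DataFrame(
        [{k: v for k, v in race.items() if k != nested_key} for race in races]
    )
    # Results: one flat DataFrame per race; nested dicts become dotted columns.
    results = [pd.json_normalize(race[nested_key]) for race in races]
    return ResultSketch(filters=filters, overview=overview, results=results)


# Placeholder sample, NOT real API output.
races = [
    {"season": "2022", "round": "1", "raceName": "Race A",
     "QualifyingResults": [
         {"position": "1", "Driver": {"driverId": "driver_x"}, "Q1": "1:31.0"},
         {"position": "2", "Driver": {"driverId": "driver_y"}, "Q1": "1:31.5"},
     ]},
    {"season": "2022", "round": "2", "raceName": "Race B",
     "QualifyingResults": [
         {"position": "1", "Driver": {"driverId": "driver_y"}, "Q1": "1:28.9"},
     ]},
]

result = build_result(races, filters={"season": 2022})
print(result.overview)
print(result.results[0])
```

Row `i` of `overview` describes the race whose results are in `results[i]`, so the two stay linked by position.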

An alternative would be to split those complex responses up into single responses, as if one had called the API for each result individually. Maybe this would be more intuitive for a user? Then, there would only be a single flattened result DataFrame and maybe a Series with further information.
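A rough sketch of this alternative, again with hypothetical names (`split_responses`) and hand-written placeholder data: each race is returned as if it had been requested on its own, as a Series with the race information plus one flat DataFrame.

```python
import pandas as pd


def split_responses(races, nested_key="QualifyingResults"):
    """Yield (info, frame) pairs, one per race (hypothetical helper)."""
    for race in races:
        # Race-level information as a Series.
        info = pd.Series({k: v for k, v in race.items() if k != nested_key})
        # Per-driver results as a single flat DataFrame.
        frame = pd.json_normalize(race[nested_key])
        yield info, frame


# Placeholder sample, NOT real API output.
races = [
    {"season": "2022", "round": "1", "raceName": "Race A",
     "QualifyingResults": [
         {"position": "1", "Driver": {"driverId": "driver_x"}},
         {"position": "2", "Driver": {"driverId": "driver_y"}},
     ]},
    {"season": "2022", "round": "2", "raceName": "Race B",
     "QualifyingResults": [
         {"position": "1", "Driver": {"driverId": "driver_y"}},
     ]},
]

parts = list(split_responses(races))
for info, frame in parts:
    print(info["raceName"], len(frame))
```

The trade-off is that the user has to keep track of the pairs, since no shared overview ties them together.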

oscr commented 1 year ago

You've really done some excellent investigative work here. To me this makes sense and seems doable, and I think it would be useful, based on my level of pandas knowledge. How would this and the alternative look for more pandas-native usage? I've mostly used traditional loops etc. to do calculations, given my limited pandas experience. But what would "correct" usage look like?

theOehrly commented 1 year ago

I'm not sure that I fully understand your question. None of the options that I suggested above can be implemented fully "pandas-native". There either needs to be some kind of main response object, or one main DataFrame that links the individual responses together, but that would not really be "normal" pandas-like stuff. In my opinion, the data is just too multi-dimensional to fit into a DataFrame. Therefore, some compromises need to be made. I want to expose as much data as possible through pandas, because it is such a powerful tool. On the other hand, one must not overdo that, or else things get more complicated than really necessary.

oscr commented 1 year ago

Thank you for the clarification! I think this seems promising. I've been thinking about making a more refined "who can still win WDC" script and I can pretty much see how this implementation would work with it. So I think this will be a useful and exciting addition to FF1. Great work!

theOehrly commented 1 year ago

Implemented in #311