[ENH] Add complete support for the Ergast API

theOehrly commented 1 year ago

Proposed new feature or change:

FastF1 should implement complete support for the Ergast API (https://ergast.com/mrd/).

Requirements:

implement all API endpoints
utilize FastF1's caching system for all requests
expose the modular API interface through an intuitive object-oriented interface
handle response length/paging and offsets for getting further response pages
return the raw response (json-like python object)
support the representation of the data as pandas.DataFrame

Intentions:

replace the current set of limited functions to interface with Ergast with a more unified solution
full support as a new major feature
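
As a rough illustration of the paging requirement above: Ergast reports `limit`, `offset` and `total` (as strings) in the `MRData` part of each response, so the parameters for the next page can be derived from the previous response. The function name and the example values below are hypothetical and not part of FastF1; this is only a minimal sketch.

```python
from typing import Optional


def next_page_params(header: dict) -> Optional[dict]:
    """Return query parameters for the next page, or None on the last page.

    `header` is assumed to hold the 'limit', 'offset' and 'total' fields
    that Ergast reports as strings alongside the actual data.
    """
    limit = int(header['limit'])
    offset = int(header['offset'])
    total = int(header['total'])

    if limit <= 0:
        return None  # a non-positive limit can never advance the offset

    next_offset = offset + limit
    if next_offset >= total:
        return None  # all available results have been fetched already
    return {'limit': limit, 'offset': next_offset}


# hypothetical header values, not taken from a real response
header = {'limit': '30', 'offset': '0', 'total': '70'}
print(next_page_params(header))
```

A result object could keep the header internally and call something like this whenever the user asks for the next page.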

theOehrly commented 1 year ago

@oscr it's probably best if we continue the previous discussion from #243 here, so it is easier to find. Any discussion to directly compare different ideas should take place here.

Also, I said that I wanted to finish my proposal draft by yesterday, but I didn't manage to get that done. Specifically, responses like the ones returned when requesting qualifying results for a whole season didn't really work with my idea. The problem is that the response contains too many dimensions to be directly representable within a DataFrame. It also doesn't work well with the simpler parsing/representation that you proposed. We'd end up with so many layers of nested dictionaries and lists within the DataFrame that we could just as well leave out the DataFrame. I need to think about this some more.
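
To make the dimensionality problem more concrete, here is a small sketch with made-up data. The `races` list is a heavily shortened, hypothetical stand-in for a season-wide qualifying response (the driver IDs, race names and times are invented). Flattening everything into one table with `pandas.json_normalize` is possible, but the race-level information then gets repeated on every row.

```python
import pandas as pd

# Hypothetical, heavily shortened stand-in for a season-wide qualifying
# response: one entry per race, each with its own nested list of results.
races = [
    {'round': '1', 'raceName': 'Race A',
     'QualifyingResults': [
         {'position': '1', 'Driver': {'driverId': 'driver_x'}, 'Q1': '1:31.471'},
         {'position': '2', 'Driver': {'driverId': 'driver_y'}, 'Q1': '1:31.785'},
     ]},
    {'round': '2', 'raceName': 'Race B',
     'QualifyingResults': [
         {'position': '1', 'Driver': {'driverId': 'driver_z'}, 'Q1': '1:29.012'},
         {'position': '2', 'Driver': {'driverId': 'driver_x'}, 'Q1': '1:29.400'},
     ]},
]

# One row per qualifying result; the race-level fields are copied onto
# every row and the nested 'Driver' dict becomes dotted column names.
flat = pd.json_normalize(
    races,
    record_path='QualifyingResults',
    meta=['round', 'raceName'],
)
print(flat)
```

With real responses, each race also carries nested circuit and location data, so the fully flat table gets wide and repetitive quickly.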

theOehrly commented 1 year ago

I've come up with a new idea for how to potentially handle more complicated API responses like the qualifying results for a full season. @oscr what's your opinion on this? You've already had some good ideas and spent some time looking at this as well.


Implementation Idea

Two types of result objects:

ErgastRawResult
  o The Response Header is abstracted away, as the user doesn’t need it. It is used internally to allow fetching more pages.
  o The Response Body is provided without modification as a JSON-like object.

ErgastResult
  o The Response Header is abstracted away, as the user doesn’t need it. It is used internally to allow fetching more pages.
  o Query filters are moved to a separate property of the object.
  o The Result Element Descriptions are used to build a DataFrame that provides an overview of the result data set. All data is kept but flattened. Data types are converted to int, float, datetime.
  o For each individual Query Result, a DataFrame is built from the Result Element Data. All data is kept but flattened. Data types are converted to int, float, datetime.
  o The individual rows of the overview DataFrame are linked to the individual results and vice versa. That way, one could further select specific result elements using the data inside the overview DataFrame. (In terms of implementation, this is the most difficult part.)

Something like

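from pandas import DataFrame


class ErgastResult:

    @property
    def filters(self) -> dict:
        # query filters that were applied to the request
        return {'season': 2022}  # ... plus any further filters

    @property
    def overview(self) -> DataFrame:
        # one row per result element description, flattened
        return DataFrame()  # placeholder, actual content elided

    @property
    def results(self) -> list:
        # one DataFrame per individual query result
        return [DataFrame()]  # placeholder, actual content elided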

Less complex responses like the driver standings would then have no description and only a single DataFrame in the results property.
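
A minimal sketch of how the overview/results split described above might be built, assuming the response body is a list of elements that each carry one nested result list. The helper `split_response` and the sample data are hypothetical; the link between the overview and the results is simply the shared position (row `i` of the overview belongs to `results[i]`).

```python
import pandas as pd


def split_response(elements, result_key):
    """Split nested result elements into an overview and per-element results."""
    descriptions = []
    results = []
    for element in elements:
        # everything except the nested result list describes the element
        descriptions.append(
            {key: value for key, value in element.items() if key != result_key}
        )
        # the nested result list becomes its own flattened DataFrame
        results.append(pd.json_normalize(element[result_key]))
    overview = pd.json_normalize(descriptions)
    return overview, results


# hypothetical, heavily shortened sample data
races = [
    {'round': '1', 'raceName': 'Race A',
     'QualifyingResults': [{'position': '1', 'Driver': {'driverId': 'driver_x'}}]},
    {'round': '2', 'raceName': 'Race B',
     'QualifyingResults': [{'position': '1', 'Driver': {'driverId': 'driver_z'}}]},
]

overview, results = split_response(races, 'QualifyingResults')
overview['round'] = overview['round'].astype(int)  # example data type conversion

# select result elements through the overview: row i maps to results[i]
wanted = overview[overview['raceName'] == 'Race B'].index
selected = [results[i] for i in wanted]
print(overview)
print(selected[0])
```

The positional link only holds as long as the overview keeps its default integer index, which is one reason why a more robust linking mechanism would probably be needed in a real implementation.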

An alternative would be to split those complex responses up into single responses, as if one had called the API for each result individually. Maybe this would be more intuitive for a user? Then there would only be a single flattened result DataFrame and maybe a Series with further information.

oscr commented 1 year ago

You've really done some excellent investigative work here. To me this makes sense and seems doable, and I think it would be useful, based on my level of knowledge of Pandas. How would this and the alternative look for a more Pandas-native usage? I've mostly used traditional loops etc. to do calculations, given my limited Pandas experience. But what would "correct" usage look like?

theOehrly commented 1 year ago

I'm not sure that I fully understand your question. None of the options that I suggested above can be implemented fully "pandas-native". There either needs to be some kind of main response object, or one main DataFrame that links the individual responses together, but that would not really be "normal" pandas-like stuff. In my opinion, the data is just too multi-dimensional to fit into a DataFrame. Therefore, some compromises need to be made. I want to expose as much data as possible through pandas, because it is such a powerful tool. But on the other hand, one must not overdo that, or else things get more complicated than really necessary.

oscr commented 1 year ago

Thank you for the clarification! I think this seems promising. I've been thinking about making a more refined "who can still win WDC" script and I can pretty much see how this implementation would work with it. So I think this will be a useful and exciting addition to FF1. Great work!

theOehrly commented 1 year ago

Implemented in #311

theOehrly / Fast-F1

[ENH] Add complete support for the Ergast API #277
