theOehrly / Fast-F1

FastF1 is a python package for accessing and analyzing Formula 1 results, schedules, timing data and telemetry
https://docs.fastf1.dev
MIT License
2.48k stars, 259 forks

[ENH] Include underlying compound information #332

Open Casper-Guo opened 1 year ago

Casper-Guo commented 1 year ago

Proposed new feature or change:

Opening this issue because a quick search doesn't turn up any prior discussion.

Per the documentation for the Compound column of the Laps object:

"the actual underlying compounds C1 to C5 are not differentiated"

This information is critical for comparing tyre performance across races. Although it may be redundant/repetitive at the Laps level, it would still be good to have it as a part of either Session or Event.

My own investigation has not turned up a good way to fetch this information automatically.

theOehrly commented 1 year ago

One of the main reasons why I haven't worked on this is the lack of a good source. As you say, there does not seem to be a good way to fetch that information automatically. The only reliable way I have found so far is to keep an eye on Pirelli Motorsport's Twitter feed before each race. I need to check whether there are any FIA documents about this. But it looks like this would be a manual process before each race weekend. Or one could do automatic image processing of the Twitter feed. But as interesting as that task would be, it's quite a waste of time to develop that feature.

Casper-Guo commented 1 year ago

The Pirelli Motorsports Twitter will definitely be a hard source to use. I doubt their tweets will be formatted in a set way each week and it would be challenging to automatically extract information from their graphics.

I have somehow never thought of using FIA documents for this. I took a quick look and the compounds are included in the event notes for each race, under the Pirelli preview (example).

These are already uploaded to Twitter automatically (sample tweet). Since these are well-formatted PDFs, I am guessing there are tools available to parse the information. I will look into this more when I get around to it.

theOehrly commented 1 year ago

I have never tried to automatically parse a PDF, so I can't judge whether it's easier to extract data from a table in there versus a table in an image. I am still wondering whether it's actually worth automating this, or if it's better to just have a very simple manual process. The effort for automating this completely is relatively high. On the other hand, I don't really want to be responsible for doing this task before every race weekend, even if it is simple.

harningle commented 1 year ago

It's fairly easy to parse FIA documents and get the tyre compounds, but I'm not sure if and how we should add this to the package. Manually searching for the compounds seems much easier to me. Or should we build our own API for this?

Minimal example (I don't think it works for all races): https://github.com/harningle/Fast-F1/blob/dev-compound/fastf1/compound/docparser.py
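For readers who want a feel for what such parsing involves, here is a minimal sketch. It is not the linked docparser.py. It assumes the PDF text has already been extracted into a string by some PDF library, and it assumes the Pirelli preview contains lines pairing a simple name with a compound, such as "HARD: C1". Both the sample text and the function name `extract_compounds` are made up for illustration, and the real documents may be laid out differently.

```python
import re

# Hypothetical text standing in for what a PDF text extractor might return
# from the Pirelli preview of an event note. The real layout may differ.
SAMPLE_TEXT = """
PIRELLI PREVIEW
HARD: C1
MEDIUM: C2
SOFT: C3
"""

# Matches e.g. "HARD: C1" or "soft - c5" (case-insensitive).
COMPOUND_LINE = re.compile(
    r"\b(HARD|MEDIUM|SOFT)\b\s*[:\-]?\s*\b(C\d)\b", re.IGNORECASE
)


def extract_compounds(text: str) -> dict[str, str]:
    """Return a mapping of simple name to underlying compound found in text."""
    return {
        simple.upper(): underlying.upper()
        for simple, underlying in COMPOUND_LINE.findall(text)
    }


if __name__ == "__main__":
    print(extract_compounds(SAMPLE_TEXT))
```

A pattern like this is brittle by design: it returns an empty dict when nothing matches, which makes it easy to spot races whose documents need manual handling.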

theOehrly commented 1 year ago

I want to add something like this. My preferred approach is to automate the parsing of the documents and build a separate API server that exposes the data through a public API. I might get around to that later this summer.

I don't really want to do processing like this in FastF1 itself, because it is inefficient, likely to cause rate-limit issues, and makes it more difficult to fix incorrectly parsed results.

harningle commented 1 year ago

Yes, makes sense! I've never built an API server but would like to learn and get involved. Let me know if there is anything I can help with! Btw, Oracle has pretty good lifetime free virtual instances and we may be able to build the API there.

harningle commented 1 year ago

I've parsed the event notes and Pirelli previews for all races since 2019 to get the tyre compounds: https://github.com/harningle/fia-doc/blob/main/tyres.json. The race name in the JSON can be matched with EventName here. Happy to revise the format or set up an API for this!
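As a rough illustration of the matching step, the sketch below looks up the underlying compound for a given season, event name and simple compound name. The record layout (`year`, `race`, `compounds`) is an assumption made for this example and may not match the actual tyres.json; the event and its compound assignment are placeholders, and `underlying_compound` is a hypothetical helper.

```python
import json

# Hypothetical excerpt. The real tyres.json may be structured differently,
# and the event and compound assignment below are placeholders.
TYRES_JSON = """
[
  {"year": 2023,
   "race": "Example Grand Prix",
   "compounds": {"HARD": "C1", "MEDIUM": "C2", "SOFT": "C3"}}
]
"""


def underlying_compound(records, year, event_name, simple_name):
    """Return the underlying compound (e.g. 'C2') or None if unknown.

    None is returned both for unknown events and for simple names
    without a mapping in the record, e.g. INTERMEDIATE or WET.
    """
    for record in records:
        same_event = record["race"].casefold() == event_name.casefold()
        if record["year"] == year and same_event:
            return record["compounds"].get(simple_name.upper())
    return None


if __name__ == "__main__":
    records = json.loads(TYRES_JSON)
    print(underlying_compound(records, 2023, "Example Grand Prix", "MEDIUM"))
```

With FastF1, the `event_name` argument would come from the event's EventName and `simple_name` from the Compound column of Laps. Comparing names case-insensitively is a small hedge against spelling differences between the two sources.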

theOehrly commented 1 year ago

Nice, especially that it's possible to parse all those races with the same script. That's very promising.

I'm planning to set up an API server anyway but this is still very much a work in progress. But it's needed for multiple reasons and I think the point has come where it's actually reasonable to do.

The current list of things to do to make this work is the following:

The API server is the most blocking part here right now. But I'm kind of working on it. It's new territory for me, though. Therefore, it's going a bit slowly and I need to play around with some stuff and see how it works.

Casper-Guo commented 1 year ago

Thoughts on how to bring 2018 compounds into the fold? I have the data in TOML, but I am thinking more about how to make the code/API general enough to anticipate future compound changes.

harningle commented 1 year ago

The FIA has all documents in PDF format in their archive, all the way back to 2012. However, the formats differ: my current script can handle 2018 with a simple revision, while I can't find compound info in any document for 2014. For historical data, maybe it's easier to do it manually once. If we want to go even further back in time, next year the FIA will have a public e-library, which will host all documents since the first days.

No idea for the potential changes in the future though.

theOehrly commented 1 year ago

So, after a bit of thinking, I came up with something like this for the required data structure, if we want to support all past and future events.

{
  'season': <int>,
  // the season year

  'events': [
  // an array of events
    {
      'round': <int>,
      // the round number

      'eventKey': <int, optional>,
      // the eventKey that is used in the F1 livetiming API, only exists for current events

      'compoundSpecifications': {
      // an object that lists all possible compounds for this event
      // each compound gets an id that it can be referred to by
        compoundId <any>: {
          'manufacturer': <str>,
          // name of the tyre manufacturer

          'compoundName': <str>,
          // name of the compound, e.g. C1, C2, ...

          'simpleName': <str, optional>
          // "simple" names i.e. HARD, MEDIUM, ...
          // this name may change per event
        }
      },

      'compounds': {
      // this object maps an array of compound ids to each constructor
      // (preferably use ergast constructor ids)
        constructorId <int>: [
          {
            'compoundId': <compoundId>,

            'availableCount': <int>
            // number of available tyre set of this compound
            // could be interesting if we manage to get the data
          },
          ...
        ]
      }
    },
  ]
}

This should make it possible to handle all reasonable cases that I can think of right now, like

Instead of using compound ids, we could also just directly include the compound information in the mapping for each team. That would make for lots of redundant data (which probably doesn't matter too much) but less complexity.

Annoyingly, this looks pretty complicated. Much more complicated than I'd like it to be. But supporting various compounds, manufacturers, and compound sets and names that change per event is complicated, I guess. If anybody can come up with something simpler, please suggest it.
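To make the proposed structure a bit more concrete, here is a minimal Python sketch of a single event and of how the compound ids would be resolved per constructor. All ids, counts and compound assignments are placeholders rather than real data, and `resolve_compounds` and `inline_compounds` are hypothetical helper names.

```python
# One made-up event in the proposed structure. All ids, counts and
# compound assignments are placeholders, not real data.
EVENT = {
    "round": 1,
    "compoundSpecifications": {
        "a": {"manufacturer": "Pirelli", "compoundName": "C1", "simpleName": "HARD"},
        "b": {"manufacturer": "Pirelli", "compoundName": "C2", "simpleName": "MEDIUM"},
        "c": {"manufacturer": "Pirelli", "compoundName": "C3", "simpleName": "SOFT"},
    },
    "compounds": {
        # placeholder constructor ids
        1: [
            {"compoundId": "a", "availableCount": 2},
            {"compoundId": "b", "availableCount": 3},
            {"compoundId": "c", "availableCount": 8},
        ],
        2: [
            {"compoundId": "a", "availableCount": 2},
            {"compoundId": "b", "availableCount": 3},
            {"compoundId": "c", "availableCount": 8},
        ],
    },
}


def resolve_compounds(event, constructor_id):
    """Expand the compound ids of one constructor into full compound records."""
    specs = event["compoundSpecifications"]
    return [
        {**specs[entry["compoundId"]], "availableCount": entry.get("availableCount")}
        for entry in event["compounds"][constructor_id]
    ]


def inline_compounds(event):
    """Build the redundant variant: full compound records per constructor."""
    return {cid: resolve_compounds(event, cid) for cid in event["compounds"]}


if __name__ == "__main__":
    for compound in resolve_compounds(EVENT, 1):
        print(compound["simpleName"], compound["compoundName"])
```

`inline_compounds` corresponds to the alternative mentioned above: it repeats the compound information for every team, which drops the id lookup at the cost of duplicated data.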

theOehrly commented 1 year ago

@harningle this tyre data stuff might get somewhat delayed, considering that #445 has popped up. I want to integrate it into the potential Ergast successor, so that we don't end up with many different systems. That whole project will hopefully get more traction starting in October. In case you are interested in helping out there, I could really use some help from people who know about relational databases and how to build an API server. Even if it's just a bit of consulting.

Casper-Guo commented 1 year ago

Commenting to say I would like to be in the loop as well. I have done some DB and SQL work, but building an API server will be a good learning opportunity.

harningle commented 1 year ago

I'm in! I work frequently with SQL and APIs in my job, but have never built an API server myself. Happy to learn and contribute.

marcll commented 4 months ago

Hello @theOehrly, this might be an old topic by now, but I just published something that might be useful.

I have created a general-purpose parser for FIA documents that works with LLMs and text summarizers. It can extract tyre compound information, as well as penalties and decisions for given races, by running the race documents through a large language model.

I wanted to share the repo (https://github.com/marcll/f1-fia-doc-parser) in case it is useful and the problem and use case are still relevant.

Thanks for your amazing work!