[ENH] Telemetry DataFrame should include DriverId

theOehrly / Fast-F1

FastF1 is a python package for accessing and analyzing Formula 1 results, schedules, timing data and telemetry

https://docs.fastf1.dev

MIT License

2.46k stars 259 forks source link

[ENH] Telemetry DataFrame should include DriverId #484

Closed pbulsink closed 9 months ago

pbulsink commented 10 months ago

Proposed new feature or change:

Telemetry DataFrame should include the DriverId. Rationale:

It's possible to get a telemetry without specifying a driver (for example, by calling session.laps.pick_fastest().get_telemetry()) in which case extra work is required to see who that telemetry belongs to.
This would simplify handling multiple telemetries without having to manually add the DriverId to each telemetry DataFrame manually, and be a first step toward allowing for get_telemetry() to be run on Laps objects containing more than one DriverId.

theOehrly commented 10 months ago

What you are proposing is to add a column (or two, probably) containing driver abbreviation and/or driver number, right? Not just a property on the telemetry object that returns those values?

This would [...] be a first step toward allowing for get_telemetry() to be run on Laps objects containing more than one DriverId

What purpose do you see for this functionality?

pbulsink commented 10 months ago

Yeah adding the column(s) would be preferred.

The addition of pick_drivers() opened up a quick way of doing comparison work between multiple drivers, having the ability to quickly and succinctly pull out (for example) the telemetry for VER and LEC in lap one of a GP as session.laps.pick_laps([1]).pick_drivers(['VER', 'LEC']).get_telemetry() seems like a logical next ability of the package.

theOehrly commented 10 months ago

Having mixed telemetry DataFrames with data from multiple drivers was never intended functionality. The main problem is that many of the custom methods of the Telemetry object, for example .add distance, .resample_channels, ..., cannot deal with this kind of mixed data at the moment. They would either need to be rewritten to perform the operations individually for each driver in the given DataFrame or they would need to raise an error. So this is not as simple as just adding these columns. It involves some more work.

I'm not necessarily against doing this. But I don't see a real benefit at the moment. What exactly is the use case that you see? Why would that improve functionality or user experience?

pbulsink commented 10 months ago

Thanks for taking the time to expand on the various functions of Telemetry - particularly explaining the challenges of them working on mixed Telemetry frames.

I'm not particularly attached to this suggestion, so feel free to call it too much work and close without action. It's a lot of work for what is essentially just saving the user the hassle of binding multiple get_telemetry calls into one DataFrame, or doing other simple data processing steps.

theOehrly commented 10 months ago

Can you maybe give a concrete example where this would be a noticeable improvement? The issue here might just as well be that I have difficulty understanding what a user might need. I never actually "use" FastF1 because whenever I'd have time for that, I do development work instead.

If this is instead something that would make things easier for the R wrapper, that would be something we can discuss as well.

pbulsink commented 9 months ago

I think we had some discussion on this for the R wrapper, there were some specific tasks that would have been way easier (for us) with this enabled in Python.

I think too though that merging data frames of telemetries is likely easier than re-writing the whole Telemetry object's method code.

@Scasanova, can you share the use case for multiple driver telemetries extraction at once? This comes from working on https://github.com/SCasanova/f1dataR/issues/78

SCasanova commented 9 months ago

Hey! Yeah of course

The idea behind this is for when one is building a model that uses mass data from a race. Mainly simplifying the extraction process of telem for more than one driver without having to use a for loop on our end

theOehrly commented 9 months ago

The API provides this data as one time series for each driver. This data is currently not merged in FastF1. So the question becomes, where do we loop over the data to merge it. (To be fair, I could maybe do it without an explicit loop in Python, but I'm not sure if that would even be more efficient. And the loop would then probably just be inside Pandas or Numpy code.) You can already get data for all laps of a single driver with one function call. That means, you need around 20 iterations and merge that data. This won't be much different if I do it internally in FastF1. This is fairly simple to do in Python. I don't know anything about how the R wrapper interfaces with FastF1. If this is more difficult or inefficient to do in the wrapper, then I'm happy to figure out some way to make this easier for you.

pbulsink commented 9 months ago

I think this is going to be about equally difficult in Python as it is/was in R. Given the need to refactor methods should multiple drivers' telemetries be passed simultaneously, I think we can handle it with loops in R (or let the end user do so if they wish).

I'm fine to close this as "won't implement".