Closed john-b-edwards closed 2 years ago
For anyone who picks this up, I believe drive_pts is coming from the new_drive_pts column here: https://github.com/sportsdataverse/cfbfastR/blob/4c02a4a4dbc73bf9cf4a29493a234a6fc93593f4/R/cfbd_pbp_data.R#L1407-L1445
You'd have to grab the result of the PAT play via a lead or via the play_text and adjust those values accordingly.
Looks like drive_pts is actually coming from here, based on the drives endpoint from cfbd: https://github.com/sportsdataverse/cfbfastR/blob/4c02a4a4dbc73bf9cf4a29493a234a6fc93593f4/R/cfbd_pbp_data.R#L2041-L2059
The drives endpoint also includes the starting and ending scores for both offense and defense, which could (and probably should?) be used to generate the drive points instead. I'm just thinking about what makes sense when both teams score points on the same drive (like the defensive conversion), and I think your suggestion of -5, -4, 4, and 5 makes sense as a column showing the change in score differential.
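A minimal sketch of that idea in base R, assuming the drive-level data carries start and end scores for both sides. The column names (start_offense_score, end_offense_score, start_defense_score, end_defense_score) and the sample rows are assumptions for illustration, not confirmed against the cfbfastR output:

```r
# Hypothetical drive-level data; column names and values are assumed.
drives <- data.frame(
  drive_id            = 1:5,
  start_offense_score = c(0, 7, 7, 10, 14),
  end_offense_score   = c(7, 13, 15, 16, 14),
  start_defense_score = c(0, 0, 3, 3, 10),
  end_defense_score   = c(0, 0, 3, 5, 17)
)

# Points each side scored during the drive.
offense_pts <- drives$end_offense_score - drives$start_offense_score
defense_pts <- drives$end_defense_score - drives$start_defense_score

# Change in score differential from the offense's point of view.
drives$drive_pts <- offense_pts - defense_pts

print(drives[, c("drive_id", "drive_pts")])
```

Computed this way, a touchdown followed by a defensive two-point conversion (drive 4 in the sample) comes out as 4 rather than being forced to 7, because both teams' score changes are taken into account.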
cfbfastR::cfbd_pbp_data() reports erroneous values for the number of points scored on a drive for some plays. For example: I have a dataframe of all plays loaded in my local environment, and the only values that appear for drive_pts are -7, -2, 0, 3, or 7 -- which cannot be the full set given 1) missed extra points resulting in -6 and 6, 2) successful and missed two-point conversions resulting in -8, -6, 6, or 8, and 3) defensive two-point and one-point conversions resulting in -5, -4, 4, and 5. Some of these possibilities are too rare to have occurred in the dataset, but others -- like 8 -- should be present in the dataframe but are not.
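One way to make this check reproducible is to compare the values observed in drive_pts against the values listed above. This is only a sketch: the observed vector below is copied from this report, and with real data it would be replaced by something like sort(unique(pbp$drive_pts)), where pbp is a hypothetical name for the loaded play-by-play dataframe:

```r
# Values of drive_pts seen in the data, as reported in this issue.
observed <- c(-7, -2, 0, 3, 7)

# Observed values plus the additional values listed above
# (missed PATs, two-point tries, defensive conversions).
expected <- sort(unique(c(observed, -6, 6, -8, 8, -5, -4, 4, 5)))

# Values that could legitimately occur but never show up.
missing_values <- setdiff(expected, observed)
print(missing_values)
```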