TonyCorke opened this issue.
The differences are due to mis-codings. The following accounts for the 7-player discrepancy that @TonyCorke highlighted:
The following are inconsistencies between the names used in the fitzRoy package and on AFL Tables. They don't affect the statistics, but they may cause problems if the data behind the player IDs is ever rescraped.
This is fabulous, @afableco. Thank you.
Am I right that we're still one short of the seven we need, though? From the changes we get:
So that's +7 and -1 for a net gain of 6.
Or, have I misinterpreted your explanation?
Sadly, you are correct. I forgot to net off Archie Richardson. I will try to get back to this on the weekend to see if I can work out who else is missing.
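The netting can be checked mechanically: an existing ID only drops out of the unique-ID count when every one of its rows is moved to a new ID, so the change in the count is the number of new IDs minus the number of IDs that disappear. A minimal sketch on toy data (the IDs and seasons below are made up, not taken from AFL Tables):

```r
# Toy data with made-up IDs: ID 1 is one player, while ID 2 has been
# used for three different players in three seasons.
dat <- data.frame(
  ID     = c(1, 2, 2, 2),
  Season = c(1898, 1898, 1900, 1901)
)
before <- length(unique(dat$ID))

# Move every row of ID 2 to a new, season-specific ID.
dat$ID[dat$ID == 2 & dat$Season == 1898] <- 101
dat$ID[dat$ID == 2 & dat$Season == 1900] <- 102
dat$ID[dat$ID == 2 & dat$Season == 1901] <- 103
after <- length(unique(dat$ID))

# Three IDs were added, but ID 2 no longer appears, so the net gain is 2.
cat("Net gain:", after - before, "\n")
```

If even one row had been left on ID 2, it would still be counted and the net gain would be 3.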
No rush at all - and thank you for looking at the issue I raised so quickly!
The answer is Tom Darcy. In my original note, I had that Jim Darcy (ID 4318) should have been Tom Darcy, but it seems they are two separate people. Tom played for South Melbourne and had his first game on 1904-09-03, while Jim played for Essendon and had his first game on 1897-05-08.
There are other issues with the data (e.g. Cam Rayner is recorded as Heber Quinton in 2018).
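Cases like the two Darcys, where one ID covers two players, can be shortlisted by looking for IDs that carry more than one name or more than one club. A sketch under that assumption; the rows below are made up, and because plenty of genuine players changed clubs, the output is only a list to check by hand against afltables.com, not proof of a mis-coding:

```r
# Toy rows with made-up players: ID 7 appears for two clubs seven years
# apart, which may mean two different players share one ID.
dat <- data.frame(
  ID          = c(7, 7, 8, 8),
  First.name  = c("Sam", "Sam", "Ann", "Ann"),
  Surname     = c("Smith", "Smith", "Lee", "Lee"),
  Playing.for = c("Essendon", "Sydney", "Carlton", "Carlton"),
  Season      = c(1897, 1904, 1950, 1951),
  stringsAsFactors = FALSE
)

# For each ID, count the distinct full names and distinct clubs.
full_name <- paste(dat$First.name, dat$Surname)
n_names <- tapply(full_name, dat$ID, function(x) length(unique(x)))
n_clubs <- tapply(dat$Playing.for, dat$ID, function(x) length(unique(x)))

# IDs worth a manual look.
suspect <- names(n_names)[n_names > 1 | n_clubs > 1]
print(suspect)
```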
Perfect! Thanks again.
Below is some code that can be used to patch the data:
library(fitzRoy)

dat <- get_afltables_stats(start_date = "1897-05-01", end_date = "2019-05-21")

# Local helper: move the rows selected by `idx` to a new player ID and name.
# `idx` is evaluated once, before any column changes. (Re-evaluating a
# condition on dat$ID after overwriting ID matches no rows, so in the Arthur
# Davidson fix the name fields would never have been updated.)
recode_player <- function(dat, idx, id, first, surname) {
  idx <- idx & !is.na(idx)  # treat missing values as "no match"
  dat$ID[idx] <- id
  dat$First.name[idx] <- first
  dat$Surname[idx] <- surname
  dat
}

# Arthur Davidson: Fitzroy 1898, rounds 7 and 10, coded as ID 4350
dat <- recode_player(dat, dat$ID == 4350 & dat$Playing.for == "Fitzroy" &
  dat$Season == 1898 & dat$Round %in% c(7, 10), 15000, "Arthur", "Davidson")

# George McLeod: St Kilda 1903 (new ID only, name unchanged)
dat <- recode_player(dat, dat$First.name == "George" & dat$Surname == "McLeod" &
  dat$Playing.for == "St Kilda" & dat$Season == 1903, 15001, "George", "McLeod")

# Rows recorded as Archie Richardson at St Kilda that belong to other Richardsons
archie <- dat$First.name == "Archie" & dat$Surname == "Richardson" & dat$Playing.for == "St Kilda"
dat <- recode_player(dat, archie & dat$Season == 1898, 15002, "Mr", "Richardson")
dat <- recode_player(dat, archie & dat$Season == 1900, 15003, "William", "Richardson")
dat <- recode_player(dat, archie & dat$Season == 1901, 15004, "Alfred", "Richardson")

# Jack Dorgan: 1949 rows recorded as Jim Dorgan
dat <- recode_player(dat, dat$First.name == "Jim" & dat$Surname == "Dorgan" &
  dat$Season == 1949, 15005, "Jack", "Dorgan")

# Walter Johnston: Richmond 1908, round 8, recorded as Alex Johnston
dat <- recode_player(dat, dat$First.name == "Alex" & dat$Surname == "Johnston" &
  dat$Playing.for == "Richmond" & dat$Season == 1908 & dat$Round == 8,
  15006, "Walter", "Johnston")

# Tom Darcy: South Melbourne (team "Sydney" in the data) 1904, round 17, recorded as Jim Darcy
dat <- recode_player(dat, dat$First.name == "Jim" & dat$Surname == "Darcy" &
  dat$Playing.for == "Sydney" & dat$Season == 1904 & dat$Round == 17,
  15007, "Tom", "Darcy")
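When patching with a sequence of statements like these, the order of assignments matters. If a condition filters on `dat$ID` and the ID column is overwritten first, later statements that repeat the same condition match no rows, so the name fields are silently left unchanged. A toy sketch of the pitfall and the fix (made-up rows, reusing the ID 4350 case):

```r
dat <- data.frame(ID = c(4350, 4350), Round = c(7, 8),
                  First.name = c("X", "X"), stringsAsFactors = FALSE)

# Pitfall: after the first line, no row has ID 4350 in round 7,
# so the second line changes nothing.
bad <- dat
bad$ID[bad$ID == 4350 & bad$Round == 7] <- 15000
bad$First.name[bad$ID == 4350 & bad$Round == 7] <- "Arthur"

# Fix: evaluate the condition once and reuse the stored index.
good <- dat
idx <- good$ID == 4350 & good$Round == 7
good$ID[idx] <- 15000
good$First.name[idx] <- "Arthur"

print(bad$First.name)   # [1] "X" "X"
print(good$First.name)  # [1] "Arthur" "X"
```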
Thanks heaps for all this, guys. I'm going to try to block out some time to focus on some of these in the coming weeks.
I will need to work out which issues are to do with fitzRoy and which are to do with the underlying data on afltables.com. My general philosophy is to leave things as they appear on afltables.com and try to get Paul, who runs the website, to fix them there. But some helper functions to clean the data may also be useful; I'll have to think about it!
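If cleaning helpers were added, one option is to keep the corrections as data rather than code, so each fix is a row that can be reviewed, and dropped once afltables.com is corrected. A minimal sketch; `apply_player_fixes()` and the column names of the `fixes` table are hypothetical and not part of fitzRoy, and it only covers fixes selected by name, club, season and (optionally) round:

```r
# Hypothetical helper (not part of fitzRoy). Each row of `fixes` selects
# games by the currently recorded name, club and season, optionally narrowed
# to one round, and gives the replacement ID and name for those rows.
apply_player_fixes <- function(dat, fixes) {
  for (i in seq_len(nrow(fixes))) {
    f <- fixes[i, ]
    idx <- dat$First.name == f$old_first & dat$Surname == f$old_surname &
      dat$Playing.for == f$team & dat$Season == f$season
    if (!is.na(f$round)) idx <- idx & dat$Round == f$round
    idx <- idx & !is.na(idx)  # rows with missing values count as no match
    # The index is fixed before any column changes, so all three
    # fields are updated on the same rows.
    dat$ID[idx] <- f$new_id
    dat$First.name[idx] <- f$new_first
    dat$Surname[idx] <- f$new_surname
  }
  dat
}

# One correction, taken from the Tom Darcy patch in this thread.
fixes <- data.frame(
  old_first = "Jim", old_surname = "Darcy", team = "Sydney",
  season = 1904, round = 17, new_id = 15007,
  new_first = "Tom", new_surname = "Darcy", stringsAsFactors = FALSE
)

# Toy rows (made-up round numbers) standing in for the AFL Tables extract.
dat <- data.frame(
  ID = c(4318, 4318), First.name = c("Jim", "Jim"), Surname = c("Darcy", "Darcy"),
  Playing.for = c("Essendon", "Sydney"), Season = c(1897, 1904), Round = c(1, 17),
  stringsAsFactors = FALSE
)

out <- apply_player_fixes(dat, fixes)
print(out[, c("ID", "First.name", "Playing.for", "Season")])
```

Only the 1904 Sydney row is moved to ID 15007; the 1897 Essendon row stays with Jim Darcy.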
Thanks for all the work so far identifying them!
According to AFL Tables, as at the end of Round 3, 2019, there have been 12,710 debutants. However, there are only 12,703 unique IDs in the AFL Tables extract, a shortfall of 7.
Created on 2019-04-14 by the reprex package (v0.2.1)
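The check behind this report comes down to comparing the number of distinct player IDs with the published debutant count. A sketch on toy data; in practice `dat` would be the data frame returned by `get_afltables_stats()` with an end date matching the point the count refers to, and the expected figure is typed in by hand from AFL Tables (12,710 here):

```r
# Toy stand-in for the extract: one row per player per game.
dat <- data.frame(ID = c(1, 1, 2, 3, 3, 3))
expected_debutants <- 4  # made-up figure for the toy data

n_ids <- length(unique(dat$ID))
shortfall <- expected_debutants - n_ids
cat("Unique IDs:", n_ids, " Shortfall:", shortfall, "\n")
```

A positive shortfall suggests some players share an ID; a negative one would suggest a single player split across several IDs.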