jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Failing to match specific player names #250

Open bjb406 opened 2 years ago

bjb406 commented 2 years ago

My code cycles through everyone who played last year and pulls Statcast data. For 99.9% of players it works fine. There were a few quirks to code around, such as accents or punctuation in people's names that were throwing it off, but those were all fixable. However, when I try to look up the player ID for Yankees pitcher Michael King via playerid_lookup, it has no idea who he is. With the fuzzy=True parameter, it gives me Michael Tonkin, Michael Kohn, Michael Kirkman, Michael Young, and Hal King. I can look up his information on the actual Statcast webpage just fine. I'm not sure yet whether there are any other anomalies like this.
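For anyone hitting the accent and punctuation quirks mentioned above, here is a minimal sketch of one way to normalize a name before looking it up. The thread does not say how those quirks were actually handled, so the helper name `normalize_name` and the choice of which characters to keep are assumptions, and the result may still not match how a given name is stored in the lookup table.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Hypothetical helper: lowercase a name, strip accents and most punctuation."""
    # Decompose accented characters (e.g. "é" becomes "e" plus a combining mark),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters, spaces, and hyphens; drop periods, apostrophes, and the like.
    cleaned = "".join(ch for ch in stripped if ch.isalpha() or ch in " -")
    # Lowercase and collapse any repeated whitespace.
    return " ".join(cleaned.lower().split())


print(normalize_name("José Ramírez"))
print(normalize_name("J.D. Martinez"))
```

Whether dropping periods or apostrophes is the right call depends on how the name appears in the lookup table, so it is worth checking a few known players by hand.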

wonsk99 commented 2 years ago

It looks like the CSV has "Mike" stored in front of "King, Michael McRae", so if you look up Mike King, you'll get the Yankees pitcher. I'm not sure how the CSV is generated, though, so I don't know where "Mike" comes from or why it's saved as the first name...
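Until the source of the "Mike" entry is understood, one possible workaround is to retry the lookup with common nickname variants when the given first name returns nothing. The sketch below is not part of pybaseball: `FIRST_NAME_ALIASES` and `lookup_with_aliases` are hypothetical names, and the alias map is a hand-maintained assumption. The lookup function is passed in, so the retry logic can be tried without network access.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical, hand-maintained alias map; extend as more mismatches turn up.
FIRST_NAME_ALIASES: Dict[str, List[str]] = {
    "michael": ["mike"],
    "mike": ["michael"],
}


def lookup_with_aliases(
    last: str,
    first: str,
    lookup_fn: Callable[[str, str], Any],
) -> Tuple[Optional[str], Any]:
    """Try the given first name, then any known aliases, until one returns rows."""
    candidates = [first] + FIRST_NAME_ALIASES.get(first.lower(), [])
    for candidate in candidates:
        result = lookup_fn(last, candidate)
        # len() works for both a list and a pandas DataFrame.
        if len(result) > 0:
            return candidate, result
    return None, None


# Stand-in for the real lookup, mimicking the behavior described in this thread.
def fake_lookup(last: str, first: str) -> List[str]:
    if (last.lower(), first.lower()) == ("king", "mike"):
        return ["king, mike"]
    return []


matched_first, rows = lookup_with_aliases("King", "Michael", fake_lookup)
print(matched_first, rows)
```

With pybaseball installed, passing `playerid_lookup` as `lookup_fn` should work the same way, since it takes the last name and then the first name and returns a DataFrame.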