jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

The attached players are not returned via `playerid_lookup(last_name)` #262

Open dgrochmal opened 2 years ago

dgrochmal commented 2 years ago

Robinson Cano, Edwin Diaz, Yennsy Diaz, Eloy Jimenez, Jhan Marinez, Brailyn Marquez, Adalberto Mondesi, Raul Mondesi, Manuel Rodriguez, Fernando Tatis, Fernando Tatis

dgrochmal commented 2 years ago

Appears to me that the cause is that special characters (é, í, ó, ñ, etc.) aren't being handled as they should.

tjburch commented 2 years ago

> Appears to me that the cause is that special characters (é, í, ó, ñ, etc.) aren't being handled as they should.

That's correct. If you type the í explicitly, both players are returned:

  name_last name_first  key_mlbam key_retro  key_bbref  key_fangraphs  mlb_played_first  mlb_played_last
0   mondesí  adalberto     609275  mondr003  mondera02          13769            2016.0           2022.0
1   mondesí       raul     119247  mondr002  mondera01           1314            1993.0           2005.0

I recommend using the `fuzzy` argument for this.

>>> playerid_lookup("Raul Mondesi", fuzzy=True)
No identically matched names found! Returning the 5 most similar names.
   name_last name_first  key_mlbam key_retro  key_bbref  key_fangraphs mlb_played_first mlb_played_last
0    mondesí       raul     119247  mondr002  mondera01           1314           1993.0          2005.0
1   saunders        joe     434578  saunj001  saundjo01           4366           2005.0          2014.0
2   saunders       tony     121711  saunt001  saundto01        1011463           1997.0          1999.0
3      ramos       john     120911  ramoj001  ramosjo01        1010679           1991.0          1991.0
4  edmondson       paul     113746  edmop101  edmonpa01        1003680           1969.0          1969.0

Then you can use `.iloc[0]` to select Raul's row.
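To illustrate why the unaccented query still ranks the accented name first, here is a rough sketch of similarity ranking using the standard library's `difflib`. This is not necessarily how pybaseball implements `fuzzy=True`; the `known_names` list and the `closest_names` helper are hypothetical and exist only for this example.

```python
from difflib import get_close_matches

# Hypothetical sample of "first last" names; the real lookup table is far larger.
known_names = [
    "raul mondesí",
    "adalberto mondesí",
    "joe saunders",
    "tony saunders",
    "john ramos",
]


def closest_names(query, names, limit=5):
    """Return up to `limit` names ordered from most to least similar to `query`."""
    # cutoff=0.0 keeps every candidate, so `limit` names come back when available.
    return get_close_matches(query.lower(), names, n=limit, cutoff=0.0)


ranked = closest_names("Raul Mondesi", known_names)
best = ranked[0]  # analogous to taking .iloc[0] from the returned DataFrame
print(best)
```

A single differing character barely lowers the similarity score, so the accented spelling stays at the top of the ranking.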

dgrochmal commented 2 years ago

That could work for my use case, though searching for "Mondesi" used to find "Raul Mondesí" and no longer does.

In any case, I feel it would be useful to be able to search for players like this without having to type the special characters.
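One possible way to get that behavior is to strip diacritics from both the query and the stored names before comparing them. The sketch below uses only the standard library's `unicodedata`; the `players` list and the helpers `strip_accents`, `normalize_name`, and `find_by_last_name` are hypothetical stand-ins, not part of pybaseball.

```python
import unicodedata


def strip_accents(text):
    """Return `text` with combining diacritical marks removed (í -> i, ñ -> n)."""
    # NFKD splits each accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(text):
    """Accent-insensitive, case-insensitive form used only for comparison."""
    return strip_accents(text).casefold().strip()


# Hypothetical (last, first) rows standing in for the real lookup table.
players = [
    ("mondesí", "raul"),
    ("mondesí", "adalberto"),
    ("canó", "robinson"),
    ("saunders", "joe"),
]


def find_by_last_name(last_name, table):
    """Return rows whose last name matches `last_name` once accents are ignored."""
    wanted = normalize_name(last_name)
    return [row for row in table if normalize_name(row[0]) == wanted]


matches = find_by_last_name("Mondesi", players)
print(matches)
```

Because normalization is applied only when comparing, the stored names keep their accents, and a search for either "Mondesi" or "Mondesí" finds the same rows.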