jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License
1.28k stars 334 forks

Add misspelling suggestions to playerid_lookup #161

Closed tjburch closed 4 years ago

tjburch commented 4 years ago

Add spell checking to playerid_lookup. If the name passed to playerid_lookup has no matches, the Levenshtein distance from the given first and last names to each known player's first and last names is calculated, and the two distances are summed per player. If a single player has the minimum sum, a ValueError is raised that suggests that player. If two players tie for the minimum, the ValueError suggests both. If more than two tie, a ValueError is raised with no suggestion.
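As an illustration of the approach described above, here is a minimal sketch in plain Python. It is not the code from this PR: the helper names `levenshtein`, `suggest_names`, and `not_found_message` are hypothetical, the list of known players is passed in explicitly rather than read from the lookup table, and lowercasing the names before comparing them is an assumption.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest_names(last: str, first: str,
                  known: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (first, last) pairs tied for the smallest summed distance."""
    if not known:
        return []
    # Assumption: comparison is case-insensitive.
    last, first = last.lower(), first.lower()
    scored = [
        (levenshtein(first, f.lower()) + levenshtein(last, l.lower()), (f, l))
        for f, l in known
    ]
    best = min(score for score, _ in scored)
    return [name for score, name in scored if score == best]


def not_found_message(last: str, first: str,
                      known: List[Tuple[str, str]]) -> str:
    """Build the error text: one suggestion, two suggestions, or none."""
    matches = suggest_names(last, first, known)
    if len(matches) == 1:
        return f"Player not found! Did you mean {matches[0][0]} {matches[0][1]}?"
    if len(matches) == 2:
        (f1, l1), (f2, l2) = matches
        return f"Player not found! Perhaps you meant {f1} {l1} or {f2} {l2}?"
    return "Player not found!"
```

In this sketch a caller would raise `ValueError(not_found_message(last, first, known))` once the exact lookup comes back empty. Summing the two distances keeps a typo in either name from dominating the other.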

Note: previously, playerid_lookup did not raise an error when no match was found; it only returned an empty DataFrame. The raised errors are therefore new behavior.
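For callers that relied on the old silent behavior, one option is a thin wrapper that catches the new error. The sketch below is only an illustration: `lookup_or_none` is a hypothetical helper, and the lookup function is passed in as an argument so the example does not depend on pybaseball being installed or on network access.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def lookup_or_none(lookup: Callable[[str, str], T],
                   last: str, first: str) -> Optional[T]:
    """Call lookup(last, first); return None instead of raising ValueError."""
    try:
        return lookup(last, first)
    except ValueError as err:
        # The error text carries the spelling suggestion, so surface it.
        print(f"Lookup failed: {err}")
        return None
```

With this PR applied, `lookup_or_none(playerid_lookup, "turner", "jstin")` would print the suggestion and return `None` rather than stopping the program.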

Based on PR #160. If #160 doesn't go in, this PR just needs a different way of getting the list of player names.

Example output

One minimum:

>>> playerid_lookup("turner","jstin")
Gathering player lookup table. This may take a moment.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py", line 103, in playerid_lookup
    raise ValueError(f"Player not found! Did you mean {suggested_first} {suggested_last}?")
ValueError: Player not found! Did you mean Justin Turner?

Two minima:

>>> playerid_lookup("Carpenter","B")
Gathering player lookup table. This may take a moment.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py", line 108, in playerid_lookup
    raise ValueError(f"Player not found! Perhaps you meant {suggested_first[0]} {suggested_last[0]} or {suggested_first[1]} {suggested_last[1]}?")
ValueError: Player not found! Perhaps you meant Matt Carpenter or Ryan Carpenter?