Improvement to string matching methodology for player names/Implement get_player_url function

RamParameswaran commented 3 years ago

Describe the bug

In modules/scraper.py, the SequenceMatcher(None, name, player).ratio() call produces unexpected results.

The results are particularly unexpected when:

i) the name and player strings differ significantly in length (e.g. "Lebron" vs "Lebron James")
ii) the case doesn't match (e.g. "Lebron" vs "LeBron")
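The effect is easy to reproduce outside the app. The sketch below calls SequenceMatcher on just two candidates. It assumes the name token extracted from the test question is "Lebron". The real loop compares against every entry in alltime_player_list, so this is only an isolated illustration. It also suggests that lowercasing alone would not fix the problem: once both strings are lowercased, "Leon Brown" still outscores "LeBron James".

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    # Same similarity measure as the call in modules/scraper.py
    return SequenceMatcher(None, a, b).ratio()


# Raw strings: the case mismatch ("b" vs "B") breaks up the "Lebron" run
raw_right = ratio("Lebron", "LeBron James")  # ~0.556
raw_wrong = ratio("Lebron", "Leon Brown")    # 0.625

# Lowercased: "lebron" now matches in full, but the longer target is penalised
low_right = ratio("lebron", "lebron james")  # ~0.667
low_wrong = ratio("lebron", "leon brown")    # 0.750

print(f"raw:     LeBron James={raw_right:.3f}  Leon Brown={raw_wrong:.3f}")
print(f"lowered: LeBron James={low_right:.3f}  Leon Brown={low_wrong:.3f}")
```

ratio() is 2*M/T, where M is the number of matched characters and T is the combined length of both strings. A 6-character name can therefore score at most 12/18 ≈ 0.667 against the 12-character "LeBron James". Against the 10-character "Leon Brown", the ceiling is 12/16 = 0.75. This is why length differences skew the result.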

Recommend using a different string-matching approach, perhaps the fuzzywuzzy package (fuzz.partial_ratio)? Also recommend refactoring this into a separate get_player_url function for testing purposes.
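One way to act on both suggestions is sketched below with only the standard library, so it can be tried without adding a dependency. partial_ratio here is a simplified stand-in in the spirit of fuzz.partial_ratio, not fuzzywuzzy's actual algorithm (which returns an integer score from 0 to 100). It lowercases both strings and takes the best SequenceMatcher ratio of the shorter string against each equal-length window of the longer one. The get_player_url signature is hypothetical, as is the idea of passing a dict that maps full names to URLs. The real function would take whatever scraper.py already builds from alltime_player_list.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def partial_ratio(query: str, candidate: str) -> float:
    """Case-insensitive best-window similarity in [0.0, 1.0].

    Simplified stand-in inspired by fuzz.partial_ratio (not its exact algorithm).
    """
    a, b = query.lower(), candidate.lower()
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0.0
    width = len(shorter)
    # Slide the shorter string across the longer one and keep the best window
    return max(
        SequenceMatcher(None, shorter, longer[i:i + width]).ratio()
        for i in range(len(longer) - width + 1)
    )


def get_player_url(name: str, player_urls: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: return the URL of the best-matching player.

    player_urls is assumed to map full player names to their page URLs.
    """
    if not player_urls:
        return None
    best = max(player_urls, key=lambda player: partial_ratio(name, player))
    return player_urls[best]


# Usage with made-up URLs:
urls = {
    "Leon Brown": "https://example.com/leon-brown",
    "LeBron James": "https://example.com/lebron-james",
}
print(get_player_url("Lebron", urls))  # https://example.com/lebron-james
```

A window scores 1.0 only when it is identical to the shorter string. As a result, any player whose full name contains the query (ignoring case) gets the top score. Ties go to whichever player max() sees first.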

To Reproduce

Steps to reproduce the behavior:

1. Place a breakpoint at line 227 (i.e. just after the for player in alltime_player_list: block).
2. Run the app.
3. On the /chat page, use the test question: "Who is a better shooter Kobe or Lebron?"

Expected behavior

Expected: variable similar_name == "LeBron James"

Actual behavior

Actual: variable similar_name == "Leon Brown" (from the 1946/47 season!)

skekre98 commented 3 years ago

I was able to recreate this with quite a few other names as well 😅. Would you be interested in picking this up?

RamParameswaran commented 3 years ago

Yep, I'm happy to pick this up along with #22.

skekre98 commented 3 years ago

Cool, I'll edit this and remove #22. Thanks!

RamParameswaran commented 3 years ago

@skekre98 what were some of the other player names you reproduced this bug with? I'd like to use them as test cases.

skekre98 commented 3 years ago

These are some I remember:

Who is a better shooter Kobe or Rodman?

Expected: Dennis Rodman
Actual: Rod Freeman

Who is a better shooter Kobe or Wilt?

Expected: Wilt Chamberlain
Actual: Will Barton

Who is a better shooter Kobe or Gervin?

Expected: George Gervin
Actual: George Irvine

I'll let you know if I come across any more 👍
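These examples, plus the original LeBron case, translate naturally into regression tests for a standalone matcher. The sketch below is pytest-style, using a plain test_ function with assert. It re-declares a simplified, hypothetical partial_ratio stand-in (inspired by fuzz.partial_ratio, not its exact algorithm) so it runs on its own. best_match is likewise a hypothetical name, and the name tokens are assumed to be the words typed in each question. Each wrong candidate is listed first, so a tie would make the test fail rather than pass by luck of ordering. The Gervin case includes only the two names reported; adding Derrick Gervin would produce a tie.

```python
from difflib import SequenceMatcher
from typing import List


def partial_ratio(query: str, candidate: str) -> float:
    # Simplified, case-insensitive best-window similarity (hypothetical stand-in)
    a, b = query.lower(), candidate.lower()
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0.0
    width = len(shorter)
    return max(
        SequenceMatcher(None, shorter, longer[i:i + width]).ratio()
        for i in range(len(longer) - width + 1)
    )


def best_match(name: str, players: List[str]) -> str:
    # max() returns the first player on ties
    return max(players, key=lambda player: partial_ratio(name, player))


# (name token in the question, expected player, player the current code returned)
REPORTED_CASES = [
    ("Lebron", "LeBron James", "Leon Brown"),
    ("Rodman", "Dennis Rodman", "Rod Freeman"),
    ("Wilt", "Wilt Chamberlain", "Will Barton"),
    ("Gervin", "George Gervin", "George Irvine"),
]


def test_reported_cases() -> None:
    for name, expected, wrong in REPORTED_CASES:
        # The wrong candidate goes first so a tie would surface as a failure
        assert best_match(name, [wrong, expected]) == expected, name


if __name__ == "__main__":
    test_reported_cases()
    print("all reported cases pass")
```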

RamParameswaran commented 3 years ago

Cool, thanks. FYI, there are a few cases of shared surnames (e.g. Derrick Gervin vs George Gervin) where string matching may or may not return the expected result.

Obviously that's outside the scope of string matching, but it's a curly issue to keep in mind!
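No similarity score can separate two players who share the matched word. The sketch below shows one way to at least detect that case. It swaps in exact, case-insensitive whole-word matching instead of a fuzzy score: it collects every player with a name token equal to the query and treats more than one hit as ambiguous. The function names and the None-on-ambiguity convention are hypothetical. How the app should respond to an ambiguous match (ask the user, use other context from the question) remains the open question raised here.

```python
from typing import List, Optional


def matching_players(name: str, players: List[str]) -> List[str]:
    """Every player with a whole-word name token equal to `name`, ignoring case."""
    query = name.casefold()
    return [
        player for player in players
        if any(token.casefold() == query for token in player.split())
    ]


def resolve_player(name: str, players: List[str]) -> Optional[str]:
    """Return the unique match, or None if there are zero or several."""
    matches = matching_players(name, players)
    return matches[0] if len(matches) == 1 else None


players = ["Derrick Gervin", "George Gervin", "George Irvine", "LeBron James"]
print(matching_players("Gervin", players))  # ['Derrick Gervin', 'George Gervin']
print(resolve_player("Gervin", players))    # None (ambiguous)
print(resolve_player("Lebron", players))    # LeBron James
```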

skekre98 commented 3 years ago

Agreed, this will require a different approach down the road. Perhaps unrelated to strings.

skekre98/NBA-Search #20