skekre98 / NBA-Search

Flask application designed to explore NBA statistics 🏀

Improvement to string matching methodology for player names/Implement get_player_url function #20

Closed RamParameswaran closed 3 years ago

RamParameswaran commented 3 years ago

Describe the bug: In modules/scraper.py, the call SequenceMatcher(None, name, player).ratio() produces unexpected results.

The results are particularly unexpected when: i) the lengths of the name and player strings differ significantly (e.g. "Lebron" and "Lebron James"); ii) the case does not match (e.g. "Lebron" and "LeBron").

I recommend using a different string-matching method, perhaps fuzz.partial_ratio from the fuzzywuzzy package. I also recommend refactoring the matching into a separate 'get_player_url' function so that it can be tested in isolation.
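A minimal sketch of what such a 'get_player_url' function could look like. To stay dependency-free it does not call fuzzywuzzy; instead it uses a stdlib stand-in that case-folds both strings and scores the query against each word of the player name as well as the full name. The signature, the min_score threshold, the PLAYER_URLS mapping, and the example.com URLs are all assumptions made for illustration, not the project's actual code.

```python
from difflib import SequenceMatcher
from typing import Mapping, Optional


def name_score(query: str, player: str) -> float:
    """Score a query against a player name, ignoring case.

    The query is compared with the full name and with each word of it,
    so a first name or last name on its own can still score highly.
    """
    q = query.casefold().strip()
    p = player.casefold()
    candidates = [p] + p.split()
    return max(SequenceMatcher(None, q, c).ratio() for c in candidates)


def get_player_url(
    name: str,
    player_urls: Mapping[str, str],
    min_score: float = 0.6,  # hypothetical cut-off, would need tuning
) -> Optional[str]:
    """Return the URL of the best-matching player, or None if nothing is close."""
    if not player_urls:
        return None
    best = max(player_urls, key=lambda player: name_score(name, player))
    if name_score(name, best) < min_score:
        return None
    return player_urls[best]


# Hypothetical data: placeholder URLs, not the real scraper targets.
PLAYER_URLS = {
    "LeBron James": "https://example.com/players/lebron-james",
    "Leon Brown": "https://example.com/players/leon-brown",
    "Kobe Bryant": "https://example.com/players/kobe-bryant",
}

print(get_player_url("Lebron", PLAYER_URLS))
```

Because the function takes the name-to-URL mapping as an argument, it can be unit-tested with a small fixture like the one above without touching the network.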

To Reproduce Steps to reproduce the behavior:

  1. Place a breakpoint at line 227 (i.e. just after the for player in alltime_player_list: block)
  2. Run the app
  3. On the /chat page, use the test question: "Who is a better shooter Kobe or Lebron?"

Expected behavior: variable similar_name == "LeBron James"

Actual behavior: variable similar_name == "Leon Brown" (from the 1946/47 season!)
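The mismatch can be reproduced outside the app with the standard library alone. The sketch below assumes the query reaches the matcher as "Lebron" and uses a two-element candidate list as a stand-in for alltime_player_list. SequenceMatcher.ratio() is computed as 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so the longer correct name is penalized for its extra characters, and the lowercase "b" does not match the "B" in "LeBron".

```python
from difflib import SequenceMatcher


def ratio(name: str, player: str) -> float:
    # Same call as quoted in the bug description.
    return SequenceMatcher(None, name, player).ratio()


query = "Lebron"
# Stand-in for alltime_player_list; only the two names from this report.
candidates = ["LeBron James", "Leon Brown"]

scores = {player: ratio(query, player) for player in candidates}
best = max(scores, key=scores.get)

for player, score in scores.items():
    print(f"{player}: {score:.3f}")
print("best match:", best)
```

With these inputs, "Leon Brown" outscores "LeBron James", which matches the actual behavior described above.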

skekre98 commented 3 years ago

I was able to recreate this with quite a few other names as well 😅. Would you be interested in picking this up?

RamParameswaran commented 3 years ago

Yep, I'm happy to pick this up along with #22.

skekre98 commented 3 years ago

Cool, I'll edit this and remove #22. Thanks!

RamParameswaran commented 3 years ago

@skekre98 what were some of the other player names you reproduced this bug with? I'd like to use them in test cases.

skekre98 commented 3 years ago

These are some I remember:

Who is a better shooter Kobe or Rodman?

Who is a better shooter Kobe or Wilt?

Who is a better shooter Kobe or Gervin?

I'll let you know if I come across any more 👍

RamParameswaran commented 3 years ago

Cool, thanks. FYI, there are a few cases of players sharing a last name (e.g. Derrick Gervin vs George Gervin) where the string matching may or may not return the expected result.

Obviously that's outside the scope of string-matching - but it's a curly issue to keep in mind!
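One way to at least surface this ambiguity, sketched under assumptions: return every player tied for the top score instead of a single name, so the caller can tell a unique match from a tie. The helper names token_score and top_matches and the three-name list are hypothetical and stdlib-only; this only detects the tie and does not decide which Gervin was meant.

```python
from difflib import SequenceMatcher
from typing import List


def token_score(query: str, player: str) -> float:
    """Best case-insensitive similarity between the query and any word of the name."""
    q = query.casefold()
    return max(
        SequenceMatcher(None, q, token).ratio()
        for token in player.casefold().split()
    )


def top_matches(query: str, players: List[str], tolerance: float = 1e-9) -> List[str]:
    """Return all players whose score is within `tolerance` of the best score."""
    scores = {player: token_score(query, player) for player in players}
    best = max(scores.values())
    return [player for player, score in scores.items() if best - score <= tolerance]


players = ["Derrick Gervin", "George Gervin", "Dennis Rodman"]

matches = top_matches("Gervin", players)
if len(matches) > 1:
    print("ambiguous:", matches)
else:
    print("unique:", matches[0])
```

A caller could use the length of the returned list to decide whether to ask the user a follow-up question or fall back to some other signal.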

skekre98 commented 3 years ago

Agreed, this will require a different approach down the road, perhaps one unrelated to strings.