Closed RamParameswaran closed 4 years ago
I was able to recreate this with quite a few other names as well 😅. Would you be interested in picking this up?
Yep, I'm happy to pick this up along with #22.
Cool, I'll edit this and remove #22. Thanks!
@skekre98 what were some of the other player names you replicated this bug for? I'd like to use them in test cases.
These are some I remember:
Who is a better shooter Kobe or Rodman?
Dennis Rodman
Rod Freeman
Who is a better shooter Kobe or Wilt?
Wilt Chamberlain
Will Barton
Who is a better shooter Kobe or Gervin?
George Gervin
George Irvine
I'll let you know if I come across any more 👍
Cool thanks. FYI - there are a few cases of duplicate names (e.g. Derrick Gervin vs George Gervin) where the string matching may or may not return the expected result.
Obviously that's outside the scope of string-matching - but it's a curly issue to keep in mind!
Agreed, this will require a different approach down the road. Perhaps unrelated to strings.
Describe the bug

In `modules/scraper.py`, the `SequenceMatcher(None, name, player).ratio()` function produces unexpected results.

The results are particularly unexpected when:
i) the lengths of the `name` and `player` strings are significantly different (e.g. "Lebron" and "Lebron James")
ii) there is a case-sensitivity mismatch (e.g. "Lebron" and "LeBron")

Recommend using a different string-matching approach - perhaps the fuzzywuzzy package (`fuzz.partial_ratio`)? Also recommend refactoring this into a separate `get_player_url` function for testing purposes.
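For illustration, here is a minimal standard-library sketch of the kind of matching that avoids both problems above. It is not fuzzywuzzy's `partial_ratio` and not the project's code: `match_score` and `find_similar_name` are hypothetical names, and the player list is a small stand-in built from the names mentioned in this thread. The idea is to compare case-insensitively and to score the query against each word of the player name as well as the full name, so a short query like "Lebron" is not penalised for the length difference.

```python
from difflib import SequenceMatcher


def match_score(name, player):
    """Case-insensitive similarity between a query and a full player name."""
    query = name.casefold().strip()
    full = player.casefold().strip()
    # Score against the full name and against each word in it, so a
    # short query can match a first or last name on its own.
    candidates = [full] + full.split()
    return max(SequenceMatcher(None, query, c).ratio() for c in candidates)


def find_similar_name(name, players):
    """Return the player whose name scores highest against `name`."""
    # max() keeps the first of any tied candidates, so duplicate
    # surnames (e.g. two players named Gervin) are NOT disambiguated.
    return max(players, key=lambda p: match_score(name, p))


# Stand-in list using names reported in this issue (assumption: the
# real alltime_player_list is much longer).
players = [
    "Leon Brown", "LeBron James",
    "Rod Freeman", "Dennis Rodman",
    "Will Barton", "Wilt Chamberlain",
    "George Irvine", "George Gervin",
]

for query in ["Lebron", "Rodman", "Wilt", "Gervin"]:
    print(query, "->", find_similar_name(query, players))
```

Because an exact (case-folded) word match scores 1.0, each query above resolves to the player whose first or last name it equals. Misspelled queries still fall back to ordinary fuzzy scoring, and duplicate surnames remain ambiguous, as noted in the comments.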
To Reproduce

Steps to reproduce the behavior: (... `for player in alltime_player_list:` block)

Expected behavior

Expected: variable `similar_name == "LeBron James"`

Actual behavior

Actual: variable `similar_name == "Leon Brown"` (from the 1946/47 season!)
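The reported behaviour can be reproduced in isolation with a short sketch. This assumes the query was "Lebron" (the example used in the description) and uses a two-name stand-in for `alltime_player_list`; the exact loop in `modules/scraper.py` may differ.

```python
from difflib import SequenceMatcher

name = "Lebron"
# Two-name stand-in for the real list (assumption for illustration).
alltime_player_list = ["Leon Brown", "LeBron James"]

# Score every candidate the way the issue describes:
# case-sensitive, whole-string ratio.
scores = {
    player: SequenceMatcher(None, name, player).ratio()
    for player in alltime_player_list
}
similar_name = max(scores, key=scores.get)

for player, score in scores.items():
    print(f"{player}: {score:.3f}")
print("similar_name ==", repr(similar_name))  # 'Leon Brown'
```

"Leon Brown" wins because `ratio()` is 2*M/T (matched characters over total length): both names match five characters of "Lebron" case-sensitively, but "Leon Brown" is two characters shorter, so its ratio is higher.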