j-andrews7 / kenpompy

A simple yet comprehensive web scraper for kenpom.com.
https://kenpompy.readthedocs.io/en/latest/?badge=latest
GNU General Public License v3.0
70 stars, 21 forks

Team names are incorrect when grabbing kenpom rankings #65

Closed: jgpayne closed this issue 10 months ago

jgpayne commented 10 months ago

When using kenpompy.misc.get_pomeroy_ratings, team names get cut off. For example, there are three teams named Cal St., when they should be Cal St. Fullerton, Cal St. Northridge, etc.

j-andrews7 commented 10 months ago

All those juco California teams are the same. Thanks for the report. Regex woes strike again. We'll try to fix this for our start of season release.

j-andrews7 commented 10 months ago

See #9 and #41 for previous issues with this. #42 was our last fix for this, I believe.

jgpayne commented 10 months ago

No worries! There is a workaround using get valid team names, but just wanted to bring it to your attention!

esqew commented 10 months ago

I believe #42 did fix this since I can't repro this when using the latest commit:

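from kenpompy.utils import login
from kenpompy.misc import get_pomeroy_ratings

# Log in with kenpom.com account credentials (email, password).
browser = login('[redacted]', '[redacted]')

# Fetch the ratings table and show every team whose name starts with "Cal".
df = get_pomeroy_ratings(browser=browser)
df[df['Team'].str.startswith('Cal')]['Team']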

Result:

145             California
173      Cal St. Fullerton
180            Cal Baptist
277    Cal St. Bakersfield
321     Cal St. Northridge
340               Cal Poly
Name: Team, dtype: object
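
A quick way to check whether an install is affected is to look for repeated names in the Team column, since the truncation described in this issue collapses several teams into one label. The sketch below is a minimal illustration: `find_duplicate_names` is a hypothetical helper (not part of kenpompy), and the sample list is made-up data that mimics the reported symptom, not real output.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return {name: count} for every name that appears more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up sample mimicking the truncation reported in this issue.
truncated = ["California", "Cal St.", "Cal Baptist", "Cal St.", "Cal St.", "Cal Poly"]
print(find_duplicate_names(truncated))  # {'Cal St.': 3}
```

With a real ratings frame, passing `df['Team']` should work the same way, since iterating a pandas Series yields its values. Assuming every team on the ratings page has a unique name, an empty result suggests the names are intact.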

If you're still using the version that's on PyPI (released 12-2022), it doesn't yet include this patch (01-2023). If you need this fixed ASAP, install the latest from the master branch:

pip install git+https://github.com/j-andrews7/kenpompy@master

This may require you to uninstall the library entirely before reinstalling it from master, as the version number hasn't yet been bumped in preparation for the next release.
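
Because the version number is the same on both builds, the version string alone can't tell you which one is installed. One possible check, sketched below, relies on the `direct_url.json` metadata file that pip records for installs made from a URL or VCS reference (PEP 610). The helper names `parse_vcs_info` and `installed_from_vcs` are hypothetical, and the sketch assumes a pip version recent enough to write that file.

```python
import json
from importlib import metadata


def parse_vcs_info(direct_url_text):
    """Return (url, commit_id) from direct_url.json text, or None if not a VCS install."""
    if not direct_url_text:
        return None
    info = json.loads(direct_url_text)
    vcs = info.get("vcs_info")
    if vcs is None:
        return None
    return info.get("url"), vcs.get("commit_id")


def installed_from_vcs(dist_name):
    """Look up an installed distribution and report its VCS origin, if any."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the metadata file is absent (e.g. a PyPI install).
    return parse_vcs_info(dist.read_text("direct_url.json"))


print(installed_from_vcs("kenpompy"))
```

A `None` result means the package is either not installed or was not installed from a VCS URL; a tuple shows the repository URL and the commit that was built.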

pstonebu commented 10 months ago

Not specific to the California teams, but the team name grabbing is still not working exactly right. For example, both "South Carolina" and "South Carolina State" become "South Carolina". You can use conf to differentiate, but it's still a bit confusing. Alabama A&M gets stored as "Alabama A", Arkansas Pine Bluff is "Arkansas Pine", etc.
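
To illustrate how a regex can produce exactly these symptoms, here is a small sketch. The pattern is purely hypothetical and is not taken from kenpompy's source; it simply keeps at most two space-separated tokens made of letters and periods, which is enough to reproduce every truncation reported in this thread.

```python
import re

# HYPOTHETICAL pattern (an assumption for illustration, not kenpompy's code):
# one token of letters/periods, optionally followed by one more such token.
TOO_NARROW = re.compile(r"[A-Za-z.]+(?: [A-Za-z.]+)?")


def truncate(name):
    """Return the part of the name the too-narrow pattern would capture."""
    match = TOO_NARROW.match(name)
    return match.group(0) if match else name


for full in ["Cal St. Fullerton", "South Carolina State", "Alabama A&M", "Arkansas Pine Bluff"]:
    print(f"{full!r} -> {truncate(full)!r}")
# 'Cal St. Fullerton' -> 'Cal St.'
# 'South Carolina State' -> 'South Carolina'
# 'Alabama A&M' -> 'Alabama A'
# 'Arkansas Pine Bluff' -> 'Arkansas Pine'
```

Any pattern that caps the number of words or omits characters such as `&` will collapse distinct teams into one label in this way, which is why "South Carolina" and "South Carolina State" end up indistinguishable.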

j-andrews7 commented 10 months ago

Did you install from GitHub? If not, the version on PyPI will still have these issues, which should be fixed in the dev version. Once we get the outstanding PRs rolled in, we'll push a new release to PyPI.

On Mon, Nov 13, 2023, 11:44 AM Patrick Stoneburner @.***> wrote:

Not specific to the California teams, but the team name grabbing is still not working exactly right. For example, both "South Carolina" and "South Carolina State" become "South Carolina". You can use conf to differentiate, but it's still a bit confusing. Alabama A&M gets stored as "Alabama A", Arkansas Pine Bluff is "Arkansas Pine", etc.


pstonebu commented 10 months ago

Thought I had updated kenpompy from git but evidently not. An uninstall and reinstall from git fixed it. Thanks!