j-andrews7 / kenpompy

A simple yet comprehensive web scraper for kenpom.com.
https://kenpompy.readthedocs.io/en/latest/?badge=latest
GNU General Public License v3.0

get_pomeroy_ratings not pulling some team names properly #9

Closed (steveroks closed this issue 3 years ago)

steveroks commented 3 years ago

Hello, when using kpm.get_pomeroy_ratings to get team stats such as Rk, Conf, Record, etc., it does not return the team's full name correctly in some cases. For example, San Diego St. is ranked 33 and sits at row index 32. However, the name returned by the API is simply 'San Diego' instead of 'San Diego St.'

Another example is Alabama A&M, which shows up as just 'Alabama A'.

Thanks, Steve

j-andrews7 commented 3 years ago

Probably a regex issue. Or something changed on Ken's site that messed things up.

I have very limited time to devote to this project anymore, but I'd welcome a PR.

steveroks commented 3 years ago

Thanks for responding!

I apologize for my n00b-ness, but I'm not sure how to submit a proper pull request. However, I can tell you this:

The issue looks to be with the section for parsing out the seed. It looks like the pattern is set to only look for A-Z and only two words, which is why teams with three words in their name (like San Diego St.) are getting truncated to just San Diego. Also, since it's only looking for A-Z, Alabama A&M is getting truncated at Alabama A.

If you remove the section of code that parses out the seed, everything works as expected: the full team names are returned. I don't think this parsing is even needed at all, as that page doesn't include seeds in the team names.
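
To illustrate the truncation described above, here is a minimal sketch. The pattern and the helper name `narrow_extract` are assumptions written to match that description (letters only, at most two words); they are not copied from kenpompy's source.

```python
import re

# Hypothetical pattern, NOT taken from kenpompy: one run of letters,
# optional whitespace, then a second run of letters. Anything after
# that (a third word, '&', '.') is left out of the match.
NARROW_NAME = re.compile(r"[A-Za-z]+\s*[A-Za-z]+")


def narrow_extract(cell):
    """Return the part of `cell` that the narrow pattern captures."""
    match = NARROW_NAME.match(cell)
    return match.group(0) if match else cell


for name in ["San Diego St.", "Alabama A&M", "Duke"]:
    print(repr(name), "->", repr(narrow_extract(name)))
```

Under these assumptions, three-word names lose their last word and names containing `&` stop at the ampersand, while single-word names such as Duke come through intact, which would explain why only some teams are affected.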

j-andrews7 commented 3 years ago

For the current season, no, the seeds are not there, but I think they are for past seasons. Regardless, thanks for the digging. I am swamped, but will try to fix this if I ever get some time.
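
If seeds do need to be handled for past seasons, one possible approach is to peel off only a trailing number and leave the rest of the name untouched. This is a sketch, not the fix that shipped: it assumes the seed appears as digits at the end of the team cell (for example `San Diego St. 5`), and the helper name `split_team_and_seed` is hypothetical.

```python
import re

# Assumption: a seed, when present, is a run of digits at the very end
# of the cell. Everything before it is kept as the team name verbatim.
TRAILING_SEED = re.compile(r"^(?P<team>.*?)\s*(?P<seed>\d+)?$")


def split_team_and_seed(cell):
    """Split a team cell into (team_name, seed); seed is None if absent."""
    match = TRAILING_SEED.match(cell.strip())
    if match is None:  # e.g. an unexpected embedded newline
        return cell, None
    seed = match.group("seed")
    return match.group("team"), (int(seed) if seed else None)


for cell in ["San Diego St. 5", "Alabama A&M 16", "San Diego St.", "Alabama A&M"]:
    print(repr(cell), "->", split_team_and_seed(cell))
```

Because the pattern does not restrict which characters a name may contain, periods, ampersands, and any number of words survive, with or without a seed.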

j-andrews7 commented 3 years ago

This should be fixed in the latest release.

jwall5678 commented 2 years ago

I believe I am still having the same issue as before, even though I am on the latest release.