j-andrews7 / kenpompy

A simple yet comprehensive web scraper for kenpom.com.
https://kenpompy.readthedocs.io/en/latest/?badge=latest
GNU General Public License v3.0

test_get_program_ratings failing (shape test) #29

Closed. esqew closed this issue 1 year ago.

esqew commented 1 year ago
FAILED tests/test_misc.py::test_get_program_ratings - assert (358, 17) == (357, 17)

Not sure why this is failing now, of all times, since the '21-'22 data is still there. We could probably just change the constant in the test file (a manual visual inspection of the KenPom front page does show 358 list items), but I'd slightly prefer to figure out which combination of teams was added or removed.
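One way to see which teams were added or removed is to diff the team names from two scrapes. This is a minimal sketch, assuming the team-name column from an older and a newer run has been saved as plain lists. The helper `diff_teams` and the sample names are hypothetical, not part of kenpompy or real KenPom data.

```python
def diff_teams(old_teams, new_teams):
    """Return (added, removed) as sorted lists of team names.

    Hypothetical helper: old_teams/new_teams are team-name lists taken
    from two scrapes of the same page on different dates.
    """
    old_set, new_set = set(old_teams), set(new_teams)
    added = sorted(new_set - old_set)    # present now, absent before
    removed = sorted(old_set - new_set)  # present before, absent now
    return added, removed


# Made-up snapshots for illustration only.
before = ["Team A", "Team B", "Team C"]
after = ["Team A", "Team C", "Team D", "Team E"]
added, removed = diff_teams(before, after)
print("added:", added)      # added: ['Team D', 'Team E']
print("removed:", removed)  # removed: ['Team B']
```

If both lists come back empty but the row count still changed, the difference lies somewhere other than team membership.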

Looking ahead to when this season's data starts being tracked, Wikipedia mentions there should be five new D-I teams this season and one demotion from D-I to D-III (Hartford), so the number may actually fluctuate as these changes are worked into KenPom's data and published.

We may also consider pegging this test to a specific year with a known number of teams, but I'll defer that decision to you, @j-andrews7.

j-andrews7 commented 1 year ago

Pegging the test is the right move.

And I should probably get CI set up with GitHub Actions too.

esqew commented 1 year ago

Sorry, I think I jumped the gun on this one without fully understanding what was being tested... 🫢

The test itself checks the shape of the DataFrame for https://kenpom.com/programs.php, not the main page (as I'd assumed). That page does appear to be 357 rows long on visual inspection, so further investigation is required.

esqew commented 1 year ago

Oddly enough, the discrepancy between the visual inspection and the returned row count seems to be caused by a tie: Hartford and Western Illinois share the same rank in the all-time program rankings, and the next entry doesn't skip 299 as you might expect.

The list does contain 358 teams, even though only 357 distinct ranks appear in it.
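The off-by-one follows from how the ranks are numbered. When a tie does not skip the next rank (dense, "1223" style), the last rank shown equals the number of distinct ranks, not the number of rows. This small sketch uses a hypothetical list of 358 ranks with a single tie at 298, which is consistent with the next entry being 299. It is not real KenPom data, and both function names are hypothetical.

```python
def summarize_ranks(ranks):
    """Return (row count, distinct rank count, highest rank)."""
    return len(ranks), len(set(ranks)), max(ranks)


def to_competition_ranks(dense_ranks):
    """Convert sorted dense ranks to standard competition ("1224") ranks."""
    result = []
    for i, r in enumerate(dense_ranks):
        if i > 0 and r == dense_ranks[i - 1]:
            result.append(result[-1])  # a tie repeats the previous rank
        else:
            result.append(i + 1)       # otherwise the rank is the 1-based position
    return result


# Hypothetical ranks: 1..298, a second 298 (the tie), then 299..357.
ranks = list(range(1, 299)) + [298] + list(range(299, 358))
print(summarize_ranks(ranks))           # (358, 357, 357)
print(to_competition_ranks(ranks)[-1])  # 358
```

Under competition-style ranking the last visible rank would have matched the row count. With dense ranking, counting rows is the only reliable way to get the number of teams.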

As for pegging this to a specific year, it doesn't look like the list in question has that option. We will have to keep an eye on how this list fluctuates over time with D-I additions and drops, and come up with a strategy to test it. In the interim, I'll submit a PR updating the test constant so this test passes.
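One possible interim strategy is to assert structural properties instead of an exact row count. This is a minimal sketch, assuming the scraped table is available as a list of row tuples whose first two fields are rank and team. The function name `check_program_ratings` and the 340 to 380 row bounds are hypothetical. The 17-column width comes from the failing assertion.

```python
def check_program_ratings(rows, n_columns=17, min_rows=340, max_rows=380):
    """Return a list of problems found in scraped rows; an empty list means OK.

    rows: list of tuples whose first two fields are (rank, team).
    n_columns=17 matches the failing assertion; the row bounds are
    hypothetical and would need revisiting as D-I membership changes.
    """
    problems = []
    if not min_rows <= len(rows) <= max_rows:
        problems.append(f"unexpected row count: {len(rows)}")
    if any(len(row) != n_columns for row in rows):
        problems.append("a row has the wrong number of columns")
        return problems  # later checks assume well-formed rows
    teams = [row[1] for row in rows]
    if len(set(teams)) != len(teams):
        problems.append("duplicate team names")
    ranks = [int(row[0]) for row in rows]
    if ranks and ranks[0] != 1:
        problems.append("ranks do not start at 1")
    # Dense ranking: each rank is a tie (+0) or the next integer (+1).
    if any(b - a not in (0, 1) for a, b in zip(ranks, ranks[1:])):
        problems.append("ranks are not dense and non-decreasing")
    return problems


# Hypothetical table: 358 rows, one tie at 298, 17 fields per row.
ranks = list(range(1, 299)) + [298] + list(range(299, 358))
rows = [(r, f"Team {i}") + ("",) * 15 for i, r in enumerate(ranks)]
print(check_program_ratings(rows))  # []
```

The trade-off is that a range check tolerates normal D-I additions and drops. It would still catch a broken scrape, such as a truncated table or a shifted column.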

j-andrews7 commented 1 year ago

Do you have any interest in being added as a collaborator to this repo? I really don't have the time to chase down bugs and fix the edge cases each season like I'd hope.

It won't let you push new versions to PyPI, but a GitHub Actions workflow that publishes when tagged commits are made is possible: https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/

esqew commented 1 year ago

"Do you have any interest in being added as a collaborator to this repo?"

I'd love to help nudge things along as a collaborator, as time allows.

esqew commented 1 year ago

"we will have to keep an eye on how this list fluctuates over time with D-I add/drops and come up with a strategy to test"

For what it's worth, KenPom pushed 2023 data to the main page at some point within the past week, and no additional tests are failing. 🎉