Closed seankim658 closed 3 weeks ago
I just ran the tests and got 3 failures out of the 24 test cases. Let me look into that quick and I'll make a new commit to correct whatever went wrong.
Ok passes all test cases now.
Generally looks pretty good to me, I'll kick the tires a bit further when I get a chance. My one worry is that it (unsurprisingly) does away with some of the Cloudflare interception handling that lets the user know what's going wrong.
...Of course all of that does nobody any good if there's no way past them anyway, which seems like it might currently be the case. And we can always add some of that back in the future.
@esqew may have some additional thoughts.
I realized I didn't include this info in the issue but the Cloudflare JS challenge I was running into in #93 wasn't being caught by the interception handling. The login function was failing here:
Traceback (most recent call last):
  File "/home/seank/projects/personal/kenpom-upstream/kenpompy/main.py", line 8, in <module>
    scraper = login(username, password)
  File "/home/seank/projects/personal/kenpom-upstream/kenpompy/kenpompy/utils.py", line 36, in login
    browser.select_form('form[action="handlers/login_handler.php"]')
  File "/home/seank/.local/lib/python3.10/site-packages/mechanicalsoup/stateful_browser.py", line 241, in select_form
    raise LinkNotFoundError()
mechanicalsoup.utils.LinkNotFoundError
So the Cloudflare JS detection HTML was being returned in front of the kenpom home page, and then the select_form call was failing.
The HTML returned included the challenge-error-text prompt to enable JavaScript:
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>Just a moment...</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta content="IE=Edge" http-equiv="X-UA-Compatible" />
<meta content="noindex,nofollow" name="robots" />
<meta content="width=device-width,initial-scale=1" name="viewport" />
<style>
* {
box-sizing: border-box;
margin: 0;
padding: 0;
}
html {
line-height: 1.15;
-webkit-text-size-adjust: 100%;
color: #313131;
font-family: system-ui, -apple-system, BlinkMacSystemFont, Segoe UI, Roboto, Helvetica Neue, Arial, Noto Sans, sans-serif, Apple Color Emoji, Segoe UI Emoji, Segoe UI Symbol, Noto Color Emoji;
}
body {
display: flex;
flex-direction: column;
height: 100vh;
min-height: 100vh;
}
.main-content {
margin: 8rem auto;
max-width: 60rem;
padding-left: 1.5rem;
}
@media (width <= 720px) {
.main-content {
margin-top: 4rem;
}
}
.h2 {
font-size: 1.5rem;
font-weight: 500;
line-height: 2.25rem;
}
@media (width <= 720px) {
.h2 {
font-size: 1.25rem;
line-height: 1.5rem;
}
}
#challenge-error-text {
background-image: url();
background-repeat: no-repeat;
background-size: contain;
padding-left: 34px;
}
@media (prefers-color-scheme: dark) {
body {
background-color: #222;
color: #d9d9d9;
}
}
</style>
<meta content="390" http-equiv="refresh" />
</head>
<body class="no-js">
<div class="main-wrapper" role="main">
<div class="main-content">
<noscript>
<div class="h2"><span id="challenge-error-text">Enable JavaScript and cookies to continue</span></div>
</noscript>
</div>
</div>
<script>
(function () {
window._cf_chl_opt = {
cvId: "3",
cZone: "kenpom.com",
cType: "managed",
cRay: "8d841b98afcc0650",
cH: "T_1E.n3BlTUPaWeR77C4iA5VLxJW_GzGiNYj8PkU2RE-1729879243-1.2.1.1-Eh9hYE8REn2iGpR7aKdPJOAjd9tkzHyN8ZvqPi2XJM1iO9vzKWs6tHY9pEdP.0Gk",
cUPMDTk: "\/index.php?__cf_chl_tk=BUZVra4deaH_JcZ4B0YBDzMgl82Cy1pgw0rcIGkKlFI-1729879243-1.0.1.1-7DuWua2qk_4BRNqhajYekpxfhY3qrdz9GTCphNZww3E",
cFPWv: "b",
cITimeS: "1729879243",
cTTimeMs: "1000",
cMTimeMs: "390000",
cTplV: 5,
cTplB: "cf",
cK: "",
fa: "\/index.php?__cf_chl_f_tk=BUZVra4deaH_JcZ4B0YBDzMgl82Cy1pgw0rcIGkKlFI-1729879243-1.0.1.1-7DuWua2qk_4BRNqhajYekpxfhY3qrdz9GTCphNZww3E",
md:
"bWf7BwCOJJsjIpVmb3yoY5GNweGi17ilxtGZxhT8zis-1729879243-1.2.1.1-bzn3vH.0VTcwhxQEHn.3hbBNmFhQ6d.9eeG2YooSc4vQWBRULQQY2iqWLZJGH7bLNjcPLSmbX8oM0gyROGCvui26J.CeivFroIA9xLxzwHUqcslRCKmHhsVAD_UiZpQDctv5va0JKh8h6JfkcTMyk7c9u4P5bapI78_8u1DdnGN6ZpiLnqg7oti8ORdHPRLoat63Fc77KlSHJrP4eVYL3wtk8sVxYA983fk9hsOM5f1_hg.z4TkuyZ2bWE9esM24ouV_IV.V7k7vxGfX0BsY8SFVBfpvFyzkDCuae3j9mbwzDG0ZQlN.h3HqoE3h56Ud.XYS8xlwML3lVQ_sDKof2YeekrEOTAeWakBtDCDm5JaYhZi9RlHfealM5uKqgxfs2.oV3DIbaBN6L6IMlrzm46v9XdL1bkqtTftK_lv4HLp7ONyBxMKNm36BBVQc_KgHMQqbsZYL27uBWbqOU4a7Ozi.TtakdFpb02aImifPkrxrzHruE23B5thSnOEWcb2fQauW8EQvX1_CTBwipW6KfsoOxhMVWp.1dYhs922ohFYK_FLgKVXl6csZ5M.kEP4xNt0bFZ5kgmeNkFWVqJuUbqk0WKDFk3UymA4xYcJGxag34Wmx7Jz2ZMLZEVwtgJ26mjSWZBBGnMCO6bpiuAOSPXlF4UumStfdZBQ3fyd4PxtLNSctGQPgAF6otQQJkG_7UJ0RM3bvbn.czTx_KMHMox2uYHX4H0Ue_9a3OKjjznEAHmn0BA91UkqQg5SNYS2752Yed0_TkeEBlAk4.zhePOtK999ZlH8QSJcDWJDOtVdsLaB6eT_q.DpDION1wb1yD4bthm_At.iWWS.Ij8o549AlhZ1xVG4v6mqnCCDe.WwDnG69N7RQXD1s1m2ApSEqvFmFJ27U_6YEk.opnOd_3H_YjLbNCHXoOhnrL4Foh7.XCPHnGxqRoysXb7ABS95qL94h.w7DbJsKlKj1PT2f1FqaRGEf1.RM9YYjnJoZEYE98ufvpQqW4ncAKrvo2QKY1S_EH1af30pIXRZbtC6NRuAjqNP7TMM.L_QOYDkZ56JVqygZFUS_9AV9pIZp6mMcUcHLwlalbKwPSvo77YJO8w3s3GGZ05VMcBEwea1Mc_eYIUey57MyZNoskjs84i7xwKqBdnOn.uk3bcEORssrMiLLu8z2qO2R98Sq9E20VdBM52mDKete_Ve93tH7E4FQ048C1MogrY6FckQTFwZi0yc0VumLU_FD_sCuMgDyRglTjckQ3oEroenDhzz3rr9C.Io3PGZwHAhpG75v1YYL0NOoPuWQn86w71UaNQ_1kCQpgz4nlHOacwQR5oZuypv1eXMOmJyJjHL12Z.X7RGKeaBHsnGGd4bZpThoW47cmefJM4_BAggs4gZnP3KQTjH.hhMo9PlN4iEwNgzTkcAUVZ9Ho_DccAjHf1kmhbFqtqodOcw5de5Me1N2m8JLIOdnjyP__HE5RQ3mKdO0Wnl0W08Kz.cKGFQebXyJ_h2Dt4jQ41ETjFPzW5Q0c2BmM7NSxSaWNFL5FA01xR07X3zs5zLYyarvrPPATl0rMfXtuJLBqu3NNTmvqZjAp5GdZTGPOTQvLHckMrePh9fyJAccIrCQJkS1yEVgzaVRC9E0T6wDdGHCk8ioRKbh_0owOkMiPAqH29qt5afV2xLY7naUJAJPYZjtTckEmli1NB2pdkjM5mLyHorlnRRHOn7PY_1n3_O2zzpuGByi.v0GqsZIcr6lYfV8mw3nWQQa5tPtJP5asDpj_O.I7pT5osuPU9o2eU_wos1wq10okYEVMt4CW8I0hoZxnGlFiMKdVYS1_OThKTiRagRZ3Fippm7pBgZQMN2xg_uTxnuWzeRcmxm0NFESZ0Hp.sERjPet9_zWbsZOUN_NA8Vn_p8YzLix9lAzlIJjneFYckwsz_Ka
ANqbSfXgHW7nncNjCrgFsIkD8HyY6Fnxa74E7GJIzz5R8LVXaZqOzozci_x1QA4C3pX9ckwDpoRaFAp_ZtObR0t5CfYR6PR.LM9WpKNgwtjQb2z1KVHWnUejCIeN0glZFdAGHmlahzTsJrxooy4fBZ2Xz106ObyePUkPrOckky5t3Rwe9Osx6ESBHSCNdjxW.gf9pcQSlTyJyioVOi82tBRSNesuXgY3qXOlTaTZtsMhBYoNIl7hwuUG9Vp39XL1AMaYRA",
mdrd:
"thZe5pKCvBkMZq9Ow_naSE..nky8ZGAFvygi9IO4Yvo-1729879243-1.2.1.1-hPCirwZKUhXnaWj5lH43K8h5sU4w5RBMjt5h9igoG2Qst_q6l8K4b2RqLa7v8GypM9N0S.FRZA1a5vXdRBuZJ2yaVGWA7jhDnbA.9.lacBe0_qGoNQYk1aSjOTmaTsHZrgcuDaPfBWstsCeRfaH_wRDoSDcCv00N4WwOSWTbJU4tMoOGYkel1wk._ax4rGtQv5quvH5cg0UPOhE7A7NFMaBjTE9FS6iTlTr9U9wrONpF7JhBbXrT4Yr940eqZRL.sqqticT4yizKONpULzODe4RAlWvo70yyMlGnyNblBHTgt6312OAzw8g671PUVBDmdAAUS1ystjor9KeH7liUqYSfwO8RcPzszjYJZLQfTCA57Nj.1XVFExqw.fBnomeCcg6KFX97oxYe1R2tc8qDaJYA3WKeGWCoT3V_qmc4JDfOeRdeP6hJYhZIJSVeT8t._tcBmYQPbnY0N4to4mc3G4_T7qK.qu6DyogYfViFT9evSAyk57._2vEEvi0wsP.ECCJrtpiNspuOyxS4jepzjUTY.HgTBI842mN_jX_06Oauag_8AylYFh63jypUbritD9gwKvnPmWtfjSLIlZIAiOKlA_mMZeF5imdL.GepxB7FCmA7XgKkTaEWfwZN3qCg6exb9rTFp4ey2Ia1LDNUC2wCHpXsk3lLIO9ZJqYG7VA0GTKVH0glBVdcp8GoyAfr8qSeqk0B86HP9sPCsiqcRteBxYYnopUrAiR54XdvSi5pyji4mfaOem.YuftpKi1AW3O93p7J_IOC.YFLJ5DdIJ4dZJskyfZNiT2hwjtJl7ujO1h71lZGR8jKXNJFuripZDPE198kYQNhpy5CMrNRbB6TuRBYujAlBVjDYrpBwT4fSeomC3Cfneywhh0aVWtx6EsHbFLpAm0UxiuWOHBAEAUFghUU9.JYNUPWKdORqBUrU5AP_hccl1shEnGSFXg.RqOvtX5yoZu828Pn3YKQxVU2QBm0LUrdwu4sYKDlodoT65moJeRQ2ynqST8aGrpgAaun9lHvkxJ85Cg5a.aic3J3JZLqv3K3QpkuQ9dxmAsj8PDJBEotayzTi44txPJOAUOyqIV_1vNmXTVSVHisQuY8hZB4ip4jNBJvWTz7_5NEXQTjmpWy6omTkBxBCoBWVLABYKXmoD21dKEI55pF7atRF3CuV7kDEG9PCyxQbhdgLRlSM3fbMtNlCXo0jcNTEDyP9NzAz0fjjmQVuwzp.b5b55W3ZUOS6W6ZIQ4zHypoMRDe6BGGS_XfeUCpfq70iwSEbk_xmtOjQkKt5mgUwVRDzOjLSzDl4gG2rVrQEWdaRybwhK8SoVOFbVJ00bkgPpxGI56lqNnxqqgkpcbf0W_kZRQghYBIPvgAT9HW9AuFOIkVe03gYCrE3gv.LzCuV8tyOrU3h4MPf5pRmIIP_wFyoTvDoNSWuoL5Vz.3HHRJLw6sSlXYJnuWUYgmzy7JzVhY9MvCILIpU61n.VIYMIj5M2sO1aJ0J5ZxdURUcCzfWQWQdeC.AjMUKFk_ahPC7griNrZg6WT5L3bk7I7xYEpJrqMh35xXUJhFjA3eoOlBDBmNhv8WUInOuOLSprpaj4ywyUIdBe5prmm2MbZOt.YXtBc2TZpAVdDa_VGzZ817CkENLU9qaxACZ0e9HxeaZTg_m6dW7UuXeqmUG5r4mqT_wgvEGMoaw2DNMXgumt5BjCgFPjWGfBC8GueqnfwyYked6yfG7OFwVl3dJduV3yEv894rL_UKxZRVSBQrx4ciwH1YGWLn8YfsM99MDx0u8tWTbKeIqa1sSY2x4Zjda4MPPJJnfbkOkcKsBByY6dduLQe_KituEJsidxMV.SJMmwz9QLVBIIHi2NbIb.7hj6qY.g0TEEOtUb0dA0EsTLocchmIQnr4zFx4VGFsNZ3TjYy.WLB14qqnK_mp
O6pVnh4v7wZYqhY3T.ERm3jtXimja.EEtf_f0Ba3.7zQywdV6BaZF50OK0QrtZ_DfEQiaGAWqG401IXm_42eo5.BfWCxz2nhAjqgsyWOnh29jLAi7U2ubcstveSVRBDlCac6f5A1_H4pTVH8p7eckui3iEVbFyvWcs3I8teq9QCKD1Cs5P2wMCJ0u4EWILAVnUCxC8MZKpDZAhZ9kZMpKhLRpbs62XB9iFEzGcZknuS0p33iupXcjtueui5fs9qmskzjyg",
};
var cpo = document.createElement("script");
cpo.src = "/cdn-cgi/challenge-platform/h/b/orchestrate/chl_page/v1?ray=8d841b98afcc0650";
window._cf_chl_opt.cOgUHash = location.hash === "" && location.href.indexOf("#") !== -1 ? "#" : location.hash;
window._cf_chl_opt.cOgUQuery = location.search === "" && location.href.slice(0, location.href.length - window._cf_chl_opt.cOgUHash.length).indexOf("?") !== -1 ? "?" : location.search;
if (window.history && window.history.replaceState) {
var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
history.replaceState(null, null, "\/index.php?__cf_chl_rt_tk=BUZVra4deaH_JcZ4B0YBDzMgl82Cy1pgw0rcIGkKlFI-1729879243-1.0.1.1-7DuWua2qk_4BRNqhajYekpxfhY3qrdz9GTCphNZww3E" + window._cf_chl_opt.cOgUHash);
cpo.onload = function () {
history.replaceState(null, null, ogU);
};
}
document.getElementsByTagName("head")[0].appendChild(cpo);
})();
</script>
</body>
</html>
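For reference, a rough sketch of how a response body like the one above could be recognized before select_form is ever called. The function name and the marker list are my own assumptions, taken only from the HTML pasted here; Cloudflare can change this template at any time, so treat it as a heuristic rather than a reliable detector.

```python
# Heuristic check for the Cloudflare "Just a moment..." challenge page.
# The markers are taken from the HTML pasted above; they are assumptions
# about the current template, not a documented or stable contract.
CHALLENGE_MARKERS = (
    "<title>just a moment...</title>",
    "challenge-error-text",
    "window._cf_chl_opt",
    "/cdn-cgi/challenge-platform/",
)


def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if the HTML contains any known challenge-page marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

Something like this could run on the page text right after the initial request, so the caller gets a clear message instead of a LinkNotFoundError.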
Thanks for this PR! I'd like to take a deeper dive into both the issue and this PR before approving, but preliminarily I don't have an issue with this.
Went ahead and rebased this to merge into a new v0.4.0 branch since it's a bit more of a fundamental change. If we get this rolled in, fix #92, #94, and #90, I'll be pretty happy with it and push a new version to PyPI.
Once this is rolled into v0.4.0, I'm happy to open a PR for #92. I can also take care of #90.
Building the Sphinx docs locally worked fine for me, so without more info on what is going wrong there, I'm not sure how to debug it.
Just tried re-running the test cases locally and they passed. One thing I remember is that over the summer I noticed that when running inside any type of non-local environment I would get blocked. I know for sure that trying to run the login function from inside a Docker container gets blocked, so I'm wondering if this is a similar issue. I can look into it this week and, at the very least, add some interception handling for a more descriptive error message.
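As a starting point for that interception handling, here is a minimal sketch, not what the PR implements. It relies on Cloudflare's documented behavior of setting a cf-mitigated: challenge header on challenge-page responses; the exception class and helper name are hypothetical and do not exist in kenpompy.

```python
from typing import Mapping


class CloudflareChallengeError(RuntimeError):
    """Hypothetical exception for a response intercepted by Cloudflare."""


def raise_if_challenged(headers: Mapping[str, str], url: str) -> None:
    """Raise a descriptive error if the response headers mark a challenge page.

    `headers` can be any mapping of header names to values, for example the
    `headers` attribute of a requests-style response object.
    """
    # Normalize header names so the lookup works with a plain dict too.
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get("cf-mitigated", "").lower() == "challenge":
        raise CloudflareChallengeError(
            f"Request to {url} was answered with a Cloudflare challenge page "
            "instead of the expected content; the login form cannot be found."
        )
```

Calling this on the response from the first request would surface the block directly, which seems more useful than a failure further down in the form handling.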
I believe the issue in this case is that the repo secrets for Actions are unavailable because this is being run from a fork, and GitHub blocks secrets from being exposed or used there for security reasons.
Regardless, I am going to merge this for now so we can deal with the other issues.
Refactors to use the cloudscraper library instead of mechanicalsoup. Fixes issue #93.