j-andrews7 / kenpompy

A simple yet comprehensive web scraper for kenpom.com.
https://kenpompy.readthedocs.io/en/latest/?badge=latest
GNU General Public License v3.0

cannot login to kenpom.com using login function from kenpompy.utils #24

Closed JoshuaKirkham closed 1 year ago

JoshuaKirkham commented 1 year ago

Hi Jared,

I am unable to log in to kenpom.com using your suggested login method:

from kenpompy.utils import login
browser = login(your_email, your_password)

After pip installing kenpompy and running the above code, I am seeing the attached error. I believe the problem is that MechanicalSoup's stateful_browser.py is not finding any forms on the kenpom site. I added a form counter to the select_form() function in stateful_browser.py, and you can see the form count results at the top of the attached output.

Do you have any suggestions to resolve this issue?

I am really excited to use your kenpompy project! I am looking to develop a predictive model for the upcoming NCAAB season, and your project has the exact kenpom.com web scraper functions that I need!

Thank you in advance! Josh

[Screenshot: jupyter_Screenshot_2022-10-07 071556]

j-andrews7 commented 1 year ago

I can replicate this. I'm not really sure what's going on; the form is definitely still there.

This will take some additional digging with BeautifulSoup. I'll try to get to it in the next few weeks.

esqew commented 1 year ago

Unfortunately, it appears that KenPom has recently gone behind Cloudflare DDoS protection. Attempting a login with MechanicalSoup produces this response body:

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]--><!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]--><!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]--><!--[if gt IE 8]><!--><html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Attention Required! | Cloudflare</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<meta content="noindex, nofollow" name="robots"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<link href="/cdn-cgi/styles/cf.errors.css" id="cf_styles-css" rel="stylesheet"/>
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" /><![endif]-->
<style>body{margin:0;padding:0}</style>
<!--[if gte IE 10]><!-->
<script>
  if (!navigator.cookieEnabled) {
    window.addEventListener('DOMContentLoaded', function () {
      var cookieEl = document.getElementById('cookie-alert');
      cookieEl.style.display = 'block';
    })
  }
</script>
<!--<![endif]-->
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" data-translate="enable_cookies" id="cookie-alert">Please enable cookies.</div>
<div class="cf-error-details-wrapper" id="cf-error-details">
<div class="cf-wrapper cf-header cf-error-overview">
<h1 data-translate="block_headline">Sorry, you have been blocked</h1>
<h2 class="cf-subheadline"><span data-translate="unable_to_access">You are unable to access</span> kenpom.com</h2>
</div><!-- /.header -->
<div class="cf-section cf-highlight">
<div class="cf-wrapper">
<div class="cf-screenshot-container cf-screenshot-full">
<span class="cf-no-screenshot error"></span>
</div>
</div>
</div><!-- /.captcha-container -->
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="blocked_why_headline">Why have I been blocked?</h2>
<p data-translate="blocked_why_detail">This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.</p>
</div>
<div class="cf-column">
<h2 data-translate="blocked_resolve_headline">What can I do to resolve this?</h2>
<p data-translate="blocked_resolve_detail">You can email the site owner to let them know you were blocked. Please include what you were doing when this page came up and the Cloudflare Ray ID found at the bottom of this page.</p>
</div>
</div>
</div><!-- /.section -->
<div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
<p class="text-13">
<span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">7581eb2e8e0c0ced</strong></span>
<span class="cf-footer-separator sm:hidden">•</span>
<span class="cf-footer-item hidden sm:block sm:mb-1" id="cf-footer-item-ip">
      Your IP:
      <button class="cf-footer-ip-reveal-btn" id="cf-footer-ip-reveal" type="button">Click to reveal</button>
<span class="hidden" id="cf-footer-ip">[redacted]</span>
<span class="cf-footer-separator sm:hidden">•</span>
</span>
<span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing" id="brand_link" rel="noopener noreferrer" target="_blank">Cloudflare</a></span>
</p>
<script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script>
  window._cf_translation = {};

</script>
</body>
</html>
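
A response like the one above contains no `<form>` elements, which would explain why select_form() finds nothing to select. As a rough sketch (not part of kenpompy or MechanicalSoup), a caller could check for this situation before trying to log in. The names `PageSummary` and `looks_like_cloudflare_block` are hypothetical, and the check is only a heuristic based on the title and headline seen in this particular response.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the <title> text and count <form> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.form_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "form":
            self.form_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept.
        if self._in_title:
            self.title += data


def looks_like_cloudflare_block(html):
    """Heuristic (assumption): Cloudflare wording is present and no forms exist."""
    summary = PageSummary()
    summary.feed(html)
    summary.close()
    blocked_text = "Cloudflare" in summary.title or "you have been blocked" in html
    return blocked_text and summary.form_count == 0
```

Passing the response text to `looks_like_cloudflare_block` before selecting the login form would let a script raise a clearer error than the missing-form failure reported at the top of this issue.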
j-andrews7 commented 1 year ago

Hm, I'll have to do some reading and see if there's any way around this.

On Mon, Oct 10, 2022, 3:01 PM Sean Quinn @.***> wrote:

Unfortunately it appears recently KenPom has gone behind CloudFlare DDOS protection. Attempting a login with MechanicalSoup produces this response body:


esqew commented 1 year ago

Setting MechanicalSoup to use an explicit User-Agent string seems to resolve this, at least for now (#25 raised). If the sensitivity of this mechanism is turned up at any point, though, more evasive maneuvers may be required.
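
For anyone who wants to try this before a fix is released, here is a minimal sketch of the idea. It is not the code from #25: the helper name `make_browser` and the User-Agent value are assumptions chosen for illustration. It relies only on the `user_agent` argument that MechanicalSoup's browser constructor accepts.

```python
# Hypothetical browser-like User-Agent; the value used in #25 is not shown in this thread.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0"


def make_browser(user_agent=USER_AGENT):
    """Return a MechanicalSoup StatefulBrowser that sends an explicit User-Agent."""
    # Imported here so that defining the helper does not require MechanicalSoup.
    import mechanicalsoup

    # The user_agent argument is stored on the underlying requests session,
    # so it is sent with every request made through this browser.
    return mechanicalsoup.StatefulBrowser(user_agent=user_agent)
```

The browser returned by `make_browser()` can then be used for the same open, select_form, and submit steps as before. Whether a given User-Agent string gets through depends on how the site's protection is configured, so this may stop working without notice.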