oseymour / ScraperFC

Python package for scraping soccer data from a variety of sources
GNU General Public License v3.0

Timeout error when scraping Capology #46

Open c10vis opened 2 months ago

c10vis commented 2 months ago

ScraperFC version: 3.1.0 Selenium version: 4.23.1

As usual, I import ScraperFC, initialize the Capology scraper per the documentation, and attempt to scrape EPL data from 2023-24. The result is a timeout error (see photo below). I have tried various seasons and leagues with similar results. I have been able to scrape with other modules in ScraperFC with no problems. I have also made a Capology account and logged in through my browser; this has not changed my results.

[Image: Screenshot 2024-08-16 at 15 24 15]

oseymour commented 2 months ago

Hey @c10vis I just tried this locally and it worked. No error. Not sure what's going on with yours. Are you still getting the error?

c10vis commented 2 months ago

Yeah, still getting the error. I wonder if it has to do with Chrome? I'm not sure how the backend works, but it seems like the scraper is using a Chrome driver that's causing an issue. I don't know if that's something specific to how I'm set up or just generally how it works.

oseymour commented 2 months ago

I doubt it's a Chrome issue. You're correct, Selenium creates a chromedriver (essentially just a Chrome window), and using that avoids a lot of anti-scraping measures compared with just doing an HTTP request with requests. Using a chromedriver also allows for interacting with the webpage (e.g., changing currency on Capology).

I can't do much to debug this without having the issue myself. You could try increasing the timeout duration.
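For context on what a timeout duration means here, below is a minimal, generic sketch of a polling wait: it checks a condition repeatedly and gives up once a deadline passes. This is not ScraperFC's or Selenium's actual code. The name `wait_until` and its parameters are hypothetical, chosen only to illustrate the idea behind waiting for a page element with a time limit.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration only; not part of ScraperFC or Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # The thing we were waiting for showed up in time.
            return result
        if time.monotonic() >= deadline:
            # A larger `timeout` pushes this deadline further out.
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)
```

With a wait like this, a slow page only fails if the element has not appeared by the deadline, so raising the timeout gives the page more time to load before the error is raised.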

What OS are you using? Is this running on your laptop or a remote machine/server?

c10vis commented 2 months ago

I'm running macOS 14.6.1 on my personal laptop (M1 MacBook Pro).

I timed the issue and it runs for about 1 min 20s before the timeout error hits. How would I go about increasing the timeout duration?

oseymour commented 1 month ago

Sorry for the delay. Was moving in with my girlfriend.

What you measured is the total time until that error appears (I assume). The timeout duration for finding the element that is failing is 10 seconds. You'll need to go to where Python puts packages when you pip install them and increase the 10 to something larger in the .py file. I don't know where that is on macOS though. Google should be able to tell you. Then just follow the error traceback you get to see which line needs to be edited.
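One way to find where an installed package lives, on any OS, is to ask Python itself. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `package_dir` is hypothetical, and the import name `ScraperFC` is taken from the original post. It has to be run with the same Python environment the scraper runs in.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory of an installed package, or None if it is not found.

    Hypothetical helper for illustration; `name` is the top-level import name.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    # Built-in modules and namespace packages have no single file location.
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Prints the folder holding the package's .py files, or None if not installed.
    print(package_dir("ScraperFC"))
```

Running `pip show ScraperFC` in a terminal also prints a `Location:` line with the directory the package was installed into. Note that edits made to an installed file are typically overwritten the next time the package is upgraded or reinstalled.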