oseymour / ScraperFC

Python package for scraping soccer data from a variety of sources
GNU General Public License v3.0

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: 'url' must be a string #17

Closed lurue101 closed 1 year ago

lurue101 commented 1 year ago

Hello Owen, thanks a lot for your work. Your package has helped me a lot with a private project. So far I have only scraped data from last season or older. When I try to scrape the current season, I get the same error every time and can't find a solution. This is the code I run, and below it is the error I get:

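import traceback
# `scraper` is a ScraperFC Understat scraper created earlier (setup not shown)
league_file = "laliga"
league = "La_Liga"
seasons = [2023]
for season in seasons:
    try:
        out = scraper.scrape_matches(season, league)
    except Exception:
        # Log the failure and skip the export, since `out` was never assigned
        traceback.print_exc()
        continue
    out.to_csv(f"path/data/{league_file}/{season}_matches_understat.csv")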

Traceback (most recent call last):
  File "/var/folders/p8/zzym2b694hl8cfrbs64b_rbm0000gn/T/ipykernel_4180/1075101141.py", line 8, in <cell line: 6>
    out = scraper.scrape_matches(season, league)
  File "/Users/rueck/.local/share/virtualenvs/oddset-brudi-E6_olS0B/lib/python3.10/site-packages/ScraperFC/Understat.py", line 243, in scrape_matches
    match   = self.scrape_match(link)
  File "/Users/rueck/.local/share/virtualenvs/oddset-brudi-E6_olS0B/lib/python3.10/site-packages/ScraperFC/Understat.py", line 133, in scrape_match
    self.driver.get(link)
  File "/Users/rueck/.local/share/virtualenvs/oddset-brudi-E6_olS0B/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 446, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/rueck/.local/share/virtualenvs/oddset-brudi-E6_olS0B/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 434, in execute
    self.error_handler.check_response(response)
  File "/Users/rueck/.local/share/virtualenvs/oddset-brudi-E6_olS0B/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: 'url' must be a string
  (Session info: headless chrome=107.0.5304.87)
Stacktrace:
0   chromedriver                        0x000000010d1a82c8 chromedriver + 4752072
1   chromedriver                        0x000000010d128463 chromedriver + 4228195
2   chromedriver                        0x000000010cd8bb18 chromedriver + 441112
3   chromedriver                        0x000000010ce029fb chromedriver + 928251
4   chromedriver                        0x000000010cde9d02 chromedriver + 826626
5   chromedriver                        0x000000010ce02134 chromedriver + 926004
6   chromedriver                        0x000000010cde9b33 chromedriver + 826163
7   chromedriver                        0x000000010cdba9fd chromedriver + 633341
8   chromedriver                        0x000000010cdbc051 chromedriver + 639057
9   chromedriver                        0x000000010d17530e chromedriver + 4543246
10  chromedriver                        0x000000010d179a88 chromedriver + 4561544
11  chromedriver                        0x000000010d1816df chromedriver + 4593375
12  chromedriver                        0x000000010d17a8fa chromedriver + 4565242
13  chromedriver                        0x000000010d1502cf chromedriver + 4391631
14  chromedriver                        0x000000010d1995b8 chromedriver + 4691384
15  chromedriver                        0x000000010d199739 chromedriver + 4691769
16  chromedriver                        0x000000010d1af81e chromedriver + 4782110
17  libsystem_pthread.dylib             0x00007ff814e2f259 _pthread_start + 125
18  libsystem_pthread.dylib             0x00007ff814e2ac7b thread_start + 15

Thanks in advance. Best regards, Lukas

lurue101 commented 1 year ago

Hello again, I found a fix. The problem is that in the Understat.py file, in the function scrape_matches, the "links" variable contains a None, which then causes the error. So I just added a check for that. I'm not sure whether you want to take a deeper look at why there is a None, but if not, I'm happy to create a pull request with the fix:

            if link is None:
                continue
            match   = self.scrape_match(link)
            matches = matches.append(match, ignore_index=True)
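
For anyone hitting the same thing, the check above amounts to dropping non-string entries before they reach driver.get(). Here is a minimal sketch of that idea outside ScraperFC. The helper name `valid_links` and the example URLs are made up for illustration and are not part of the package:

```python
from typing import Iterable, List, Optional


def valid_links(links: Iterable[Optional[str]]) -> List[str]:
    """Return only the entries that are non-empty strings.

    Hypothetical helper, not part of ScraperFC: Selenium's driver.get()
    rejects None with "'url' must be a string", so such entries are dropped.
    """
    return [link for link in links if isinstance(link, str) and link]


# Example input with a None mixed in, as described in the comment above
links = ["https://understat.com/match/1", None, "https://understat.com/match/2"]
for link in valid_links(links):
    # In the real scraper, this is where scrape_match(link) would be called
    print(link)
```

Filtering up front keeps the loop body unchanged, but it also hides the reason a None ended up in the list, so it is probably worth logging how many entries were dropped.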

oseymour commented 1 year ago

Hey @tacticfox! Thanks for reaching out. I can add that fix, but you also need to change your league string. It can't have the underscore; it should just be "La Liga".

If you ever need to see what the league strings are, go to shared_functions.py and there's a dict at the top with all of the available leagues for each source.
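
A rough sketch of how such a dict could be used to validate a league string before scraping. The dict contents and the names `AVAILABLE_LEAGUES` and `check_league` are made up for illustration. The real dict in shared_functions.py may be named and shaped differently, and may accept other spellings:

```python
# Hypothetical stand-in for the dict in shared_functions.py (real contents may differ)
AVAILABLE_LEAGUES = {
    "Understat": ["EPL", "La Liga", "Bundesliga", "Serie A", "Ligue 1"],
}


def check_league(source: str, league: str) -> None:
    """Raise a ValueError listing the valid strings if `league` is not available."""
    valid = AVAILABLE_LEAGUES.get(source, [])
    if league not in valid:
        raise ValueError(f"{league!r} is not valid for {source}; choose from {valid}")


check_league("Understat", "La Liga")  # passes silently
try:
    # Rejected only because this made-up dict has no underscore spelling
    check_league("Understat", "La_Liga")
except ValueError as err:
    print(err)
```

Failing early with the list of valid strings makes a mistyped league easier to spot than an error raised later inside Selenium.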

Let me know if it works with "La Liga".

lurue101 commented 1 year ago

Hey, using "La Liga" doesn't change anything for me. It's the same problem. And I'm pretty sure it works with the underscore; I have used that all the time.

oseymour commented 1 year ago

Huh, that's 2 issues then. OK, thanks for the heads up, this is on my to-do list now!

oseymour commented 1 year ago

v2.3.0 should fix the selenium error. Let me know if it works for you!