spinlud / py-linkedin-jobs-scraper

LinkedIn invalidates my cookie #74

Closed LaiArturs closed 5 months ago

LaiArturs commented 8 months ago

For some reason, when I run a query, LinkedIn invalidates my cookie before I get any meaningful result. As a result, the scraper fails with an error (see the log below) and I am also logged out of LinkedIn in my Chrome browser.

I am running the scraper in a Jupyter notebook. Log:

INFO:li:scraper:('Starting new query', "Query(query=Software Developer options=QueryOptions(limit=1 locations=['Belgium'] filters=QueryFilters(on_site_or_remote=[<OnSiteOrRemoteFilters.REMOTE: '2'>]) apply_link=False skip_promoted_jobs=False page_offset=0))")
INFO:li:scraper:('Building search url', "Query(query=Software Developer options=QueryOptions(limit=1 locations=['Belgium'] filters=QueryFilters(on_site_or_remote=[<OnSiteOrRemoteFilters.REMOTE: '2'>]) apply_link=False skip_promoted_jobs=False page_offset=0))")
on_site_or_remote [<OnSiteOrRemoteFilters.REMOTE: '2'>]
INFO:li:scraper:('Chrome debugger url', 'http://localhost:44993/')
INFO:li:scraper:('Websocket debugger url: ', 'ws://localhost:44993/devtools/page/349DEF1A7CAAC4BA795FA31D22564E01')
INFO:li:scraper:('[Software Developer][Belgium]', 'Setting authentication cookie')
INFO:li:scraper:('[Software Developer][Belgium]', 'Opening https://www.linkedin.com/jobs/search?keywords=Software+Developer&location=Belgium&f_WT=2&start=0')
INFO:li:scraper:('[Software Developer][Belgium]', 'Session is valid')
ERROR:li:scraper:('[Software Developer][Belgium][1]', 'Timeout on loading job details')
INFO:li:scraper:('[Software Developer][Belgium][1]', 'Failed to process')
WARNING:li:scraper:('[Software Developer][Belgium][2]', 'Session is no longer valid, this may cause the scraper to fail')
ERROR:li:scraper:('[Software Developer][Belgium][2]', JavascriptException(), 'Traceback (most recent call last):\n  File "/home/arturs/Pr/my/jobs/linkedin_scraper/py-linkedin-jobs-scraper/linkedin_jobs_scraper/strategies/authenticated_strategy.py", line 382, in run\n    driver.execute_script(\n  File "/home/arturs/Pr/my/jobs/linkedin_scraper/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 406, in execute_script\n    return self.execute(command, {"script": script, "args": converted_args})["value"]\n  File "/home/arturs/Pr/my/jobs/linkedin_scraper/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 346, in execute\n    self.error_handler.check_response(response)\n  File "/home/arturs/Pr/my/jobs/linkedin_scraper/venv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response\n    raise exception_class(message, screen, stacktrace)\nselenium.common.exceptions.JavascriptException: Message: javascript error: Can...
Traceback (most recent call last):
  File "/home/arturs/Pr/my/jobs/linkedin_scraper/py-linkedin-jobs-scraper/linkedin_jobs_scraper/strategies/authenticated_strategy.py", line 382, in run
    driver.execute_script(
  File "/home/arturs/Pr/my/jobs/linkedin_scraper/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 406, in execute_script
    return self.execute(command, {"script": script, "args": converted_args})["value"]
  File "/home/arturs/Pr/my/jobs/linkedin_scraper/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 346, in execute
    self.error_handler.check_response(response)
  File "/home/arturs/Pr/my/jobs/linkedin_scraper/venv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.JavascriptException: Message: javascript error: Cannot read properties of undefined (reading 'querySelector')
  (Session info: chrome=119.0.6045.105)

I can get some results if I don't set a cookie and use the anonymous strategy instead.
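
For anyone trying to pin down the moment the session gets dropped, here is a minimal sketch that flags the "Session is no longer valid" warning as soon as it is logged, so a notebook cell can stop early instead of running into the later JavascriptException. It uses only the standard `logging` module. The logger name `li:scraper` is an assumption inferred from the `WARNING:li:scraper:` prefix in the log above, and `SessionInvalidatedHandler` is a hypothetical helper, not part of the library.

```python
import logging


class SessionInvalidatedHandler(logging.Handler):
    """Hypothetical helper: remembers whether a session-invalidated warning was logged."""

    # Text taken from the WARNING line in the log above.
    MARKER = "Session is no longer valid"

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.invalidated = False

    def emit(self, record):
        # getMessage() stringifies the message, so the check also works when
        # the message is a tuple, which is how it appears in the log above.
        if self.MARKER in record.getMessage():
            self.invalidated = True


# Assumption: the scraper logs under the name "li:scraper" (inferred from the log prefix).
handler = SessionInvalidatedHandler()
logging.getLogger("li:scraper").addHandler(handler)
```

After a run, `handler.invalidated` tells you whether the warning appeared. Checking it between queries might help narrow down which request triggers the logout.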

spinlud commented 5 months ago

See https://github.com/spinlud/py-linkedin-jobs-scraper/issues/76