haasr / indeed-jobs-searcher

Simple CLI-based tool for extracting and storing relevant job info. from bulk job searches on indeed.com
MIT License

Selenium fails with inability to look for the XPath element. #3

Closed: alimovlex closed this issue 3 weeks ago

alimovlex commented 3 weeks ago

Hello! Here is the crash log I got after running a search with the --save false parameter:

> Commencing single search...
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="text-input-what"]"}
  (Session info: chrome-headless-shell=126.0.6478.126); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
#0 0x55850fc73c7a <unknown>
#1 0x55850f956e2c <unknown>
#2 0x55850f9a3661 <unknown>
#3 0x55850f9a3751 <unknown>
#4 0x55850f9e7f64 <unknown>
#5 0x55850f9c65ed <unknown>
#6 0x55850f9e5303 <unknown>
#7 0x55850f9c6363 <unknown>
#8 0x55850f996247 <unknown>
#9 0x55850f996b9e <unknown>
#10 0x55850fc3a24b <unknown>
#11 0x55850fc3e2f1 <unknown>
#12 0x55850fc25afe <unknown>
#13 0x55850fc3ee52 <unknown>
#14 0x55850fc0a79f <unknown>
#15 0x55850fc63638 <unknown>
#16 0x55850fc63810 <unknown>
#17 0x55850fc72dac <unknown>
#18 0x7f78d2120962 start_thread

URL searched:
https://nl.indeed.com
Search results:
None

However, the same XPath selectors work in the following script: https://github.com/stesiakethan/indeed_scraper/blob/main/scraper_main.py

haasr commented 3 weeks ago

Hello. When I tested before the version change, I did so with the --headless argument removed. On the newer version, Selenium seems to work as expected only when the --headless flag is not used. The script you linked also failed once I added --headless to it:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="text-input-what"]"}
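For anyone who wants to experiment with headless mode later, here is a minimal sketch of two things that might be worth trying; neither has been tested against indeed.com in this thread. The session info in the log reports chrome-headless-shell, which suggests Chrome's older headless mode was in use, so the sketch passes --headless=new instead and waits explicitly for the search box rather than looking it up immediately. The names build_chrome_args, make_driver, and wait_for_search_box are hypothetical and not part of this repository, and the window size and timeout are arbitrary assumptions.

```python
from typing import List

# Element id taken from the error messages quoted in this thread.
SEARCH_BOX_ID = "text-input-what"


def build_chrome_args(headless: bool) -> List[str]:
    """Return Chrome command-line flags for a session (illustrative values)."""
    # A fixed window size is an assumption; headless sessions otherwise
    # start with a small default viewport.
    args = ["--window-size=1920,1080"]
    if headless:
        # "--headless=new" selects Chrome's newer headless mode (Chrome 109+).
        args.append("--headless=new")
    return args


def make_driver(headless: bool = False):
    """Create a Chrome WebDriver using the flags above."""
    # Imported here so build_chrome_args can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


def wait_for_search_box(driver, timeout: float = 10.0):
    """Wait up to `timeout` seconds for the search box to be present.

    Raises selenium.common.exceptions.TimeoutException if it never appears.
    """
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.ID, SEARCH_BOX_ID))
    )
```

A possible usage would be driver = make_driver(headless=True), then driver.get("https://nl.indeed.com"), then wait_for_search_box(driver). If the wait still times out, the page served to the headless browser probably does not contain the element at all, and a different flag or approach would be needed.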

My two options are to roll back to the Python 3.11 version or to leave the Python 3.12 version without headless support. I'm going to keep the 3.12 version for now. The move from Python 3.11 to 3.12 also caused an unexpected change in pandas ExcelWriter: it no longer created the workbook file correctly because of an exception from openpyxl. I switched the engine to xlsxwriter, and it works as expected. If I reverted to the Python 3.11 version, I would still have to deal with changed package behavior whenever I upgrade Python in the future, so I'm inclined to forgo headless mode to keep the core features intact and the project current.
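For reference, the engine switch described above can be sketched roughly as follows. This is not the code from the repository: pick_excel_engine and save_results are hypothetical names, the "Jobs" sheet name and default path are made up for illustration, and the fallback to openpyxl when xlsxwriter is not installed is an assumption added here.

```python
import importlib.util
from typing import Iterable, Optional


def pick_excel_engine(
    preferred: Iterable[str] = ("xlsxwriter", "openpyxl"),
) -> Optional[str]:
    """Return the first module name in `preferred` that is installed, else None."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def save_results(rows, path: str = "results.xlsx") -> str:
    """Write a list of dicts to an Excel workbook using an explicit engine."""
    # Imported here so pick_excel_engine can be used without pandas installed.
    import pandas as pd

    engine = pick_excel_engine()
    if engine is None:
        raise RuntimeError("Install xlsxwriter or openpyxl to write .xlsx files")

    frame = pd.DataFrame(rows)
    # Passing engine= explicitly avoids relying on pandas' default choice.
    with pd.ExcelWriter(path, engine=engine) as writer:
        frame.to_excel(writer, sheet_name="Jobs", index=False)
    return path
```

Calling save_results([{"title": "Developer", "company": "Example"}]) would write a one-row workbook, assuming pandas and at least one of the two engines are installed.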

haasr commented 3 weeks ago

Removed headless option (commit 1e4dda1). I would like to add it in a future version once I understand it better.