haasr / indeed-jobs-searcher

Simple CLI-based tool for extracting and storing relevant job info from bulk job searches on indeed.com.
MIT License

RESOLVED: "DevToolsActivePort file doesn't exist" SessionNotCreatedException (Selenium) #2

alimovlex closed this 3 weeks ago

alimovlex commented 3 weeks ago

Hello! I have fixed the script crash caused by the exception named in the title above.
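The thread does not say exactly which flags this PR added. For reference, the Chrome flags most often suggested (for example on Stack Overflow) for the "DevToolsActivePort file doesn't exist" error are `--no-sandbox` and `--disable-dev-shm-usage`. The sketch below assumes those two flags, so it may not match the actual commit. `build_chrome_args` and `make_driver` are hypothetical helper names. Note that `--no-sandbox` weakens Chrome's process isolation.

```python
# Flags commonly suggested for "DevToolsActivePort file doesn't exist";
# assumed here, not necessarily the exact set this PR added.
COMMON_CHROME_FLAGS = (
    "--no-sandbox",             # skip Chrome's sandbox (often needed when running as root, e.g. in containers)
    "--disable-dev-shm-usage",  # write shared-memory files to /tmp instead of a small /dev/shm
)

def build_chrome_args(headless=False):
    """Return the Chrome command-line arguments to hand to Selenium."""
    args = list(COMMON_CHROME_FLAGS)
    if headless:
        args.append("--headless=new")  # headless mode in newer Chrome versions
    return args

def make_driver(headless=False):
    """Start Chrome with the flags above (requires Selenium 4 and a local Chrome)."""
    from selenium import webdriver  # imported here so build_chrome_args() works without Selenium

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Keeping the flag list separate from driver creation makes it easy to inspect or tweak the arguments without launching a browser.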

haasr commented 3 weeks ago

Hello! Just wanted to let you know that I decided to upgrade to Python 3.12.4. I know that is newer than what the stable repositories of most Linux distributions ship, but I wanted to bring the project up to the newest release because I don't maintain this code very often.

I decided to remove your shebang header because the interpreter path can differ between distributions. Since it looks like you were using Linux, I have a blog article about how to easily install a newer Python version on Ubuntu and Linux Mint ( https://ryanhaas.us/post/installing-the-newest-python-version-on-ubuntu-2310-mint-213/ ). It is probably very similar on other distributions; I just haven't tested it on them.
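On the shebang point: a common portable form is `#!/usr/bin/env python3`. It runs whichever `python3` comes first on `PATH`, whether that is a system, pyenv, or conda interpreter. The sketch below is not code from the repository. It pairs that shebang with a hypothetical version check (`python_is_supported`, `require_python`, `MIN_PYTHON`) so the script can stop early on interpreters older than the 3.12 mentioned above.

```python
#!/usr/bin/env python3
# `env` looks python3 up on PATH instead of hard-coding a distribution-specific path.
import sys

MIN_PYTHON = (3, 12)  # hypothetical minimum matching the upgrade to Python 3.12.4

def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter's (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum

def require_python(minimum=MIN_PYTHON):
    """Exit with a readable message instead of failing later with a less obvious error."""
    if not python_is_supported(sys.version_info, minimum):
        sys.exit(f"Python {minimum[0]}.{minimum[1]}+ is required; "
                 f"this is {sys.version.split()[0]}")
```

Calling `require_python()` at the top of the entry point, before importing modules that rely on newer features, gives users on older distribution Pythons a clear hint to upgrade.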

Also, it looks like you are more familiar with Selenium than I am, since I didn't know about the flags you added. If you ever feel up to the challenge of making the program go through multiple result pages (this is mentioned in the Issues), we would love to have it scrape results from more than the first page.
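One way to cover multiple result pages without scrolling or clicking "next" is to request each page's URL directly. This swaps in URL-based paging as an alternative to driving the page UI. The sketch assumes Indeed search URLs take `q` (query), `l` (location), and a `start` offset that advances by 10 per page. That is how Indeed has typically paginated, but it is an assumption worth confirming in the browser's address bar. `build_page_urls` is a hypothetical helper, not part of the repository.

```python
from urllib.parse import urlencode

def build_page_urls(job_query, location, pages=3, per_page=10,
                    base_url="https://www.indeed.com/jobs"):
    """Return one search URL per results page (assumed q/l/start scheme)."""
    urls = []
    for page in range(pages):
        # start is the zero-based index of the first result on this page
        params = {"q": job_query, "l": location, "start": page * per_page}
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls
```

Each URL could then be loaded with `driver.get(url)`, and its `driver.page_source` could be parsed the same way as the first page. Adding a pause between requests keeps the scraper polite.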

Again, thanks for your contribution!

Sincerely, Ryan

alimovlex commented 3 weeks ago

Oh, whoops! Thanks for catching that.

Anytime.

alimovlex commented 3 weeks ago

Hello!

I usually install Anaconda and use its Python to avoid breaking the system-wide Python. No problem with the shebang; I totally get it. https://www.anaconda.com/download
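In the spirit of keeping the scraper off the system-wide Python, it can help to confirm which kind of interpreter is actually running it. This is a heuristic sketch, and the helper `interpreter_kind` is hypothetical. It relies on two facts. Conda environments contain a `conda-meta` directory in their prefix. PEP 405 virtual environments (`python -m venv`) make `sys.prefix` differ from `sys.base_prefix`.

```python
import os
import sys

def interpreter_kind(prefix=None, base_prefix=None):
    """Heuristically classify the running interpreter as 'conda', 'venv', or 'other'."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    if os.path.isdir(os.path.join(prefix, "conda-meta")):
        return "conda"  # conda keeps its installed-package records in this directory
    if prefix != base_prefix:
        return "venv"   # a PEP 405 virtual environment changes sys.prefix
    return "other"      # likely a system or standalone install
```

For example, `print(interpreter_kind(), sys.executable)` at startup shows both the kind and the exact interpreter path.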

Sincerely, Alex

haasr commented 3 weeks ago

Oh... That's fair.

alimovlex commented 3 weeks ago

Actually, I have only just started with web scraping, using the Python skills I gained at Radboud University, because I am struggling with my job search. By the way, since I am not a Python developer, I used ChatGPT to correct mistakes in my code. The flags solution itself came from Stack Overflow.

Anyway, I am just practicing programming. I'm glad my commit was useful.

Best, Alex.

haasr commented 3 weeks ago

Oh, haha, bet! I'm guilty of using dozens of libraries before properly "learning" them, and Selenium and BS4 fall into that category. I started using ChatGPT to write the beautifulsoup4 parsing because it's so much quicker than manually hunting through the HTML to find where different info is stored. When Indeed changes something that breaks the code, I just print the HTML I'm parsing and have ChatGPT tell me how to parse out the business or whatever. Super useful.
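Finding which tag holds a given piece of text can also be done with the standard library alone. This is a minimal sketch using `html.parser`, a swapped-in stdlib approach rather than the project's BeautifulSoup code. The hypothetical helper `find_text_locations` returns a rough CSS-like path to every text node containing a search string. The labels prefer `id`, then `data-testid`, then the first class, which is an arbitrary choice for readability.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class _TextLocator(HTMLParser):
    def __init__(self, needle):
        super().__init__()
        self.needle = needle.lower()
        self.stack = []  # currently open elements as (tag, label) pairs
        self.hits = []   # one path string per matching text node

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if attrs.get("id"):
            label = f"{tag}#{attrs['id']}"
        elif attrs.get("data-testid"):
            testid = attrs["data-testid"]
            label = f'{tag}[data-testid="{testid}"]'
        elif classes:
            label = f"{tag}.{classes[0]}"
        else:
            label = tag
        self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; tolerates unclosed elements.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.needle in data.lower():
            self.hits.append(" > ".join(label for _, label in self.stack))

def find_text_locations(html, needle):
    """Return a CSS-like path for every text node containing `needle` (case-insensitive)."""
    locator = _TextLocator(needle)
    locator.feed(html)
    locator.close()
    return locator.hits
```

Running `find_text_locations(driver.page_source, "<a company name visible on the page>")` lists paths of the form `div.some-class > span[data-testid="..."]`. These can be turned into a BS4 `select()` selector when Indeed's markup changes.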

alimovlex commented 3 weeks ago

Moreover, I have just tested the script linked below. It actually enters the search query and doesn't crash. I also tried your searcher, and unfortunately the problem seems to be an XPath locator in lib/searcher.py. To be more precise, in my humble opinion the script crashes here:

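from time import sleep

from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# `driver` and `get_search_timestamp()` are assumed to be defined elsewhere in lib/searcher.py.

def get_searched_page(job_query, location, url='https://indeed.com/'):
    driver.get(url)
    sleep(4)  # crude wait for the search form to render

    # Suspected crash point: find_element raises NoSuchElementException
    # when no element matches this XPath (e.g. if Indeed renamed the id).
    query_field = driver.find_element(By.XPATH, '//*[@id="text-input-what"]')
    query_field.send_keys(Keys.CONTROL + "a")  # select any pre-filled text...
    query_field.send_keys(Keys.DELETE)         # ...and delete it
    sleep(.6)
    query_field.send_keys(job_query)
    sleep(1)

    loc_field = driver.find_element(By.XPATH, '//*[@id="text-input-where"]')
    loc_field.send_keys(Keys.CONTROL + "a")
    loc_field.send_keys(Keys.DELETE)
    sleep(.6)
    loc_field.send_keys(location)
    sleep(1)
    try:
        loc_field.send_keys(Keys.ENTER)
    except WebDriverException:
        # Fall back to clicking the search button if pressing Enter fails
        driver.find_element(By.XPATH, '//button[@type="submit"]').click()
    sleep(3)

    return driver.current_url, driver.page_source, get_search_timestamp()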
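Since the crash appears to come from a single XPath that no longer matches, one way to make this step more robust is to try several candidate locators in order and fail with a descriptive error. The sketch uses Selenium's `driver.find_elements`, which returns an empty list instead of raising. The string `"xpath"` is the value of `By.XPATH`. Only the first XPath comes from `lib/searcher.py`. The `name="q"` fallback and the names `QUERY_FIELD_XPATHS` / `find_first` are hypothetical and should be checked against Indeed's current markup.

```python
# Candidate locators for the "what" field, tried in order.
QUERY_FIELD_XPATHS = (
    '//*[@id="text-input-what"]',  # locator currently used in lib/searcher.py
    '//input[@name="q"]',          # hypothetical fallback; verify on the live page
)

def find_first(driver, xpaths):
    """Return the first element matched by any XPath, or raise a descriptive error."""
    for xpath in xpaths:
        # find_elements returns [] rather than raising when nothing matches
        matches = driver.find_elements("xpath", xpath)
        if matches:
            return matches[0]
    raise LookupError(f"No element matched any of: {', '.join(xpaths)}")
```

In `get_searched_page`, `query_field = find_first(driver, QUERY_FIELD_XPATHS)` would replace the single `find_element` call. The error message then names every locator that was tried, which makes the next markup change on Indeed quicker to diagnose.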

Link to the working Indeed script: https://github.com/candibod/job_scraper/blob/main/fetch_jobs.py