Mirk32 opened 4 years ago
I have sort of the same issue when I use :selenium_chrome, but on my machine:
/Users/kaka/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/platform.rb:136:in `assert_file': not a file: "/usr/local/bin/chromedriver" (Selenium::WebDriver::Error::WebDriverError)
It works when I use :selenium_firefox.
Also check the config. I haven't tried it, but changing the default webdriver location might help: https://github.com/vifreefly/kimuraframework#configuration-options
# Provide custom chrome binary path (default is any available chrome/chromium in the PATH):
# config.selenium_chrome_path = "/usr/bin/chromium-browser"
# Provide custom selenium chromedriver path (default is "/usr/local/bin/chromedriver"):
# config.chromedriver_path = "~/.local/bin/chromedriver"
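The `not a file` error in the trace above comes from Selenium checking that the driver path points at a regular file. A minimal sketch, assuming you want to pick the first usable chromedriver from a list of candidates before assigning it to `config.chromedriver_path`. The helper name `resolve_chromedriver` and the `CHROMEDRIVER_PATH` environment variable are hypothetical, not part of Kimurai. Ruby's `File.file?` does not expand `~`, so the sketch expands each candidate first (whether Kimurai expands `~` itself is not confirmed here).

```ruby
# Hypothetical helper (not part of Kimurai or selenium-webdriver):
# returns the first candidate that is an executable regular file, or nil.
def resolve_chromedriver(candidates)
  candidates
    .compact                                # drop unset entries (e.g. a missing ENV var)
    .map { |path| File.expand_path(path) }  # File.file? does NOT expand "~" on its own
    .find { |path| File.file?(path) && File.executable?(path) }
end

if __FILE__ == $PROGRAM_NAME
  path = resolve_chromedriver([
    ENV["CHROMEDRIVER_PATH"],        # hypothetical override variable
    "~/.local/bin/chromedriver",
    "/usr/local/bin/chromedriver",   # default location mentioned in the config comment
    "/usr/bin/chromedriver"          # e.g. where Arch Linux installs it
  ])
  puts path ? "chromedriver found at #{path}" : "chromedriver not found"
end
```

Inside `Kimurai.configure`, the result could then be assigned with `config.chromedriver_path = path` when it is not nil.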
Thanks @kaka-ruto, I tried using Kimurai.configure and it worked, as shown below:
require 'kimurai'
require 'csv'
require 'json'

Kimurai.configure do |config|
  # Default logger has colored mode in development.
  # If you would like to disable it, set `colorize_logger` to false.
  # config.colorize_logger = false

  # Logger level for default logger:
  # config.log_level = :info

  # Custom logger:
  # config.logger = Logger.new(STDOUT)

  # Custom time zone (for logs):
  # config.time_zone = "UTC"
  # config.time_zone = "Europe/Moscow"

  # Provide custom chrome binary path (default is any available chrome/chromium in the PATH):
  # config.selenium_chrome_path = "/usr/bin/chromium-browser"

  # Provide custom selenium chromedriver path (default is "/usr/local/bin/chromedriver"):
  config.chromedriver_path = "/usr/bin/chromedriver"
end

class JobScraper < Kimurai::Base
  @name = 'eng_job_scraper'
  @start_urls = ["https://www.indeed.com/jobs?q=software+engineer&l=New+York%2C+NY"]
  @engine = :selenium_chrome

  @@jobs = []

  # Collect every job card on the currently loaded results page
  def scrape_page
    doc = browser.current_response
    returned_jobs = doc.css('td#resultsCol')
    returned_jobs.css('div.jobsearch-SerpJobCard').each do |char_element|
      title = char_element.css('h2 a')[0].attributes["title"].value.gsub(/\n/, "")
      link = "https://indeed.com" + char_element.css('h2 a')[0].attributes["href"].value.gsub(/\n/, "")
      description = char_element.css('div.summary').text.gsub(/\n/, "")
      # Previously `company = description = ...`, which overwrote the description
      company = char_element.css('span.company').text.gsub(/\n/, "")
      location = char_element.css('div.location').text.gsub(/\n/, "")
      salary = char_element.css('div.salarySnippet').text.gsub(/\n/, "")
      requirements = char_element.css('div.jobCardReqContainer').text.gsub(/\n/, "")
      # job = [title, link, description, company, location, salary, requirements]
      job = {title: title, link: link, description: description, company: company, location: location, salary: salary, requirements: requirements}
      # Skip cards that were already collected
      @@jobs << job unless @@jobs.include?(job)
    end
  end

  def parse(response, url:, data: {})
    10.times do
      scrape_page
      # `css` returns a NodeSet, which is always truthy; `any?` checks
      # whether a popover is actually on the page before refreshing
      if browser.current_response.css('div#popover-background').any? ||
         browser.current_response.css('div#popover-input-locationtst').any?
        browser.refresh
      end
      # Click the "Next" link in the pagination bar (absolute XPath, breaks if the layout changes)
      browser.find(:xpath, '/html/body/table[2]/tbody/tr/td/table/tbody/tr/td[1]/nav/div/ul/li[6]/a/span').click
      puts "CURRENT NUMBER OF JOBS: #{@@jobs.count}"
      puts "CLICKED NEXT BUTTON"
    end

    # One header row, then one row per job (`csv << @@jobs` wrote everything as a single row)
    CSV.open('jobs.csv', "w") do |csv|
      csv << @@jobs.first.keys unless @@jobs.empty?
      @@jobs.each { |job| csv << job.values }
    end

    File.open("jobs.json", "w") do |f|
      f.write(JSON.pretty_generate(@@jobs))
    end

    @@jobs
  end
end

jobs = JobScraper.crawl!
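Appending the whole array with `csv << @@jobs` produces a single CSV row containing every hash. A minimal stdlib-only sketch of writing one header row plus one row per hash; `jobs_to_csv` is a hypothetical helper name, and it assumes every hash has the same set of keys (true for the `job` hashes built in `scrape_page`).

```ruby
require 'csv'

# Hypothetical helper: converts an array of hashes that share the same keys
# into CSV text, using the first hash's keys as the header row.
def jobs_to_csv(jobs)
  return "" if jobs.empty?

  headers = jobs.first.keys
  CSV.generate do |csv|
    csv << headers
    # Look values up by header so hashes with a different key order still line up
    jobs.each { |job| csv << headers.map { |key| job[key] } }
  end
end

sample_jobs = [
  { title: "Backend Engineer", company: "Acme", location: "New York, NY" },
  { title: "SRE", company: "Initech", location: "Brooklyn, NY" }
]
puts jobs_to_csv(sample_jobs)
```

To write the file from the spider, something like `File.write('jobs.csv', jobs_to_csv(@@jobs))` would do; `CSV` quotes fields such as "New York, NY" that contain commas.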
FYI, I am using Arch Linux, where chromedriver is installed at '/usr/bin/chromedriver' by default. When I finally ran the code I hit another issue related to lsof: the tool is not installed by default on Arch, so I had to install it from the AUR repositories:
yay -S lsof
Now everything looks good :)
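Missing system tools like `lsof` only show up once the crawler is already running. A minimal sketch of a preflight check that looks each required command up on `PATH` before calling `crawl!`. The `which` helper and the `REQUIRED_COMMANDS` list are assumptions for illustration, not something Kimurai provides, and the lookup assumes Unix-style executables (no `.exe` handling).

```ruby
# Hypothetical preflight helper: returns the full path of an executable found
# on PATH (like the shell's `which`), or nil if it is not there.
def which(cmd, path_env = ENV.fetch("PATH", ""))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Assumption: the tools this setup needed according to the thread above.
REQUIRED_COMMANDS = %w[chromedriver lsof].freeze

missing = REQUIRED_COMMANDS.reject { |cmd| which(cmd) }
if missing.empty?
  puts "All required commands found"
else
  warn "Missing commands: #{missing.join(', ')}"
end
```

Note that if chromedriver lives outside `PATH` and is set via `config.chromedriver_path`, it should be checked by path instead of by name.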
Awesome @GarnicaJR ! Glad you got it working.
I'm trying to run the crawler via a Sidekiq job on my DigitalOcean droplet, but it always fails with this error:
Selenium::WebDriver::Error::WebDriverError: not a file: "./bin/chromedriver"
At the same time, I can run crawl! from the Rails console and it works well; it also works via Sidekiq on my local machine. I defined chromedriver_path in the Kimurai initializer: config.chromedriver_path = Rails.root.join('lib', 'webdrivers', 'chromedriver_83').to_s
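The error mentions the relative path `./bin/chromedriver`, while the initializer sets an absolute one, which may mean the Sidekiq process is not picking up the configured value (an assumption; the logs are not shown here). A relative path is resolved against the process's current working directory, which for a service-managed Sidekiq may differ from the directory `rails console` is started in. A small sketch of that effect; the directories `/srv/myapp` and `/` are hypothetical.

```ruby
# A relative driver path is resolved against a base directory;
# File.expand_path without a second argument uses the current working directory.
RELATIVE_DRIVER = "./bin/chromedriver"

# Hypothetical directories for illustration only.
app_root   = "/srv/myapp"  # e.g. where `rails console` is started
daemon_cwd = "/"           # e.g. a service manager's default working directory

puts File.expand_path(RELATIVE_DRIVER, app_root)    # /srv/myapp/bin/chromedriver
puts File.expand_path(RELATIVE_DRIVER, daemon_cwd)  # /bin/chromedriver

# Logging this inside the worker shows which base directory is actually in effect.
puts "Current working directory: #{Dir.pwd}"
```

Logging `Dir.pwd` (and the driver path actually in use) from inside the worker would help confirm whether the initializer ran in the Sidekiq process.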
Logs of the Sidekiq job, which I also started from the Rails console with FekoCrawlWorker.perform_async:
Sidekiq worker code: