Open Tails opened 5 years ago
@Tails, would you be interested to make a PR for this?
I will sometime this week.
How do you use this?
IMHO a Docker image would be enough.
This works for me (as a development setup): Dockerfile
# Base image with Ruby preinstalled
FROM ruby:2.5.3-stretch
RUN gem install kimurai
# Browsers, Xvfb, and SQLite/MySQL client libraries
RUN apt-get update && apt-get install -q -y git unzip lsof wget tar openssl xvfb chromium \
    firefox-esr libsqlite3-dev sqlite3 mysql-client default-libmysqlclient-dev
# chromedriver (drives Chromium)
RUN cd /tmp && \
    wget https://chromedriver.storage.googleapis.com/2.39/chromedriver_linux64.zip && \
    unzip chromedriver_linux64.zip -d /usr/local/bin && \
    rm -f chromedriver_linux64.zip
# geckodriver (drives Firefox)
RUN cd /tmp && \
    wget https://github.com/mozilla/geckodriver/releases/download/v0.21.0/geckodriver-v0.21.0-linux64.tar.gz && \
    tar -xvzf geckodriver-v0.21.0-linux64.tar.gz -C /usr/local/bin && \
    rm -f geckodriver-v0.21.0-linux64.tar.gz
# PhantomJS and the font libraries it needs
RUN apt install -q -y chrpath libxft-dev libfreetype6 libfreetype6-dev libfontconfig1 libfontconfig1-dev && \
    cd /tmp && \
    wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2 && \
    tar -xvjf phantomjs-2.1.1-linux-x86_64.tar.bz2 && \
    mv phantomjs-2.1.1-linux-x86_64 /usr/local/lib && \
    ln -s /usr/local/lib/phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin && \
    rm -f phantomjs-2.1.1-linux-x86_64.tar.bz2
# Application directory; install the gems listed in the Gemfile
RUN mkdir -p /app
ADD Gemfile /app
RUN cd /app && bundle install
Gemfile
source 'https://rubygems.org' do
  gem 'kimurai'
  gem 'byebug'
end
Build
docker build . -t simple-kimurai
Run (this opens a container with the environment installed, for development, with the current directory mounted)
docker run --rm -it -v ${PWD}:/app -w /app simple-kimurai bash
It would be great if the owner created an official Docker image.
@seliverstov-maxim The Dockerfile is great, but it crashes when running with multiple threads:
I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296299360] INFO -- MySpider: Info: visits: requests: 7, responses: 6
D, [2021-05-07 08:17:08 +0000#1693] [C: 47304296299360] DEBUG -- MySpider: Browser: driver.current_memory: 3837
I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296299360] INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
#<Thread:0x0000560bc78df6c0@/usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:299 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
19: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `block (2 levels) in in_parallel'
18: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `each'
17: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:313:in `block (3 levels) in in_parallel'
16: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `request_to'
15: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `public_send'
14: from a.rb:33:in `try_parse'
13: from a.rb:52:in `parse_question_page'
12: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/session.rb:21:in `visit'
11: from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/session.rb:278:in `visit'
10: from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/selenium/driver.rb:104:in `visit'
9: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/navigation.rb:32:in `to'
8: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/oss/bridge.rb:52:in `get'
7: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/oss/bridge.rb:587:in `execute'
6: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
5: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
4: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
3: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
2: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
1: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
/usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok': unknown error: session deleted because of page crash (Selenium::WebDriver::Error::UnknownError)
from unknown error: cannot determine loading status
from tab crashed
(Session info: headless chrome=73.0.3683.75)
(Driver info: chromedriver=2.39.562737 (dba483cee6a5f15e2e2d73df16968ab10b38a2bf),platform=Linux 5.10.25-linuxkit x86_64)
I, [2021-05-07 08:17:08 +0000#1693] [M: 47304283293120] INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
F, [2021-05-07 08:17:08 +0000#1693] [M: 47304283293120] FATAL -- MySpider: Spider: stopped: {:spider_name=>"MySpider", :status=>:failed, :error=>"#<Selenium::WebDriver::Error::UnknownError: unknown error: session deleted because of page crash\nfrom unknown error: cannot determine loading status\nfrom tab crashed\n (Session info: headless chrome=73.0.3683.75)\n (Driver info: chromedriver=2.39.562737 (dba483cee6a5f15e2e2d73df16968ab10b38a2bf),platform=Linux 5.10.25-linuxkit x86_64)>", :environment=>"development", :start_time=>2021-05-07 08:16:42 +0000, :stop_time=>2021-05-07 08:17:08 +0000, :running_time=>"25s", :visits=>{:requests=>7, :responses=>6}, :items=>{:sent=>0, :processed=>0}, :events=>{:requests_errors=>{}, :drop_items_errors=>{}, :custom=>{}}}
I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296275900] INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296321600] INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
I, [2021-05-07 08:17:08 +0000#1693] [C: 47304296845720] INFO -- MySpider: Browser: driver selenium_chrome has been destroyed
Traceback (most recent call last):
19: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `block (2 levels) in in_parallel'
18: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `each'
17: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:313:in `block (3 levels) in in_parallel'
16: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `request_to'
15: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `public_send'
14: from a.rb:33:in `try_parse'
13: from a.rb:52:in `parse_question_page'
12: from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/session.rb:21:in `visit'
11: from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/session.rb:278:in `visit'
10: from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/selenium/driver.rb:104:in `visit'
9: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/navigation.rb:32:in `to'
8: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/oss/bridge.rb:52:in `get'
7: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/oss/bridge.rb:587:in `execute'
6: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
5: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
4: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
3: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
2: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
1: from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
/usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok': unknown error: session deleted because of page crash (Selenium::WebDriver::Error::UnknownError)
from unknown error: cannot determine loading status
from tab crashed
(Session info: headless chrome=73.0.3683.75)
(Driver info: chromedriver=2.39.562737 (dba483cee6a5f15e2e2d73df16968ab10b38a2bf),platform=Linux 5.10.25-linuxkit x86_64)
I'm having the same issue with multithreading inside a Docker container. The code works great on my Mac OS X box.
::WebDriver::Error::UnknownError: unknown error: session deleted because of page crash\nfrom unknown error: cannot determine loading status\nfrom tab crashed\n (Session info: headless chrome=86.0.4240.111)>", :environment=>"development", :start_time=>2021-07-25 18:06:00.6242447 +0000, :stop_time=>2021-07-25 18:06:18.1101284 +0000, :running_time=>"17s", :visits=>{:requests=>2, :responses=>1}, :items=>{:sent=>0, :processed=>0}, :events=>{:requests_errors=>{}, :drop_items_errors=>{}, :custom=>{}}}
/usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok': unknown error: session deleted because of page crash (Selenium::WebDriver::Error::UnknownError)
from unknown error: cannot determine loading status
from tab crashed
(Session info: headless chrome=86.0.4240.111)
from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/w3c/bridge.rb:567:in `execute'
from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/w3c/bridge.rb:59:in `get'
from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/navigation.rb:32:in `to'
from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/selenium/driver.rb:104:in `visit'
from /usr/local/bundle/gems/capybara-3.35.3/lib/capybara/session.rb:278:in `visit'
from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/capybara_ext/session.rb:21:in `visit'
from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:201:in `request_to'
from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:313:in `block (3 levels) in in_parallel'
from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `each'
from /usr/local/bundle/gems/kimurai-1.4.0/lib/kimurai/base.rb:305:in `block (2 levels) in in_parallel'
@thanhtoan1196 did you figure out a workaround?
@hjhart @thanhtoan1196 In my case I can't modify certain settings of my Docker container, so I added the following Chrome flag: --disable-dev-shm-usage
and everything worked like a charm. The downside is that Chrome now uses the /tmp folder instead of /dev/shm, so your spider will probably be slower.
Problem is described here: https://stackoverflow.com/questions/53902507/unknown-error-session-deleted-because-of-page-crash-from-unknown-error-cannot
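For anyone who wants to pay the /tmp slowdown only when it is needed, here is a minimal sketch of one way to decide at startup. It assumes the crash comes from a small /dev/shm (Docker gives a container 64 MB by default, and docker run has a --shm-size option to raise it). The helper names and the 512 MB cutoff are my own assumptions, not part of Kimurai or Selenium. The sketch only produces the argument list. With plain selenium-webdriver the strings would go through Selenium::WebDriver::Chrome::Options#add_argument. How Kimurai exposes Chrome arguments isn't shown in this thread.

```ruby
# Hypothetical helpers (not part of Kimurai or Selenium): decide whether
# Chrome should be started with --disable-dev-shm-usage, based on the size
# of /dev/shm inside the container.

# Assumed cutoff: below 512 MB we treat /dev/shm as too small for Chrome.
MIN_SHM_KB = 512 * 1024

# Extract the total size in KB from `df -Pk <path>` output.
# Returns nil when the output has no usable data line.
def shm_size_kb(df_output)
  data_line = df_output.to_s.lines[1]
  return nil if data_line.nil?

  blocks = data_line.split[1]
  blocks.to_s.match?(/\A\d+\z/) ? blocks.to_i : nil
end

# Extra Chrome arguments for the given /dev/shm size (nil means unknown,
# in which case we play it safe and add the flag).
def extra_chrome_args(shm_kb, min_kb: MIN_SHM_KB)
  return ["--disable-dev-shm-usage"] if shm_kb.nil? || shm_kb < min_kb

  []
end

if __FILE__ == $PROGRAM_NAME
  df_output = begin
    `df -Pk /dev/shm 2>/dev/null`
  rescue SystemCallError
    "" # df is not available; treat the size as unknown
  end
  size = shm_size_kb(df_output)
  puts "/dev/shm size (KB): #{size.inspect}"
  puts "extra Chrome args: #{extra_chrome_args(size).inspect}"
end
```

Run inside the container, this prints the detected size and whether the flag would be added. If you can change how the container is started, raising --shm-size instead avoids the /tmp fallback altogether.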
I have put together an updated version of the Docker configuration:
https://github.com/iwoogy/kimurai-docker-example
Hope it helps.
It's easy to get up and running using Docker (no need to install a bunch of dependencies on a system that you don't know about).
I got Docker working using the following files:
And its docker-compose.yml: