openpreserve / pagelyzer

Suite of tools for detecting changes in web pages and their rendering
http://openplanets.github.io/pagelyzer
Apache License 2.0

pagelyzer_capture : `require': no such file to load -- selenium-webdriver (LoadError) #8

Closed crawler-IM closed 10 years ago

crawler-IM commented 10 years ago

Hi,

We are installing pagelyzer on our Debian Squeeze server:

$ uname -a
Linux machine-name 2.6.32-5-amd64 #1 SMP Thu Nov 3 03:41:26 UTC 2011 x86_64 GNU/Linux

$ java -version
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)

$ ruby -v
ruby 1.9.2p320 (2012-04-20 revision 35421) [x86_64-linux]

$ gem -v
1.3.7.1

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$PATH:/usr/lib/jvm/java-7-oracle/bin

$ bundle
Fetching gem metadata from http://rubygems.org/.........
Fetching gem metadata from http://rubygems.org/..
Resolving dependencies...
Using ffi (1.9.0)
Using childprocess (0.3.9)
Using headless (1.0.1)
Using mini_portile (0.5.1)
Using multi_json (1.8.1)
Using nokogiri (1.6.0)
Installing rjb (1.4.8)
Installing rubyzip (0.9.9)
Installing sanitize (2.0.6)
Installing websocket (1.0.7)
Installing selenium-webdriver (2.35.1)
Using bundler (1.3.5)
Your bundle is complete! Use bundle show [gemname] to see where a bundled gem is installed.

We think the installation completed correctly, but when we run pagelyzer we receive the error below. (I added the line "puts RUBY_VERSION" to confirm which version of Ruby is in use; that is why "1.9.2" is displayed.)

$ ./pagelyzer capture --url=http://www.google.fr
1.9.2
<internal:lib/rubygems/custom_require>:29:in `require': no such file to load -- selenium-webdriver (LoadError)
	from <internal:lib/rubygems/custom_require>:29:in `require'
	from /1/crawl/hatem-tmp/pagelyzer/pagelyzer-ruby-0.9.1-standalone/lib/pagelyzer_capture.rb:37:in `<top (required)>'
	from /1/crawl/hatem-tmp/pagelyzer/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_capture:4:in `require_relative'
	from /1/crawl/hatem-tmp/pagelyzer/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_capture:4:in `<main>'

To check that Ruby 1.9.2 and selenium-webdriver work together, we wrote a script that opens a web page through the webdriver in a virtual display, and it works:

puts RUBY_VERSION
require "selenium-webdriver"
driver = Selenium::WebDriver.for :firefox
driver.navigate.to "http://google.com"
puts driver.title
driver.quit

$ Xvfb :1 -screen 0 1024x768x24 &
$ DISPLAY=:1 ruby test-sel.rb
1.9.2
Google

We noticed that this error has been reported before: https://github.com/openplanets/pagelyzer/issues/1

Can you help us with this problem?

asanoja commented 10 years ago

Hi,

let me check that, and I'll get back to you.

best regards

crawler-IM commented 10 years ago

Hi,

RVM was installed on the machine during the first test; we have now removed it:

$ rvm
-bash: rvm: command not found

$ ruby1.9.1 -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]

Launching pagelyzer:

$ ruby1.9.1 pagelyzer capture --url=http://www.google.fr
1.9.2
<internal:lib/rubygems/custom_require>:29:in `require': no such file to load -- selenium-webdriver (LoadError)
	from <internal:lib/rubygems/custom_require>:29:in `require'
	from /1/crawl/hatem-tmp/pagelyzer/pagelyzer-ruby-0.9.1-standalone/lib/pagelyzer_capture.rb:37:in `<top (required)>'
	from /1/crawl/hatem-tmp/pagelyzer/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_capture:4:in `require_relative'
	from /1/crawl/hatem-tmp/pagelyzer/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_capture:4:in `<main>'

pagelyzer reports that the Ruby used by default is 1.9.2, and the same error is displayed. It is not able to find the selenium-webdriver gem.

asanoja commented 10 years ago

After reviewing the code, we found that this issue arises because several Ruby versions can coexist on the same machine, each with its own library paths. In most cases ruby1.8 is the default version installed out of the box, and installing another version such as 1.9.1 does not implicitly change the default.
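As a rough way to see which interpreter and gem paths a given `ruby` command resolves to, a small diagnostic along the following lines can be run with each installed interpreter (for instance once with `ruby` and once with `ruby1.9.1`). This is only a sketch: `interpreter_report` is a hypothetical helper, not part of pagelyzer, and `RbConfig.ruby` is assumed to be available, which may not hold on very old Ruby releases.

```ruby
require "rbconfig"
require "rubygems"

# Hypothetical helper (not part of pagelyzer): describes the interpreter that
# is running this code and whether it can see a given gem.
def interpreter_report(gem_name)
  {
    :interpreter => RbConfig.ruby,   # full path of the running ruby binary
    :version     => RUBY_VERSION,
    :gem_paths   => Gem.path,        # directories this interpreter searches for gems
    :gem_found   => !Gem::Specification.find_all_by_name(gem_name).empty?
  }
end

# Print one line per field so that runs under different interpreters
# can be compared side by side.
interpreter_report("selenium-webdriver").each do |key, value|
  puts "#{key}: #{value.inspect}"
end
```

If two interpreters print different gem paths, a gem installed for one of them will not necessarily be visible to the other, which matches the LoadError reported above.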

The architecture of the pagelyzer tool is based on chained system calls; more concretely, the pagelyzer file acts as a wrapper around the other programs in the suite.

For example, "./pagelyzer capture " leads to a new system call to "./bin/pagelyzer_capture ", which creates a new system process in which Ruby executes the second program. The code review yesterday at IM revealed that this new invocation is prefixed with the default ruby command (e.g. 'ruby ./bin/pagelyzer_capture'). All files have a shebang set to "#!/usr/bin/ruby1.9.1", which in theory tells the operating system to use that version. However, invoking the file through the default ruby command means the interpreter is chosen by that command rather than by the shebang, so the version actually used may differ. Because of this version mismatch, the interpreter failed to load the required libraries and gave the error reported in this issue.

The final solution, then, is to avoid creating new processes and instead reuse the current process for the subsequent calls. That allows us to keep the system components decoupled, as they are now, and to run everything with the Ruby version used to invoke the pagelyzer program.

In more detail:

Kernel.system() has been replaced with Kernel.load() (http://www.ruby-doc.org/core-2.0.0/Kernel.html) in the pagelyzer file (line 88).

There is no need to spawn another Ruby process (system 'ruby ./bin/pagelyzer_capture'); the file is loaded into the current process instead (load './bin/pagelyzer_capture').

We keep the shebang in each file, since the files can be executed directly, without going through the pagelyzer wrapper. The shebangs are also useful when creating a Debian package.

I suggest closing this issue, since the problem and its solution have been identified.
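The difference between the two calls can be illustrated with a small self-contained sketch. It is not the pagelyzer code: `compare_system_and_load` and the temporary `child.rb` script are hypothetical, and the child is started with `RbConfig.ruby` only so that the example runs anywhere, whereas a plain `system 'ruby ...'` would pick whichever `ruby` comes first in PATH.

```ruby
require "rbconfig"
require "tmpdir"

# Hypothetical demonstration: run the same script once in a child process
# (Kernel.system) and once inside the current interpreter (Kernel.load),
# recording the process id each time.
def compare_system_and_load
  Dir.mktmpdir do |dir|
    script   = File.join(dir, "child.rb")
    pid_file = File.join(dir, "pid.txt")

    # The child script only records the pid of the process executing it.
    File.write(script, "File.write(#{pid_file.inspect}, Process.pid.to_s)\n")

    # Kernel.system starts a separate interpreter process.
    system(RbConfig.ruby, script)
    spawned_pid = File.read(pid_file).to_i

    # Kernel.load evaluates the file in the current process, so it uses the
    # interpreter and load paths that are already active.
    load script
    loaded_pid = File.read(pid_file).to_i

    { :parent => Process.pid, :spawned => spawned_pid, :loaded => loaded_pid }
  end
end

result = compare_system_and_load
puts result.inspect
```

The pid recorded under `load` equals the parent's pid, while the one recorded under `system` differs. Because the loaded file runs in the same interpreter, it sees the same gems as the wrapper that loaded it.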