momer / nutch-selenium

Apache License 2.0

Cannot Compile in Nutch1.6 and Nutch1.8 #1

Closed datafireball closed 10 years ago

datafireball commented 10 years ago

I followed your README step by step, but ant runtime failed in both environments. Could you share your build environment, or perhaps package your Nutch folder as a tarball? Thanks!

My environment:
AWS-ubuntu: Linux ip- xx.xx.xx.xx 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

java version: java version "1.7.0_55" OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1) OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

ant version: Apache Ant(TM) version 1.9.3 compiled on April 8 2014

Error Message:

1.6:

compile:
[echo] Compiling plugin: lib-selenium
[javac] /home/ubuntu/apache-nutch-1.6/src/plugin/build-plugin.xml:117: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to /home/ubuntu/apache-nutch-1.6/build/lib-selenium/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] /home/ubuntu/apache-nutch-1.6/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java:31: error: cannot access RemoteWebDriver
[javac] WebDriver driver = new FirefoxDriver(profile);
[javac] ^
[javac] class file for org.openqa.selenium.remote.RemoteWebDriver not found
[javac] /home/ubuntu/apache-nutch-1.6/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java:40: error: incompatible types
[javac] driver = new FirefoxDriver();
[javac] ^
[javac] required: WebDriver
[javac] found: FirefoxDriver
[javac] 2 errors
[javac] 1 warning

BUILD FAILED
/home/ubuntu/apache-nutch-1.6/build.xml:103: The following error occurred while executing this line:
/home/ubuntu/apache-nutch-1.6/src/plugin/build.xml:71: The following error occurred while executing this line:
/home/ubuntu/apache-nutch-1.6/src/plugin/build-plugin.xml:117: Compile failed; see the compiler error output for details.

Total time: 45 seconds

1.8:

compile:
[echo] Compiling plugin: protocol-selenium
[javac] Compiling 2 source files to /home/ubuntu/apache-nutch-1.8/build/protocol-selenium/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:14: error: package org.apache.nutch.storage does not exist
[javac] import org.apache.nutch.storage.WebPage;
[javac] ^
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:15: error: package org.apache.nutch.storage.WebPage does not exist
[javac] import org.apache.nutch.storage.WebPage.Field;
[javac] ^
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:26: error: package WebPage does not exist
[javac] private static final Collection FIELDS = new HashSet();
[javac] ^
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:49: error: cannot find symbol
[javac] protected Response getResponse(URL url, WebPage page, boolean redirect)
[javac] ^
[javac] symbol: class WebPage
[javac] location: class Http
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:55: error: package WebPage does not exist
[javac] public Collection getFields() {
[javac] ^
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java:16: error: package org.apache.nutch.storage does not exist
[javac] import org.apache.nutch.storage.WebPage;
[javac] ^
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java:47: error: cannot find symbol
[javac] public HttpResponse(Http http, URL url, WebPage page, Configuration conf) throws ProtocolException, IOException {
[javac] ^
[javac] symbol: class WebPage
[javac] location: class HttpResponse
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:26: error: package WebPage does not exist
[javac] private static final Collection FIELDS = new HashSet();
[javac] ^
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:29: error: package WebPage does not exist
[javac] FIELDS.add(WebPage.Field.MODIFIED_TIME);
[javac] ^
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:30: error: package WebPage does not exist
[javac] FIELDS.add(WebPage.Field.HEADERS);
[javac] ^
[javac] /home/ubuntu/apache-nutch-1.8/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:54: error: method does not override or implement a method from a supertype
[javac] @Override
[javac] ^
[javac] 11 errors
[javac] 1 warning

BUILD FAILED
/home/ubuntu/apache-nutch-1.8/build.xml:107: The following error occurred while executing this line:
/home/ubuntu/apache-nutch-1.8/src/plugin/build.xml:77: The following error occurred while executing this line:
/home/ubuntu/apache-nutch-1.8/src/plugin/build-plugin.xml:117: Compile failed; see the compiler error output for details.

Total time: 14 seconds

momer commented 10 years ago

Ah hey, a few things have changed and the readme hasn't followed suit - in fact, I think there are some things hard-coded in there that need to be extracted.

I'll take a look tomorrow morning for you,

Mo

momer commented 10 years ago

Update - this is for Nutch 2.x, specifically 2.2.1, as mentioned in the readme.
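For anyone hitting the same errors: the 1.8 log fails on org.apache.nutch.storage.WebPage, which belongs to the Nutch 2.x line, and the 1.6 log suggests the jar containing org.openqa.selenium.remote.RemoteWebDriver was not on the compile classpath. A minimal sketch of a check for both is below. The ClasspathCheck class and its method names are made up for illustration and are not part of the plugin; run it with the same classpath the build uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of nutch-selenium): reports whether the
// classes named in the compiler errors can be found on the classpath.
class ClasspathCheck {

    // Returns true if the class can be located; it is not initialized.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Checks several class names and keeps them in the order given.
    static Map<String, Boolean> check(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isPresent(name));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> report = check(
            "org.apache.nutch.storage.WebPage",             // Nutch 2.x storage class
            "org.openqa.selenium.remote.RemoteWebDriver");  // Selenium remote driver
        for (Map.Entry<String, Boolean> e : report.entrySet()) {
            System.out.println(e.getKey() + " -> " + (e.getValue() ? "found" : "MISSING"));
        }
    }
}
```

If the first class is reported missing, the Nutch tree on the classpath is most likely a 1.x release, and the plugin will not compile against it as written.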

I did have some issues with zombie Firefox processes, so I have developed a plugin based on Selenium-Node and Selenium-Hub (aka Selenium Grid). I will work to clean that up and release it soon.

datafireball commented 10 years ago

Hi momer, I find your Selenium plugin very interesting. Since I have never written a Nutch plugin before, I am catching up on my Nutch plugin homework. :) I have one question about your code. I see that you are using the Firefox driver, which can render JavaScript-generated content. However, it looks like every fetch starts a new Firefox driver and closes it at the end. So I think there is a trade-off worth considering: if we start a new browser for every URL, fetching will be dramatically slower; if we keep reusing one browser, it may store cookies and keep the same session, and the crawler might get blocked. What is your take on this?
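One possible middle ground between the two extremes is to reuse a browser for a fixed number of fetches and then replace it with a fresh one. The sketch below only illustrates that idea and is not how the plugin works: RecyclingSession, maxUses, and the factory/closer parameters are hypothetical names, and it is written generically (Java 8+) so it runs without Selenium.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: reuse one session for up to maxUses tasks, then
// close it and open a new one so cookies and session state are dropped.
class RecyclingSession<T> implements AutoCloseable {
    private final Supplier<T> factory;  // opens a new session (e.g. a browser)
    private final Consumer<T> closer;   // shuts a session down
    private final int maxUses;          // tasks allowed per session
    private T current;
    private int uses;

    RecyclingSession(Supplier<T> factory, Consumer<T> closer, int maxUses) {
        if (maxUses < 1) {
            throw new IllegalArgumentException("maxUses must be >= 1");
        }
        this.factory = factory;
        this.closer = closer;
        this.maxUses = maxUses;
    }

    // Runs one task against the current session, recycling it first if
    // it has already served maxUses tasks.
    synchronized <R> R use(Function<T, R> task) {
        if (current != null && uses >= maxUses) {
            close();
        }
        if (current == null) {
            current = factory.get();
            uses = 0;
        }
        uses++;
        return task.apply(current);
    }

    // Closes the current session, if any; safe to call more than once.
    @Override
    public synchronized void close() {
        if (current != null) {
            closer.accept(current);
            current = null;
        }
    }
}
```

With Selenium on the classpath, this could presumably be wired up as new RecyclingSession<WebDriver>(FirefoxDriver::new, WebDriver::quit, 50), with each fetch calling use(d -> { d.get(url); return d.getPageSource(); }). That wiring is an untested assumption, and the value 50 is arbitrary. Setting maxUses to 1 reproduces the one-browser-per-URL behaviour described above.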

momer commented 10 years ago

That's exactly why I put together the plugin that uses Selenium Grid that I mentioned above - I'll see if I can get to cleaning that up for you to check out tomorrow.

datafireball commented 10 years ago

Hi momer, I was looking at Selenium Grid this weekend and I agree: performance should improve dramatically if the Selenium hub is properly configured. That way, you also don't need to worry about the nuts and bolts of running browsers in parallel. I am new to GitHub and wondering if this is the right place to work together. I am very curious about and interested in this project, but I am new to Java and to source control in general. Is there a way I can learn from you and see if there is anything I can help with?

momer commented 10 years ago

@biwa7636 This is something I've been meaning to do for a while, so I'm sure my employers will be OK with me taking a look at this. I'm extracting the plugins now for you; if I don't get this done today, I'll have it done tomorrow - stay tuned!

momer commented 10 years ago

@biwa7636 The plugin is available here: https://github.com/momer/nutch-selenium-grid-plugin