openpreserve / pagelyzer

Suite of tools for detecting changes in web pages and their rendering
http://openplanets.github.io/pagelyzer
Apache License 2.0

Java version : generate the jar file #10

Open crawler-IM opened 10 years ago

crawler-IM commented 10 years ago

Hi,

I'm trying to use the Java version of the Pagelyzer tool, and some information is missing:

Thanks.

ZeynepP commented 10 years ago

Take a look at the Maven version: https://github.com/openplanets/pagelyzer/tree/master/Maven

asanoja commented 10 years ago

We also have a standalone version of the jar with all dependencies included, used only for testing and development. It is a huge 84 MB jar file. Let us know if you are interested.

crawler-IM commented 10 years ago

Hi both,

It would be very helpful if you could provide the Pagelyzer jar file. Could you please upload it and share a direct download URL?

Thanks.

ZeynepP commented 10 years ago

You can get it from here: https://github.com/openplanets/scape-demo-sites/blob/bootstrap/pagelyzer/jPagelyzer.jar

keheliya commented 10 years ago

Hi, I'm trying to use jPagelyzer and came across a problem. These are the steps I followed:

java -jar jPagelyzer.jar -get score  -url1 http://www.lip6.fr -url2 http://www.lip6.fr
Using parameters found in /Users/keheliya/dev/jpagelyzer-built/ext/ex_images.xml
Change detection. Mode: images. Selenium: remote http://127.0.0.1:8015/wd/hub
Setting up browser: firefox
Attempt = 1/10
Setting up browser: firefox
Attempt = 1/10
java.lang.ClassNotFoundException: Scape/FileConfig
Continuing ...
java.lang.NoSuchMethodException: <unbound>=XMLDecoder.new();
Continuing ...
java.lang.IllegalStateException: The outer element does not return value
Continuing ...
java.lang.IllegalStateException: The outer element does not return value
Continuing ...
java.lang.IllegalStateException: The outer element does not return value
Continuing ...
java.lang.IllegalStateException: The outer element does not return value
Continuing ...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at java.beans.XMLDecoder.readObject(XMLDecoder.java:250)
    at Taverna.FileConfig.deserializeXMLToObject(FileConfig.java:42)
    at Taverna.ScapeTest.init(ScapeTest.java:44)
    at fr.lip6.jpagelyzer.JPagelyzer.changeDetection(JPagelyzer.java:96)
    at fr.lip6.jpagelyzer.JPagelyzer.main(JPagelyzer.java:292)

I can see 2 empty Firefox windows have also been started by selenium when I try this. Am I missing any other configuration?

Thanks

asanoja commented 10 years ago

Thanks for the feedback. We'll look into what happened and get back to you as soon as possible.

ZeynepP commented 10 years ago

jPagelyzer.jar is not the version generated with the new code. Please use this jar http://scape.lip6.fr/Pagelyzer-0.0.1-SNAPSHOT-jar-with-dependencies.jar

asanoja commented 10 years ago

In the new jar, the parameter -config is mandatory.

Try the following command and let us know how it goes:

java -jar Pagelyzer-0.0.1-SNAPSHOT-jar-with-dependencies.jar -get score -url1 http://www.lip6.fr -url2 http://www.lip6.fr -config /path/to/config/file.xml

Looking at the error, the BufferedImage objects are null, which means that the Capture objects were not properly initialized.

keheliya commented 10 years ago

Hi, thanks for looking into this. I got a little further with the new jar: I can see the two segmented web pages in the browser windows, but the process ends abruptly with a different error. See the trace below:

java -jar Pagelyzer-0.0.1-SNAPSHOT-jar-with-dependencies.jar -get score -url1 http://www.lip6.fr -url2 http://www.lip6.fr -config config.xml
Selenium: local WebDriver
Using parameters found in /Users/keheliya/dev/jpagelyzer-built/ext//ex_hybrid.xml
Change detection. Mode: hybrid. Port:8016
Setting up browser: firefox
Attempt = 1
Setting up browser: firefox
Attempt = 1
getting data using driver: firefox
title: Accueil LIP6
Starting server on port 8016
Using BoM algorithm v1.1 pAC=0.6
Shutting down server on port 8016
getting data using driver: firefox
title: Accueil LIP6
Starting server on port 8016
Using BoM algorithm v1.1 pAC=0.6
Shutting down server on port 8016
Exception in thread "main" java.util.ServiceConfigurationError: javax.imageio.spi.ImageReaderSpi: Provider com.sun.media.imageioimpl.plugins.jpeg2000.J2KImageReaderSpi could not be instantiated: java.lang.IllegalArgumentException: vendorName == null!
    at java.util.ServiceLoader.fail(ServiceLoader.java:224)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
    at javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:138)
    at javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:159)
    at javax.imageio.ImageIO.<clinit>(ImageIO.java:65)
    at pagelyzer.CaptureResult.getBufferedImage(CaptureResult.java:94)
    at pagelyzer.JPagelyzer.changeDetection(JPagelyzer.java:225)
    at pagelyzer.JPagelyzer.main(JPagelyzer.java:318)
Caused by: java.lang.IllegalArgumentException: vendorName == null!
    at javax.imageio.spi.IIOServiceProvider.<init>(IIOServiceProvider.java:76)
    at javax.imageio.spi.ImageReaderWriterSpi.<init>(ImageReaderWriterSpi.java:231)
    at javax.imageio.spi.ImageReaderSpi.<init>(ImageReaderSpi.java:212)
    at com.sun.media.imageioimpl.plugins.jpeg2000.J2KImageReaderSpi.<init>(J2KImageReaderSpi.java:70)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:374)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
    ... 8 more

keheliya commented 10 years ago

One more thing I noticed: it expects the 'js' directory to be in the parent of 'ext', although the config file makes no mention of a 'js' directory. I worked around it by copying the 'js' directory to the same path as the 'ext' directory.

ZeynepP commented 10 years ago

Sorry about that; we have not yet finished writing the README for the new version. Here are the notes we use in our internal meetings. Everything will be in the README files soon.

Installation

You can put "SettingsFiles" wherever you want, and you can also rename the folder, but make sure to keep the /ext folder and the /js folder in the same directory.

You should then add the full path of "SettingsFiles" to config.xml:

hybrid # here "SettingsFiles" full path #

You can create different config files based on your needs. Here is the jar file generated from the Maven code:

http://scape.lip6.fr/Pagelyzer-0.0.1-SNAPSHOT-jar-with-dependencies.jar

java -jar Pagelyzer-0.0.1-SNAPSHOT-jar-with-dependencies.jar -get score -url1 "http://www.lip6.fr" -url2 "http://www.lip6.fr" -config "/home/Bureau/config.xml"
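
The layout requirement above (the ext and js folders side by side inside the settings folder) can be checked before launching the jar. The following is only a minimal sketch under that assumption: the class `SettingsLayoutCheck` and its method `missingDirs` are hypothetical helpers written for this thread, not part of Pagelyzer.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check; not part of Pagelyzer. */
final class SettingsLayoutCheck {

    /** Returns the names of the expected sub-directories missing under settingsDir. */
    static List<String> missingDirs(Path settingsDir) {
        List<String> missing = new ArrayList<>();
        // Assumption taken from the notes above: "ext" and "js" must be siblings.
        for (String name : new String[] {"ext", "js"}) {
            if (!Files.isDirectory(settingsDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java SettingsLayoutCheck /path/to/SettingsFiles");
            System.exit(2);
        }
        List<String> missing = missingDirs(Paths.get(args[0]));
        if (missing.isEmpty()) {
            System.out.println("Layout looks complete: ext and js are both present.");
        } else {
            System.out.println("Missing directories: " + missing);
        }
    }
}
```

Run it with the same folder path that is written into config.xml; an empty result only says the two folders exist, not that their contents are valid.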

keheliya commented 10 years ago

Yep, I figured that out. But it looks like the error I mentioned above is due to something else...

Exception in thread "main" java.util.ServiceConfigurationError: 
javax.imageio.spi.ImageReaderSpi: 
Provider com.sun.media.imageioimpl.plugins.jpeg2000.J2KImageReaderSpi could not be instantiated: 
java.lang.IllegalArgumentException: vendorName == null!

Can you tell if I'm missing anything in the configuration?

asanoja commented 10 years ago

The error points to a problem with the third-party libraries javax.media and com.sun.media. It is not in the logic or implementation of Pagelyzer. Tomorrow I will recheck the library versions and generate a new jar.

Best to all

Andres Sanoja

anjackson commented 10 years ago

I suspect that error is cropping up because you are relying on the old Java Advanced Imaging API, which was a Sun add-on to the core JVM that is no longer supported.
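
One commonly reported cause of `vendorName == null!` with the JAI Image I/O plug-ins is that the service provider classes take their vendor name from the jar manifest (the Implementation-Vendor entry), and that entry can be lost when several jars are merged into a single jar-with-dependencies. Whether that is what happens with this particular jar is an assumption. A minimal sketch for checking it follows; the class `VendorCheck` is a hypothetical diagnostic written for this thread, not part of Pagelyzer or JAI.

```java
/** Hypothetical diagnostic; not part of Pagelyzer or JAI. */
final class VendorCheck {

    /** Returns the Implementation-Vendor recorded for the class's package, or null if none. */
    static String vendorOf(Class<?> type) {
        Package pkg = type.getPackage();
        return pkg == null ? null : pkg.getImplementationVendor();
    }

    public static void main(String[] args) {
        // Default to the provider named in the stack trace above.
        String name = args.length > 0
                ? args[0]
                : "com.sun.media.imageioimpl.plugins.jpeg2000.J2KImageReaderSpi";
        try {
            // Load the class without initializing it, so no provider code runs.
            Class<?> type = Class.forName(name, false, VendorCheck.class.getClassLoader());
            System.out.println(name + " -> Implementation-Vendor: " + vendorOf(type));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Running it with the jar-with-dependencies on the classpath and seeing `null` would be consistent with the manifest explanation; a non-null vendor would suggest looking elsewhere.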

keheliya commented 10 years ago

Thank you very much for the hint. I noticed it's being used here. It looks like the jai-imageio-core-standalone jar in the mygrid repo does not provide it either.

JDescriptors only compiled after I added the following repository:

<repository>
  <id>thirdparty-releases</id>
  <name>JBoss Thirdparty Releases</name>
  <url>https://repository.jboss.org/nexus/content/repositories/thirdparty-releases</url>
</repository>

What's the correct dependency to use?

EDIT: Sorry if this comment was confusing. It's only related to #11 I think. I was explaining how I got the dependencies compiled.

ZeynepP commented 10 years ago

"What's the correct dependency to use?" I am asking the same question. I tried all the solutions here: http://stackoverflow.com/questions/1209583/using-java-advanced-imaging-with-maven. I used the one that worked in several different environments at our lab. Apparently it is still not a generic solution :( I would really appreciate any help.

ZeynepP commented 10 years ago

This works for me

<dependency>
  <groupId>javax.media</groupId>
  <artifactId>jai-core</artifactId>
  <version>1.1.3</version>
</dependency>
<dependency>
  <groupId>com.sun.media</groupId>
  <artifactId>jai-codec</artifactId>
  <version>1.1.3</version>
</dependency>
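
To see whether these two dependencies actually end up on the runtime classpath, a small probe can help. This is a sketch only: `JaiClasspathCheck` is a hypothetical helper, and the two class names it probes (`javax.media.jai.JAI` for jai-core and `com.sun.media.jai.codec.ImageCodec` for jai-codec) are chosen here as representatives of each artifact.

```java
/** Hypothetical classpath probe; not part of Pagelyzer. */
final class JaiClasspathCheck {

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, JaiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One representative class per artifact (an assumption of this sketch).
        String[] names = {
            "javax.media.jai.JAI",                // expected in jai-core
            "com.sun.media.jai.codec.ImageCodec"  // expected in jai-codec
        };
        for (String name : names) {
            System.out.println(name + ": " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath as the application, for example with the generated jar-with-dependencies passed via `-cp`.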