When indexing WARC files, I encountered this error:
Error indexing test_warcs/warcfilename-00000.warc.gz (return code != 0)
The log gave the following output:
Parsing Archive File [1/1]:warcfile.warc.gz
WARN HashedInputStream - Hashes are not equal for 'https://www.instagram.com/robots.txt'. WARC-header: sha1:ETOSJAUJR7RNMPCNQWBYO3CNCLGBMOOJ, content: sha1:ADLJUKBPVD5C4LURSVVMU2HC4FFRDTK6
Exception in thread "timelimiter_1662110408211" java.lang.UnsatisfiedLinkError: Can't load library: /usr/lib/jvm/java-11-openjdk-amd64/lib/libawt_xawt.so
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2633)
at java.base/java.lang.Runtime.load0(Runtime.java:768)
at java.base/java.lang.System.load(System.java:1837)
at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2445)
at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2501)
at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2700)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2651)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1873)
at java.desktop/java.awt.Toolkit$3.run(Toolkit.java:1399)
at java.desktop/java.awt.Toolkit$3.run(Toolkit.java:1397)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.desktop/java.awt.Toolkit.loadLibraries(Toolkit.java:1396)
at java.desktop/java.awt.Toolkit.<clinit>(Toolkit.java:1429)
at java.desktop/sun.awt.AppContext$2.run(AppContext.java:282)
at java.desktop/sun.awt.AppContext$2.run(AppContext.java:271)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.desktop/sun.awt.AppContext.initMainAppContext(AppContext.java:271)
at java.desktop/sun.awt.AppContext$3.run(AppContext.java:326)
at java.desktop/sun.awt.AppContext$3.run(AppContext.java:309)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.desktop/sun.awt.AppContext.getAppContext(AppContext.java:308)
at java.desktop/javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:129)
at java.desktop/javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:177)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at uk.bl.wa.analyser.payload.TikaPayloadAnalyser$ParseRunner.run(TikaPayloadAnalyser.java:545)
at java.base/java.lang.Thread.run(Thread.java:829)
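I was using Oracle's Java 11.0.16. After switching to OpenJDK, indexing worked.
The issue appears to be that the headless version of the JDK is not enough: as the stack trace shows, Tika's ImageParser initialises javax.imageio.ImageIO, which in turn loads the AWT toolkit, and the native library libawt_xawt.so could not be loaded. I updated the Quick Start accordingly.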
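To check up front whether a given Java installation is likely to hit this, something like the following sketch could help. It only tests whether libawt_xawt.so exists under the lib directory of a Java home, which is the layout seen in the stack trace above for Java 11 on Linux. The class name AwtLibraryCheck and the method hasXawtLibrary are made up for illustration, and other platforms or Java versions may place the library elsewhere.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the indexer or of Tika.
public class AwtLibraryCheck {

    // Returns true if <javaHome>/lib/libawt_xawt.so exists as a regular file.
    // Assumption: Linux layout of Java 11, as in the path from the stack trace.
    static boolean hasXawtLibrary(Path javaHome) {
        return Files.isRegularFile(javaHome.resolve("lib").resolve("libawt_xawt.so"));
    }

    public static void main(String[] args) {
        // java.home points at the installation running this program.
        Path javaHome = Paths.get(System.getProperty("java.home"));
        System.out.println("java.home = " + javaHome);
        System.out.println("libawt_xawt.so present: " + hasXawtLibrary(javaHome));
    }
}
```

Running this with the same java binary that the indexer uses should report whether the library is present. If it is not, installing the full (non-headless) JDK or JRE package is probably the fix.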