Closed: kwalcock closed this issue 5 years ago.
Thanks, will check today.
I am testing this now: I created a branch where the constructor is called without arguments. However, I ran into another error - probably unrelated - that I saw for the first time:
...
15:26:54.585 [main] INFO org.clulab.wm.eidos.utils.Sourcer$ - Sourcing resource file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.3-SNAPSHOT.jar!/org/clulab/wm/eidos/english/grammars/entities/grammar/entities.yml
15:26:54.603 [main] INFO org.clulab.wm.eidos.utils.Sourcer$ - Sourcing resource file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.3-SNAPSHOT.jar!/org/clulab/wm/eidos/english/grammars/avoidLocal.yml
15:26:54.633 [main] INFO o.c.wm.eidos.context.GeoNormFinder$ - GeoNames index found at ./cache/geonames/index.
*** jnius.JavaException: JVM exception occurred: Could not initialize class org.apache.lucene.codecs.Codec$Holder
@kwalcock do you know what this could be? I'll start looking into it too.
Huh, interestingly, when I ran it again, I got yet another error that I haven't seen before:
...
15:32:50.533 [main] INFO org.clulab.wm.eidos.utils.Sourcer$ - Sourcing resource file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.3-SNAPSHOT.jar!/org/clulab/wm/eidos/english/grammars/entities/grammar/entities.yml
15:32:51.634 [main] INFO org.clulab.wm.eidos.utils.Sourcer$ - Sourcing resource file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.3-SNAPSHOT.jar!/org/clulab/wm/eidos/english/grammars/avoidLocal.yml
15:32:51.708 [main] INFO o.c.wm.eidos.context.GeoNormFinder$ - GeoNames index found at ./cache/geonames/index.
2019-08-13 15:32:53.636658: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
I see Lucene in there. The subprojects of eidos were recently rearranged so that the version configured in build.sbt, Lucene 6.6.6, is used for eidos rather than 7.0.0, which was indirectly required by an elasticsearch dependency. Could that be the problem?
Is it the comment from TensorFlow that is worrisome? I think it is just pointing out that if it had been compiled to support the special CPU instructions, it could be more efficient. A generically compiled version is being used for compatibility. These parts of the program have indeed changed recently. I don't think it is an error.
Yes, although this Tensorflow message is the last thing I see printed before getting
JavaException: JVM exception occurred: java.lang.ExceptionInInitializerError
But it is possible that some further messages are just not flushed to stdout before the constructor crashes.
Did the Lucene error magically disappear? I'm tempted to add some printlns to see how far it got. Do you happen to know if the present master version works? Master and the kwalcock-emptyConstructor branch should differ only in those two constructor lines. Has
es = eidos(autoclass('java.lang.Object')())
been changed to something more like
es = eidos()
Sorry it isn't working right away.
Right, I also thought it was probably unrelated so I tried master and got the same error. So that means that it isn't related to the change in the constructor.
The Lucene error isn't showing up for me anymore, but I still get the same crash. I suspected the disappearance of the error message had something to do with the GeoNames cache, so I did rm -r cache/geonames/index
but it didn't seem to make a difference. Should I open another issue on clulab/eidos to discuss this?
Sure, that's starting to seem like a better place. If the geonorm files are incorrect, I would expect a nice stack trace somewhere. Master would require the constructor with the Object again. The neural network library being used has changed recently, but I haven't noticed compatibility problems, and that would be more likely to crash the VM in low-level code. Is there any chance in the world that I can reproduce what you're seeing? I'm tagging @bethard because he is much more familiar with failure modes of TensorFlow. Do you have an idea when master last worked? Thanks!
I can confirm that the tensorflow warning is not something that should cause any errors. We get that warning all the time, but as Keith says, it's just telling you that you could get better performance by recompiling tensorflow on your machine.
I've never seen an ExceptionInInitializerError. Is there any way of getting the full stack trace of the ExceptionInInitializerError? Usually that would tell us what class was being initialized when the exception was thrown.
Okay, thanks! There isn't really a way to get the stack trace due to the Java-Python bridge being used here. Keith, if you'd like to reproduce the issue, the following would be the easiest:
docker run -it --entrypoint /bin/bash labsyspharm/indra
to get a fully configured INDRA environment, then use docker cp
to put an Eidos JAR into the running container (or launch the container with additional arguments to mount one of your local folders into the container to be able to access the JAR file that way). Then set EIDOSPATH and CLASSPATH in the container to point to the JAR file, and run
from indra.sources import eidos
eidos.process_text('floods cause displacement')
in a Python shell. This, I think, is easier than setting up INDRA from scratch on your computer if you haven't done so previously.
I will now try to do binary search on Eidos commits to identify where this error appeared and will report back on what I find.
Thanks for the search effort and quick tutorial. I'm working on it as well.
I do get the same error, so that much is good.
It looks like the problem is in the constructor argument for the GeoNamesIndex. It is getting a relative path like ./cache/geonames/index which hopefully just needs to be turned into an absolute path. The relative one seems not to work well in this environment. I'll fix it in Eidos then and see if it helps.
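For the record, a relative path like that resolves against the process's current working directory, which is easy to illustrate on the Python side (this is not Eidos code; Eidos resolves the path on the JVM side):

```python
import os

# The path Eidos logs is relative, so what it points at depends on
# the current working directory of the process that opens it.
rel = "./cache/geonames/index"
resolved = os.path.abspath(rel)
print(resolved == os.path.join(os.getcwd(), "cache", "geonames", "index"))
```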
Ah okay, I narrowed it down to this commit https://github.com/clulab/eidos/commit/5ca5a70547c029702c211dd561c30966be8910c7 but wasn't sure about the exact reason.
FWIW, that wasn't the problem. Digging deeper...
I'll write this down in case it looks familiar to anybody. The problem seems to be in the constructor of org.apache.lucene.index.DirectoryReader. It complains:
java.lang.ExceptionInInitializerError
at org.apache.lucene.codecs.Codec.forName(Codec.java:116)
at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:424)
at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:356)
at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:57)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:54)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:692)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:77)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
at org.clulab.wm.eidos.context.GeoNormFinder$.fromConfig(GeoNormFinder.scala:98)
at org.clulab.wm.eidos.extraction.Finder$.$anonfun$fromConfig$1(Finder.scala:21)
at scala.collection.immutable.List.map(List.scala:287)
at org.clulab.wm.eidos.extraction.Finder$.fromConfig(Finder.scala:17)
at org.clulab.wm.eidos.EidosSystem$LoadableAttributes$.apply(EidosSystem.scala:95)
at org.clulab.wm.eidos.EidosSystem.<init>(EidosSystem.scala:122)
at org.clulab.wm.eidos.EidosSystem.<init>(EidosSystem.scala:26)
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene62' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: []
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:116)
at org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:60)
... 16 more
I added
"org.apache.lucene" % "lucene-codecs" % luceneVer
to build.sbt and verified that the class file /org/apache/lucene/codecs/lucene62/Lucene62Codec.class is part of the assembled file. The error message did not change.
I copied lucene-codecs-6.6.6.jar to the docker container and adjusted the CLASSPATH to include this jar file. The error message improved to
You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [SimpleText]
I wonder whether the assembled file is really correct, because I don't see anything in META-INF that sounds like this:
Lucene 4 supports now so called "Codecs" which are implementation classes for custom index encodings. E.g. you can exchange the default postings list by a custom format. Those implementations are classes that extend an abstract base class/interface to implement this functionality. To plug this into the Lucene index format, IndexReader and IndexWriter need the knowledge, which codec type was used to load the correct classes. This is done by the Java Service Provider interface (SPI), which is used quite often e.g. to load custom image formats, XML parsers, Charsets,... in the JDK. The codec implementation classes are listed in the JAR files META-INF section and then loaded by the code and register themselves through the codec manager. There is no custom class loading involved, it's all just plain Java. E.g. Charset.forName("ISO-8859-1") in the JDK does exactly the same: It scans META-INF for Charset files, loads all charsets and looks them up by name. You can use this to e.g. plug in ICU4J to have more charsets than in the default JDK.
I've never seen this before, but it seems like there might be a problem in how the JAR file is being assembled? https://stackoverflow.com/questions/45615385/lucene-6-error-cannot-instantiate-spi-class-in-jar-file This answer also suggests that there's supposed to be something in META-INF: https://stackoverflow.com/questions/29818169/what-causes-err-a-spi-class-of-type-lucene-codecs-codec-name-lucene42
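If it helps to double-check, a JAR is just a zip file, so the assembled file's META-INF/services entries can be listed directly. A sketch in Python: the in-memory "jar" below merely stands in for the real assembly, and its single service line shows roughly what a Lucene codec SPI registration file contains:

```python
import io
import zipfile

def list_services(jar):
    """List the SPI registration files under META-INF/services in a JAR
    (JARs are plain zip files, so zipfile can read them directly)."""
    with zipfile.ZipFile(jar) as zf:
        return [n for n in zf.namelist() if n.startswith("META-INF/services/")]

# Stand-in for the real assembly jar: a tiny in-memory zip with one
# service file whose single line names a codec implementation class.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("META-INF/services/org.apache.lucene.codecs.Codec",
                "org.apache.lucene.codecs.lucene62.Lucene62Codec\n")

print(list_services(buf))
# Against the real file it would be: list_services("eidos-assembly.jar")
```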
Right. Perhaps assembly is not working as intended; we seldom use it, so the problem has not been encountered until now.
Android only packs directly referenced classes into the DEX file (similar to the Maven Shade plugin or the Jarjar tool). The codec classes are not referenced by any internal Lucene class (only loaded via SPI), so it might happen that the DEX tool failed to pack them into the resulting package.
Looks like Eidos's build.sbt overrides the merge strategy in assembly to be much less careful than the default merge strategy. Could that be the problem?
Thanks. I can now get the same error running on plain Windows with a Scala program, no Docker or Python needed. I think the assembly program is not even clever enough to know that the Charsets thing is needed, but I still have to find the correct values to include in the jar. Getting close, though.
It's working now. The Eidos PR has been updated. Two lines need to be changed in Indra's reader.py:
self.eidos_reader = eidos(autoclass('java.lang.Object')())
needs to be
self.eidos_reader = eidos()
and there's one change I forgot about. The call to extractFromText no longer has the keep-text argument, because it always has to be true now. So
True, # keep text
changes to
# True, # keep text
or something similar.
In lucene-analyzers-common-6.6.6.jar there are also three files in META-INF/services with names from package org.apache.lucene.analysis.util: CharFilterFactory, TokenFilterFactory, and TokenizerFactory. Nothing has complained about them not being in the assembly, but whether they get used may depend on runtime circumstances. @bethard, do you know whether eidos needs them?
I don't know if/how they're used, but why aren't we doing what the standard merge strategy does and including all META-INF/services? That is, why are we cherry-picking just a few services?
I don't know. This is inherited code for me. Presumably there was some conflict. A more general solution would be nice.
It looks like build.sbt started out with
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
courtesy of @bgyori and then @EgoLaparra added individual exceptions for
case "META-INF/services/org.nd4j.linalg.factory.Nd4jBackend" => MergeStrategy.first
case "META-INF/services/org.nd4j.linalg.compression.NDArrayCompressor" => MergeStrategy.first
because the discard line was probably discarding too much. Before trying to change it, I'd like to get the back story, particularly about the discard line and what all it's trying to avoid. Thanks!
We are not using deeplearning4j anymore, we could remove those two exceptions.
As for the first piece of code, I added it very early on in the project just to make sbt assembly possible, I probably copied it from somewhere as a standard boilerplate. (see https://github.com/clulab/eidos/commit/4f8982c7e512fa14a04fb44f731cae3af2627ec3).
Thanks, everyone. Right now I'm testing the code below, which is better thanks to everyone's input.
assemblyMergeStrategy in assembly := {
// See https://github.com/sbt/sbt-assembly.
case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard // We'll make a new manifest for Eidos.
case PathList("META-INF", "DEPENDENCIES") => MergeStrategy.discard // All dependencies will be included in the assembly already.
case PathList("META-INF", "LICENSE") => MergeStrategy.concat // Concatenate everyone's licenses and notices.
case PathList("META-INF", "LICENSE.txt") => MergeStrategy.concat
case PathList("META-INF", "NOTICE") => MergeStrategy.concat
case PathList("META-INF", "NOTICE.txt") => MergeStrategy.concat
// These all have different contents and cannot be automatically deduplicated.
case PathList("META-INF", "services", "org.apache.lucene.codecs.PostingsFormat") => MergeStrategy.filterDistinctLines
case PathList("META-INF", "services", "com.fasterxml.jackson.databind.Module") => MergeStrategy.filterDistinctLines
case PathList("META-INF", "services", "javax.xml.transform.TransformerFactory") => MergeStrategy.first // or last
case PathList("reference.conf") => MergeStrategy.concat // Scala configuration files -- important!
// Otherwise just keep one copy if the contents are the same and complain if not.
case x => MergeStrategy.deduplicate
}
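For anyone curious, MergeStrategy.filterDistinctLines for the services files behaves roughly like the sketch below (my approximation of the semantics, not sbt-assembly's actual code): concatenate the competing files and keep each line once, so every registered implementation survives the merge.

```python
def filter_distinct_lines(files):
    """Approximate MergeStrategy.filterDistinctLines: concatenate the
    competing copies of a file, keeping each non-empty line only once."""
    seen = set()
    merged = []
    for content in files:
        for line in content.splitlines():
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return "\n".join(merged) + "\n"

# Two jars registering overlapping PostingsFormat implementations
# (the class names here are illustrative):
a = "org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat\n"
b = ("org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat\n"
     "org.apache.lucene.codecs.memory.DirectPostingsFormat\n")
print(filter_distinct_lines([a, b]), end="")
```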
This code seems to work, so I'll merge it shortly.
Great, thanks! I actually put an integration test in place that builds the Eidos JAR each day and tests INDRA against it as part of our Travis tests, precisely to catch these kinds of issues quickly. However, it turns out that the script responsible for the daily build hadn't been operational for a while, so we didn't catch this issue earlier. I've fixed the script, though, so it should be better in the future.
Cool! Now that you've shown me how to use INDRA from the Docker container, I should be able to send you a PR when we notice that an Eidos change will affect you.
You may want to use Assembly.isLicenseFile instead of hard-coding LICENSE and NOTICE. If you take a look at https://github.com/sbt/sbt-assembly#merge-strategy, under defaultMergeStrategy, you can see the recommended way of doing this. I do wonder, though, why we can't just use defaultMergeStrategy. Isn't that just a more robust version of what you've written?
Indeed, I should have noticed isLicenseFile. For the others, right now I would rather be pedantic and know all the decisions that are being made. Falling through to the default rather than to deduplicate would swallow new services without alerting me to potential problems.
Alright, thanks! But the saga doesn't end... Now that the JAR and the constructor work, the next error I bumped into is this:
json_dict = eidos_reader.process_text(text, out_format)
...
jnius.JavaException: Invalid call, number of argument mismatch
I'll change this call to match the current expectations.
Oh I see your comment above about removing one of the arguments of extractFromText - testing it now.
Alright, it works! So I'll merge #942 here and you can merge https://github.com/clulab/eidos/pull/646.
Done. Again, it should be easier the next time. Thanks for your patience.
I see in the Indra code that someone is using
to construct an EidosSystem.
This seems to make use of the EidosSystem special constructor
that might have been added because other languages cannot deal with the default value specified in the main constructor.
That extra constructor with the Object argument is starting to interfere with the internal calls in Eidos, and it probably isn't even necessary. I believe, but am not sure, that it could have been expressed previously as
and not required the special argument in Python at all. I've made a branch of Eidos,
kwalcock-emptyConstructor
that can be used to test this. Can someone there hook INDRA up to it to see if it works? If it does, I'll merge that change and then afterwards move to more complicated constructors that account for the options we need to add, but keep the empty constructor rather than the Object one. Thanks!