sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Constructor change for Eidos #941

Closed: kwalcock closed this issue 5 years ago

kwalcock commented 5 years ago

I see in the Indra code that someone is using

eidos = autoclass(eidos_package + '.EidosSystem')
es = eidos(autoclass('java.lang.Object')())

to construct an EidosSystem.

This seems to make use of the EidosSystem special constructor

def this(x: Object) = this() // Dummy constructor crucial for Python integration

that might have been added because other languages cannot deal with the default value specified in the main constructor.

class EidosSystem(val config: Config = EidosSystem.defaultConfig) {

That extra constructor with the Object argument is starting to interfere with internal calls in Eidos, and it probably isn't even necessary. I believe, but am not sure, that it could have been expressed previously as

class EidosSystem(val config: Config) {
  def this() = this(EidosSystem.defaultConfig)

and would not have required the special argument in Python at all. I've made a branch of Eidos, kwalcock-emptyConstructor, that can be used to test this. Can someone there hook INDRA up to it to see if it works?

If it does, I'll merge that change and then move on to more complicated constructors that account for the options we need to add, but I'll keep the empty constructor rather than the Object one. Thanks!
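
For reference, here is a minimal standalone sketch of why the auxiliary no-arg constructor matters for callers coming from Java or jnius (the class names are illustrative, not the real Eidos API): a Scala default parameter value does not produce a no-arg constructor in the bytecode, while an explicit def this() does.

case class Config(name: String)

object Example {
  val defaultConfig: Config = Config("default")
}

class Example(val config: Config) {
  // Unlike a default parameter value, this auxiliary constructor is a real
  // no-arg constructor in the bytecode, so Java and jnius can call it directly.
  def this() = this(Example.defaultConfig)
}

object Demo extends App {
  println(new Example().config)            // auxiliary constructor, picks up the default
  println(new Example(Config("x")).config) // primary constructor
}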

bgyori commented 5 years ago

Thanks, will check today.

bgyori commented 5 years ago

I am testing this now: I created a branch where the constructor is called without arguments. However, I ran into another error - probably unrelated - that I saw for the first time:

...
15:26:54.585 [main] INFO  org.clulab.wm.eidos.utils.Sourcer$ - Sourcing resource file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.3-SNAPSHOT.jar!/org/clulab/wm/eidos/english/grammars/entities/grammar/entities.yml
15:26:54.603 [main] INFO  org.clulab.wm.eidos.utils.Sourcer$ - Sourcing resource file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.3-SNAPSHOT.jar!/org/clulab/wm/eidos/english/grammars/avoidLocal.yml
15:26:54.633 [main] INFO  o.c.wm.eidos.context.GeoNormFinder$ - GeoNames index found at ./cache/geonames/index.
*** jnius.JavaException: JVM exception occurred: Could not initialize class org.apache.lucene.codecs.Codec$Holder

@kwalcock do you know what this could be? I'll start looking into it too.

bgyori commented 5 years ago

Huh, interestingly, when I ran it again, I got yet another error that I haven't seen before:

...
15:32:50.533 [main] INFO  org.clulab.wm.eidos.utils.Sourcer$ - Sourcing resource file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.3-SNAPSHOT.jar!/org/clulab/wm/eidos/english/grammars/entities/grammar/entities.yml
15:32:51.634 [main] INFO  org.clulab.wm.eidos.utils.Sourcer$ - Sourcing resource file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.3-SNAPSHOT.jar!/org/clulab/wm/eidos/english/grammars/avoidLocal.yml
15:32:51.708 [main] INFO  o.c.wm.eidos.context.GeoNormFinder$ - GeoNames index found at ./cache/geonames/index.
2019-08-13 15:32:53.636658: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA

kwalcock commented 5 years ago

I see Lucene in there. The subprojects of Eidos were recently changed around so that the version configured in build.sbt, Lucene 6.6.6, is used for Eidos rather than 7.0.0, which was indirectly required by an Elasticsearch dependency. Could that be the problem?

kwalcock commented 5 years ago

Is it the comment from TensorFlow that is worrisome? I think it is just pointing out that if it had been compiled to support the special CPU instructions, it could be more efficient. A generically compiled version is being used for compatibility. These parts of the program have indeed changed recently. I don't think it is an error.

bgyori commented 5 years ago

Yes, although this Tensorflow message is the last thing I see printed before getting

JavaException: JVM exception occurred: java.lang.ExceptionInInitializerError

But it is possible that some further messages are just not flushed to stdout before the constructor crashes.

kwalcock commented 5 years ago

Did the Lucene error magically disappear? I'm tempted to add some printlns to see how far it got. Do you happen to know whether the present master version works? That and kwalcock-emptyConstructor should differ only in those two constructor lines. Has es = eidos(autoclass('java.lang.Object')()) been changed to something more like es = eidos()? Sorry it isn't working right away.

bgyori commented 5 years ago

Right, I also thought it was probably unrelated, so I tried master and got the same error. So that means it isn't related to the change in the constructor.

The Lucene error isn't showing up for me anymore, but I still get the same crash. I suspected the disappearance of the error message had something to do with the GeoNames cache, so I did rm -r cache/geonames/index, but it didn't seem to make a difference. Should I open another issue on clulab/eidos to discuss this?

kwalcock commented 5 years ago

Sure, that's starting to seem like a better place. If the geonorm files are incorrect, I would expect a nice stack trace somewhere. Master would require the constructor with the Object again. The neural network library being used has changed recently, but I haven't noticed compatibility problems, and that would be more likely to crash the VM in low-level code. Is there any chance in the world that I can reproduce what you're seeing? I'm tagging @bethard because he is much more familiar with failure modes of TensorFlow. Do you have an idea when master last worked? Thanks!

bethard commented 5 years ago

I can confirm that the tensorflow warning is not something that should cause any errors. We get that warning all the time, but as Keith says, it's just telling you that you could get better performance by recompiling tensorflow on your machine.

I've never seen an ExceptionInInitializerError. Is there any way of getting its full stack trace? Usually that would tell us which class was being initialized when the exception was thrown.

bgyori commented 5 years ago

Okay, thanks! There isn't really a way to get the stack trace due to the Java-Python bridge being used here. Keith, if you'd like to reproduce the issue, the following would be the easiest:

docker run -it --entrypoint /bin/bash labsyspharm/indra

to get a fully configured INDRA environment, then use docker cp to put an Eidos JAR into the running container (or launch the container with additional arguments to mount one of your local folders into the container to be able to access the JAR file that way). Then set EIDOSPATH and CLASSPATH in the container to point to the JAR file, and run

from indra.sources import eidos
eidos.process_text('floods cause displacement')

in a Python shell. This, I think, is easier than setting up INDRA from scratch on your computer if you haven't done so previously.

I will now try a binary search on Eidos commits to identify where this error appeared and will report back on what I find.

kwalcock commented 5 years ago

Thanks for the search effort and quick tutorial. I'm working on it as well.

kwalcock commented 5 years ago

I do get the same error, so that much is good.

kwalcock commented 5 years ago

It looks like the problem is in the constructor argument for the GeoNamesIndex. It is getting a relative path like ./cache/geonames/index, which hopefully just needs to be turned into an absolute path; the relative one seems not to work well in this environment. I'll fix it in Eidos and see if it helps.

bgyori commented 5 years ago

Ah okay, I narrowed it down to this commit https://github.com/clulab/eidos/commit/5ca5a70547c029702c211dd561c30966be8910c7 but wasn't sure about the exact reason.

kwalcock commented 5 years ago

FWIW that wasn't the problem. Digging deeper....

kwalcock commented 5 years ago

I'll write this down in case it looks familiar to anybody. The problem seems to be in the construction of org.apache.lucene.index.DirectoryReader. It complains:

java.lang.ExceptionInInitializerError
    at org.apache.lucene.codecs.Codec.forName(Codec.java:116)
    at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:424)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:356)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:57)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:54)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:692)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:77)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
    at org.clulab.wm.eidos.context.GeoNormFinder$.fromConfig(GeoNormFinder.scala:98)
    at org.clulab.wm.eidos.extraction.Finder$.$anonfun$fromConfig$1(Finder.scala:21)
    at scala.collection.immutable.List.map(List.scala:287)
    at org.clulab.wm.eidos.extraction.Finder$.fromConfig(Finder.scala:17)
    at org.clulab.wm.eidos.EidosSystem$LoadableAttributes$.apply(EidosSystem.scala:95)
    at org.clulab.wm.eidos.EidosSystem.<init>(EidosSystem.scala:122)
    at org.clulab.wm.eidos.EidosSystem.<init>(EidosSystem.scala:26)
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene62' does not exist.  You need to add the corresponding JAR file supporting this SPI to your classpath.  The current classpath supports the following names: []
    at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:116)
    at org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:60)
    ... 16 more

I added

"org.apache.lucene" % "lucene-codecs"           % luceneVer

to build.sbt and verified that the class file /org/apache/lucene/codecs/lucene62/Lucene62Codec.class is part of the assembled file. The error message did not change.

I copied lucene-codecs-6.6.6.jar to the docker container and adjusted the CLASSPATH to include this jar file. The error message improved to

  You need to add the corresponding JAR file supporting this SPI to your classpath.  The current classpath supports the following names: [SimpleText]

I wonder whether the assembled file is really correct, because I don't see anything in META-INF that matches this description of the mechanism:

Lucene 4 now supports so-called "Codecs", which are implementation classes for custom index encodings. E.g. you can exchange the default postings list for a custom format. Those implementations are classes that extend an abstract base class/interface to implement this functionality. To plug this into the Lucene index format, IndexReader and IndexWriter need to know which codec type was used in order to load the correct classes. This is done by the Java Service Provider Interface (SPI), which is used quite often, e.g. to load custom image formats, XML parsers, Charsets, ... in the JDK. The codec implementation classes are listed in the JAR file's META-INF section and then loaded by the code, and they register themselves through the codec manager. There is no custom class loading involved, it's all just plain Java. E.g. Charset.forName("ISO-8859-1") in the JDK does exactly the same: it scans META-INF for Charset files, loads all charsets, and looks them up by name. You can use this e.g. to plug in ICU4J to have more charsets than in the default JDK.
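
To make that concrete, here is a minimal diagnostic sketch (a throwaway Scala program, assuming the assembled Eidos JAR or at least lucene-core is on the classpath) that lists the codec names registered through META-INF/services and then performs the same lookup that fails above:

import org.apache.lucene.codecs.Codec
import scala.collection.JavaConverters._

object CodecCheck extends App {
  // What the JDK's ServiceLoader finds via META-INF/services/org.apache.lucene.codecs.Codec;
  // this is the same registration mechanism Lucene's NamedSPILoader relies on.
  val names = java.util.ServiceLoader.load(classOf[Codec]).asScala.map(_.getName).toList
  println("Codecs visible on the classpath: " + names.mkString(", "))

  // Codec.forName is what SegmentInfos calls while opening the index; with a
  // correctly assembled JAR this succeeds, with the broken one it throws the
  // IllegalArgumentException / ExceptionInInitializerError shown above.
  println(Codec.forName("Lucene62"))
}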

bethard commented 5 years ago

I've never seen this before, but it seems like there might be a problem in how the JAR file is being assembled? https://stackoverflow.com/questions/45615385/lucene-6-error-cannot-instantiate-spi-class-in-jar-file This answer also suggests that there's supposed to be something in META-INF: https://stackoverflow.com/questions/29818169/what-causes-err-a-spi-class-of-type-lucene-codecs-codec-name-lucene42

kwalcock commented 5 years ago

Right. Perhaps assembly is not working as intended, and we seldom use it, so the problem has not been encountered until now.

Android only packs directly referenced classes into the DEX file (similar to the Maven Shade plugin or the Jarjar tool). The codec classes are not referenced by any internal Lucene class (only loaded via SPI), so it might happen that the DEX tool failed to pack them into the resulting package.

bethard commented 5 years ago

Looks like Eidos's build.sbt overrides the merge strategy in assembly to be much less careful than the default merge strategy. Could that be the problem?

kwalcock commented 5 years ago

Thanks. I can now get the same error running on plain Windows with a Scala program, no Docker or Python needed. I think the assembly program is not clever enough to even know that the SPI files (the Charsets-style mechanism described above) are needed, but I still have to find the correct values to include in the JAR. Getting close, though.
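
In case it helps anyone else, the reproduction can be as small as constructing an EidosSystem directly from Scala, since the failure happens inside the constructor; outside the Python bridge you get the full stack trace (a sketch, assuming the assembled Eidos JAR is on the classpath):

import org.clulab.wm.eidos.EidosSystem

object Repro extends App {
  try {
    // The crash happens inside this constructor when the codec SPI files are missing from the assembly.
    val eidosSystem = new EidosSystem()
    println("EidosSystem loaded: " + eidosSystem)
  } catch {
    case e: Throwable => e.printStackTrace() // full JVM stack trace, unlike what the jnius bridge reports
  }
}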

kwalcock commented 5 years ago

It's working now. The Eidos PR has been updated. Two lines need to be changed in Indra's reader.py:

self.eidos_reader = eidos(autoclass('java.lang.Object')())

needs to be

self.eidos_reader = eidos()

And there's one I forgot about: the call to extractFromText no longer has the keep-text argument, because it now always has to be true. So

True, # keep text

changes to

# True, # keep text

or something similar.

kwalcock commented 5 years ago

In lucene-analyzers-common-6.6.6.jar there are also three files in META-INF/services with names from package org.apache.lucene.analysis.util: CharFilterFactory, TokenFilterFactory, and TokenizerFactory. Nothing has complained about them not being in the assembly, but whether they get used may depend on runtime circumstances. @bethard, do you know whether eidos needs them?
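
A quick way to check what actually ended up in an assembled JAR is to list its META-INF/services entries (a sketch; pass the path to the assembly, e.g. eidos-assembly-0.2.3-SNAPSHOT.jar, as the first argument):

import java.util.jar.JarFile
import scala.collection.JavaConverters._

object ListServices extends App {
  // Print every META-INF/services entry contained in the given JAR file.
  val jar = new JarFile(args(0))
  jar.entries().asScala
    .map(_.getName)
    .filter(_.startsWith("META-INF/services/"))
    .foreach(println)
  jar.close()
}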

bethard commented 5 years ago

I don't know if/how they're used, but why aren't we doing what the standard merge strategy does and including all META-INF/services? That is, why are we cherry-picking just a few services?

kwalcock commented 5 years ago

I don't know. This is inherited code for me. Presumably there was some conflict. A more general solution would be nice.

kwalcock commented 5 years ago

It looks like build.sbt started out with

    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first

courtesy of @bgyori and then @EgoLaparra added individual exceptions for

    case "META-INF/services/org.nd4j.linalg.factory.Nd4jBackend" => MergeStrategy.first
    case "META-INF/services/org.nd4j.linalg.compression.NDArrayCompressor" => MergeStrategy.first

because the discard line was probably discarding too much. Before trying to change it, I'd like to get the backstory, particularly about the discard line and what it's trying to avoid. Thanks!

EgoLaparra commented 5 years ago

We are not using deeplearning4j anymore, so we could remove those two exceptions.

bgyori commented 5 years ago

As for the first piece of code, I added it very early on in the project just to make sbt assembly possible; I probably copied it from somewhere as standard boilerplate (see https://github.com/clulab/eidos/commit/4f8982c7e512fa14a04fb44f731cae3af2627ec3).

kwalcock commented 5 years ago

Thanks, everyone. Right now I'm testing the code below, which is better off thanks to everyone's input.

assemblyMergeStrategy in assembly := {     
    // See https://github.com/sbt/sbt-assembly.
    case PathList("META-INF", "MANIFEST.MF")  => MergeStrategy.discard // We'll make a new manifest for Eidos.
    case PathList("META-INF", "DEPENDENCIES") => MergeStrategy.discard // All dependencies will be included in the assembly already.
    case PathList("META-INF", "LICENSE")      => MergeStrategy.concat  // Concatenate everyones licenses and notices.
    case PathList("META-INF", "LICENSE.txt")  => MergeStrategy.concat
    case PathList("META-INF", "NOTICE")       => MergeStrategy.concat
    case PathList("META-INF", "NOTICE.txt")   => MergeStrategy.concat
    // These all have different contents and cannot be automatically deduplicated.
    case PathList("META-INF", "services", "org.apache.lucene.codecs.PostingsFormat") => MergeStrategy.filterDistinctLines 
    case PathList("META-INF", "services", "com.fasterxml.jackson.databind.Module")   => MergeStrategy.filterDistinctLines 
    case PathList("META-INF", "services", "javax.xml.transform.TransformerFactory")  => MergeStrategy.first // or last
    case PathList("reference.conf") => MergeStrategy.concat // Scala configuration files -- important!
    // Otherwise just keep one copy if the contents are the same and complain if not.
    case x => MergeStrategy.deduplicate
}

kwalcock commented 5 years ago

This code seems to work, so I'll merge it shortly.

bgyori commented 5 years ago

Great, thanks! I actually put an integration test in place that builds the Eidos JAR each day and tests INDRA against it as part of our Travis tests, precisely to catch these kinds of issues quickly. However, it turns out that the script responsible for the daily build hasn't been operational for a while, so we didn't catch this issue earlier. I fixed the script, though, so it should be better in the future.

kwalcock commented 5 years ago

Cool! Now that you showed me how to use INDRA from the Docker image, I should be able to send you a PR when we notice that an Eidos change will affect you.

bethard commented 5 years ago

You may want to use Assembly.isLicenseFile instead of hard-coding LICENSE and NOTICE. If you take a look at https://github.com/sbt/sbt-assembly#merge-strategy, under defaultMergeStrategy, you can see the recommended way of doing this. I do wonder, though, why we can't just use defaultMergeStrategy. Isn't that just a more robust version of what you've written?
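
Roughly the kind of thing I mean (a sketch only, untested against the Eidos build; it leans on sbt-assembly's Assembly.isLicenseFile helper and a blanket META-INF/services rule rather than naming individual files):

assemblyMergeStrategy in assembly := {
  // Sketch: one rule for every license/notice file and one for every
  // META-INF/services file, instead of enumerating them by name.
  case PathList(ps @ _*) if sbtassembly.Assembly.isLicenseFile(ps.last) => MergeStrategy.concat
  case PathList("META-INF", "MANIFEST.MF")    => MergeStrategy.discard
  case PathList("META-INF", "services", _*)   => MergeStrategy.filterDistinctLines
  case "reference.conf"                       => MergeStrategy.concat
  case _                                      => MergeStrategy.deduplicate
}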

kwalcock commented 5 years ago

Indeed, I should have noticed isLicenseFile. For the others, right now I'd rather be pedantic and know all the decisions that are being made. Falling through to defaultMergeStrategy rather than to deduplicate would swallow new services files without alerting me to potential problems.

bgyori commented 5 years ago

Alright, thanks! But the saga doesn't end... Now that the JAR and the constructor work, the next error I bumped into is this:

json_dict = eidos_reader.process_text(text, out_format)
...
jnius.JavaException: Invalid call, number of argument mismatch

I'll change this call to match the current expectations.

bgyori commented 5 years ago

Oh I see your comment above about removing one of the arguments of extractFromText - testing it now.

bgyori commented 5 years ago

Alright, it works! So I'll merge #942 here and you can merge https://github.com/clulab/eidos/pull/646.

kwalcock commented 5 years ago

Done. Again, it should be easier the next time. Thanks for your patience.