jpype-project / jpype

JPype is a cross-language bridge that gives Python programs full access to Java class libraries.
http://www.jpype.org
Apache License 2.0
1.12k stars, 181 forks

Cannot import some Lucene classes using OpenJDK 11 #838

Closed. afalquina closed this issue 3 years ago.

afalquina commented 4 years ago

I am using Lucene 8.6.0 on a project of mine. I am using Python 3.8.2 on Pop! OS (Ubuntu) and Python 3.8.5 on RHEL 7.8.

The following code fails on OpenJDK 11 and OpenJDK 14 but works just fine on OpenJDK 8:

$ export CLASSPATH=lib/lucene-core-8.6.0.jar
$ python
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> import jpype.imports
>>> print(jpype.getDefaultJVMPath())
/usr/lib/jvm/java-14-openjdk-amd64/lib/server/libjvm.so
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> from org.apache.lucene.search import BooleanClause
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 652, in _load_unlocked
AttributeError: type object 'org.apache.lucene.search.BooleanClause' has no attribute 'loader'

Am I doing something wrong?

Thrameos commented 4 years ago

I see nothing obviously wrong with that line. The thing is that the error trace shows the failure is not in JPype code but in Python's import bootstrap (importlib._bootstrap). So my guess is that something is interfering with the loading process (such as a directory or module named "org" on the Python path).

I would proceed by using JClass to perform the class load instead of the import. If that works, then the issue is somewhere in the Python import system. I would start that debugging by importing "org" and checking whether it has a "__file__" attribute, so that I can see where it is coming from. Repeat the process for org.apache and so forth. You can also add a few print statements to jpype/imports.py to figure out where the path taken up to that import statement differs.
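A minimal sketch of the "import each prefix and check `__file__`" step, using only the standard library. The helper name trace_package_origins is hypothetical, not part of JPype. It assumes the JVM has already been started and jpype.imports has been imported, so that the org.* prefixes are importable; a stray "org" module or directory on sys.path would then show up with a file path as its origin.

```python
import importlib


def trace_package_origins(dotted_name):
    """Import each prefix of a dotted name and report where it came from.

    Hypothetical diagnostic helper. Returns a list of (module_name, origin)
    pairs, where origin is the module's __file__ if it has one, otherwise
    the repr of its __loader__ (which may be "None").
    """
    parts = dotted_name.split(".")
    report = []
    for i in range(1, len(parts) + 1):
        name = ".".join(parts[:i])
        module = importlib.import_module(name)
        origin = getattr(module, "__file__", None)
        if origin is None:
            # No file behind it: fall back to whatever loader is recorded.
            origin = repr(getattr(module, "__loader__", None))
        report.append((name, origin))
    return report
```

For example, `for name, origin in trace_package_origins("org.apache.lucene.search"): print(name, origin)` prints one line per package level, which makes an unexpected file-based "org" easy to spot.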

afalquina commented 4 years ago

I'll try that. What baffles me, though, is that it works with OpenJDK 8. Just changing to OpenJDK 14 triggers the error.

afalquina commented 4 years ago

OK. I have tried the following. First with OpenJDK 14:

$ python
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> import jpype.imports
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> import org
>>> dir(org)
['apache', 'graalvm', 'ietf', 'jcp', 'w3c', 'xml']
>>> import org.apache
>>> dir(org.apache)
['lucene']
>>> import org.apache.lucene
>>> dir(org.apache.lucene)
['analysis', 'codecs', 'document', 'index', 'search', 'store', 'util']
>>> import org.apache.lucene.search
>>> dir(org.apache.lucene.search)
['TopFieldCollector']

And then with OpenJDK 8:

$ python                                
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> import jpype.imports
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> import org
>>> dir(org)
['apache', 'classpath', 'ietf', 'jcp', 'omg', 'w3c', 'xml']
>>> import org.apache
>>> dir(org.apache)
['lucene']
>>> import org.apache.lucene
>>> dir(org.apache.lucene)
['LucenePackage', 'analysis', 'codecs', 'document', 'geo', 'index', 'search', 'store', 'util']
>>> import org.apache.lucene.search
>>> dir(org.apache.lucene.search)
['AutomatonQuery', 'BlendedTermQuery', 'BlockMaxDISI', 'BooleanClause', 'BooleanQuery', 'BoostAttribute', 'BoostAttributeImpl', 'BoostQuery', 'BulkScorer', 'CachingCollector', 'CollectionStatistics', 'CollectionTerminatedException', 'Collector', 'CollectorManager', 'ConjunctionDISI', 'ConstantScoreQuery', 'ConstantScoreScorer', 'ConstantScoreWeight', 'ControlledRealTimeReopenThread', 'DisiPriorityQueue', 'DisiWrapper', 'DisjunctionDISIApproximation', 'DisjunctionMaxQuery', 'DocIdSet', 'DocIdSetIterator', 'DocValuesFieldExistsQuery', 'DocValuesRewriteMethod', 'DoubleValues', 'DoubleValuesSource', 'Explanation', 'FieldComparator', 'FieldComparatorSource', 'FieldDoc', 'FieldValueHitQueue', 'FilterCollector', 'FilterLeafCollector', 'FilterMatchesIterator', 'FilterScorable', 'FilterScorer', 'FilterWeight', 'FilteredDocIdSetIterator', 'FuzzyQuery', 'FuzzyTermsEnum', 'ImpactsDISI', 'IndexOrDocValuesQuery', 'IndexSearcher', 'LRUQueryCache', 'LeafCollector', 'LeafFieldComparator', 'LeafSimScorer', 'LiveFieldValues', 'LongValues', 'LongValuesSource', 'MatchAllDocsQuery', 'MatchNoDocsQuery', 'Matches', 'MatchesIterator', 'MatchesUtils', 'MaxNonCompetitiveBoostAttribute', 'MaxNonCompetitiveBoostAttributeImpl', 'MultiCollector', 'MultiCollectorManager', 'MultiPhraseQuery', 'MultiTermQuery', 'NGramPhraseQuery', 'NamedMatches', 'NormsFieldExistsQuery', 'PhraseQuery', 'PointInSetQuery', 'PointRangeQuery', 'PositiveScoresOnlyCollector', 'PrefixQuery', 'Query', 'QueryCache', 'QueryCachingPolicy', 'QueryRescorer', 'QueryVisitor', 'ReferenceManager', 'RegexpQuery', 'Rescorer', 'Scorable', 'ScoreCachingWrappingScorer', 'ScoreDoc', 'ScoreMode', 'Scorer', 'ScorerSupplier', 'ScoringRewrite', 'SearcherFactory', 'SearcherLifetimeManager', 'SearcherManager', 'SegmentCacheable', 'SimpleCollector', 'SimpleFieldComparator', 'Sort', 'SortField', 'SortRescorer', 'SortedNumericSelector', 'SortedNumericSortField', 'SortedSetSelector', 'SortedSetSortField', 'SynonymQuery', 'TermInSetQuery', 
'TermQuery', 'TermRangeQuery', 'TermStatistics', 'TimeLimitingCollector', 'TopDocs', 'TopDocsCollector', 'TopFieldCollector', 'TopFieldDocs', 'TopScoreDocCollector', 'TopTermsRewrite', 'TotalHitCountCollector', 'TotalHits', 'TwoPhaseIterator', 'UsageTrackingQueryCachingPolicy', 'Weight', 'WildcardQuery', 'similarities', 'spans']

For some reason, the import finds fewer names on the newer JVM.

What can I do to investigate this further?
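One generic way to investigate further, without editing jpype/imports.py, is to log every name the Python import system asks about. The sketch below (ImportTracer is a hypothetical class, not a JPype API) inserts a finder at the front of sys.meta_path that records each request and then returns None, so the real finders, including JPype's, still do the work. Modules already present in sys.modules are not looked up again and will not appear in the log.

```python
import sys
from importlib.abc import MetaPathFinder


class ImportTracer(MetaPathFinder):
    """Records every module name the import system looks up, then defers."""

    def __init__(self):
        self.requests = []

    def find_spec(self, fullname, path, target=None):
        self.requests.append(fullname)
        print(f"find_spec({fullname!r})", file=sys.stderr)
        return None  # let the remaining finders handle the lookup

    def __enter__(self):
        sys.meta_path.insert(0, self)  # run before every other finder
        return self

    def __exit__(self, *exc_info):
        sys.meta_path.remove(self)
        return False  # never swallow the import error being traced
```

For example, wrapping the failing `from org.apache.lucene.search import BooleanClause` in `with ImportTracer():` on both JDKs may show where the two lookup sequences diverge.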

Thrameos commented 4 years ago

That gives me a very good start. Is the org.apache.lucene jar file publicly available (would I be able to replicate this myself)? The problem is likely in org.jpype.pkg.PackageManager, which is responsible for getting the list of packages. It was tested on OpenJDK 8 through 11 and has had no issues, but if there was a change in Java, or if something is going wrong in the code (an exception or the like), then I could see the behavior you describe happening.

The next step for you would be to see whether you can load the class using JClass instead. If you can't, then the problem could be a class initializer problem rather than the import system. Knowing which side of the equation to look at will help.
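The "which side of the equation" check can be wrapped in a small helper. This is a sketch under assumptions: compare_loaders is a hypothetical name, the import half only approximates `from package import Name` (the real statement also falls back to importing package.Name as a submodule), and jclass defaults to jpype.JClass with an already running JVM. If jclass_ok is True while import_ok is False, the import system is the place to look; if both are False, class loading or initialization is the more likely suspect.

```python
import importlib


def compare_loaders(qualified_name, jclass=None):
    """Return (import_ok, jclass_ok) for a fully qualified Java class name.

    jclass defaults to jpype.JClass (JVM must already be started); it is a
    parameter only so the helper can be exercised without a JVM.
    """
    package, _, simple_name = qualified_name.rpartition(".")
    try:
        # Approximates "from <package> import <simple_name>".
        module = importlib.import_module(package)
        getattr(module, simple_name)
        import_ok = True
    except (ImportError, AttributeError):
        import_ok = False

    if jclass is None:
        from jpype import JClass as jclass  # assumes jpype is installed
    try:
        jclass(qualified_name)
        jclass_ok = True
    except Exception:  # broad on purpose: any failure means "not loadable"
        jclass_ok = False
    return import_ok, jclass_ok
```

With the JVM running, `compare_loaders("org.apache.lucene.search.BooleanClause")` reports both results in one call.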

Thrameos commented 4 years ago

I have one other idea. JPype only declares something as viewable if it is a public class, and it uses the byte code to figure that out. If there is a change in the byte code that my routine does not handle, I could see it failing. In the infinite wisdom of the class file format, you have to parse through the whole constant pool (often hundreds of entries) to get to the public flag.
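For context on what "using the byte code" involves: in a .class file the class's access_flags come only after the variable-length constant pool, so every pool entry has to be skipped first. The following Python sketch does that, following the constant-pool tags in chapter 4 of the JVM specification. It is an illustration only, not JPype's implementation (which lives in its Java code), and the function names are hypothetical.

```python
import struct

ACC_PUBLIC = 0x0001

# Size in bytes of each constant-pool entry body after its 1-byte tag.
# CONSTANT_Utf8 (tag 1) is variable length and handled separately.
_FIXED_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
                12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def class_access_flags(data: bytes) -> int:
    """Return the top-level access_flags of a class file."""
    magic, _minor, _major, cp_count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    offset = 10  # magic (4) + minor (2) + major (2) + constant_pool_count (2)
    index = 1    # constant-pool indices start at 1
    while index < cp_count:
        tag = data[offset]
        offset += 1
        if tag == 1:  # CONSTANT_Utf8: u2 length, then that many bytes
            (length,) = struct.unpack_from(">H", data, offset)
            offset += 2 + length
        elif tag in _FIXED_SIZES:
            offset += _FIXED_SIZES[tag]
        else:
            raise ValueError(f"unknown constant pool tag {tag}")
        # Long (5) and Double (6) entries occupy two constant-pool slots.
        index += 2 if tag in (5, 6) else 1
    (flags,) = struct.unpack_from(">H", data, offset)
    return flags


def is_public_class(data: bytes) -> bool:
    return bool(class_access_flags(data) & ACC_PUBLIC)
```

To try it on a class from the jar, read the entry with zipfile, e.g. `zipfile.ZipFile("lib/lucene-core-8.6.0.jar").read("org/apache/lucene/search/BooleanClause.class")`. For nested classes the effective visibility is recorded in the InnerClasses attribute, which this sketch does not read.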

afalquina commented 4 years ago

The jar is available here. The file contains several jars. You'll need lucene-core-8.6.0.jar.

Is this what you meant when you said “use JClass”?

$ python                                
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> print(jpype.getDefaultJVMPath())
/usr/lib/jvm/java-14-openjdk-amd64/lib/server/libjvm.so
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> BooleanClause = jpype.JClass("org.apache.lucene.search.BooleanClause")
>>> BooleanClause
<java class 'org.apache.lucene.search.BooleanClause'>

I am using the same jar on both JVM 8 and JVM 11/14, so I guess that the byte code is always the same. Can the byte code API have changed between JVMs?

Thrameos commented 4 years ago

BooleanClause = jpype.JClass('org.apache.lucene.search.BooleanClause')

Some jars (multi-release jars) contain different byte code for different JVM versions, when the developers want to use features of later Java versions. But it is pretty rare.
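Multi-release jars declare `Multi-Release: true` in their manifest and keep the classes for newer JVMs under META-INF/versions/<N>/. A rough Python sketch for checking a jar (describe_multi_release is a hypothetical helper, and its manifest check is simplified, e.g. it ignores continuation lines):

```python
import re
import zipfile

# Matches entries such as META-INF/versions/9/org/example/Foo.class
_VERSIONED = re.compile(r"^META-INF/versions/(\d+)/(.+)$")


def describe_multi_release(jar_file):
    """Return (is_multi_release, {version: [entry paths]}) for a jar.

    jar_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(jar_file) as jar:
        try:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:  # no manifest at all
            manifest = ""
        names = jar.namelist()
    is_mr = any(line.strip().lower() == "multi-release: true"
                for line in manifest.splitlines())
    layers = {}
    for name in names:
        match = _VERSIONED.match(name)
        if match and not name.endswith("/"):  # skip directory entries
            layers.setdefault(int(match.group(1)), []).append(match.group(2))
    return is_mr, layers
```

Running it on lib/lucene-core-8.6.0.jar would show whether the jar carries versioned layers and which classes each one overrides.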

afalquina commented 4 years ago

Well, the JClass code works on both JVM 8 and JVM 14. At least it does not throw any exceptions…


(Sent by email in reply to Karl Nelson's comment of 7 Aug 2020, 22:19.)




Thrameos commented 4 years ago

Still waiting on my development machine. I have not forgotten.

afalquina commented 4 years ago

Thanks! Is there anything I can do on my side?

Thrameos commented 4 years ago

I believe another post on Stack Overflow found that this has something to do with the length of the import. So something must be chopping the import string. I will try to run this to ground when I can replicate it.

afalquina commented 4 years ago

Thanks for the update. As always, is there anything I can do to help?

Thrameos commented 3 years ago

I investigated this bug. The result isn't very satisfying. The requested jar file is a multi-release jar (MRJAR) with both Java 8 and Java 9 layers.

Unfortunately, when I request the directory on Java 9 or later, it only gives me the contents of the Java 9 layer and not the Java 8 layer. Because the requested class exists only in the Java 8 layer, it is missing from the listing. JPype then tries to raise an exception by calling Java's Class.forName. Only when it does so, rather than getting an error, Java returns the class from the Java 8 layer. This confuses the import system, resulting in the misleading error report.
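To make the expected behavior concrete: on Java 9 or later, a multi-release jar's view of a package should be the base (Java 8) entries overlaid with versioned entries whose version is at most the running version, not the versioned layer alone. The hypothetical helper below computes that merged view from a list of entry names. It assumes the manifest has `Multi-Release: true` (otherwise the JVM ignores META-INF/versions).

```python
def effective_package_listing(entry_names, package_dir, runtime_version):
    """Map each class file in package_dir to the jar entry that should serve it.

    entry_names: all entry names in the jar (e.g. from ZipFile.namelist()).
    package_dir: package path with a trailing slash, e.g. "org/example/".
    runtime_version: Java feature version doing the lookup (8, 9, 11, ...).
    """
    visible = {}
    # Base layer first: this is all that Java 8 sees.
    for name in entry_names:
        if name.startswith(package_dir) and name.endswith(".class"):
            rel = name[len(package_dir):]
            if "/" not in rel:  # ignore classes in subpackages
                visible[rel] = name

    # Collect versioned entries for the same package.
    prefix = "META-INF/versions/"
    layered = []
    for name in entry_names:
        if not name.startswith(prefix):
            continue
        version_str, _, path = name[len(prefix):].partition("/")
        if (version_str.isdigit() and path.startswith(package_dir)
                and path.endswith(".class")):
            rel = path[len(package_dir):]
            if "/" not in rel:
                layered.append((int(version_str), rel, name))

    # Apply layers in ascending version order, so the highest version
    # <= runtime_version wins; base-only classes stay visible.
    for version, rel, name in sorted(layered):
        if version <= runtime_version:
            visible[rel] = name
    return visible
```

With illustrative entries such as org/apache/lucene/search/BooleanClause.class in the base layer and META-INF/versions/9/org/apache/lucene/search/TopFieldCollector.class in the version-9 layer, a call for runtime version 11 keeps BooleanClause.class from the base layer and takes TopFieldCollector.class from the version-9 layer. The OpenJDK 14 dir() output earlier in the thread, which listed only TopFieldCollector, is consistent with seeing the version-9 layer alone.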

The bug is not really in JPype, as it is calling getResources just as it should to get a directory of the contents. It is the JVM implementation that is incorrectly returning incomplete contents. This is similar to the issue with an obfuscated jar where the directories were missing entirely.

So how do we go about addressing this issue? By the time we find the class it is already too late, because we were given a chance to produce the member (the attribute lookup on the package) before find_spec was called. Thus the only way to resolve it would be to try a Class.forName during that attribute lookup and see if it resolves. Unfortunately, that will only work if the package structure is only one level deep.
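The attribute-lookup fallback described in this comment could look roughly like the sketch below. It is hypothetical: FallbackPackage is not JPype's class, and load_class stands in for a Class.forName/JClass call so the idea can be shown without a JVM. As the comment says, the approach has a limit; presumably because forName only resolves classes, a subpackage missing from the listing cannot be recovered this way.

```python
class FallbackPackage:
    """Hypothetical package object that retries unknown names via a loader.

    listing: names the directory scan reported for this package.
    load_class: callable taking a fully qualified name and returning the
    class, or raising if it cannot be found (stand-in for Class.forName).
    """

    def __init__(self, name, listing, load_class):
        self._name = name
        self._listing = set(listing)
        self._load_class = load_class

    def __dir__(self):
        return sorted(self._listing)

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        if attr.startswith("__"):
            raise AttributeError(attr)
        qualified = f"{self._name}.{attr}"
        try:
            cls = self._load_class(qualified)
        except Exception as exc:
            raise AttributeError(f"{qualified} not found") from exc
        # Cache the hit so later lookups skip the loader entirely.
        setattr(self, attr, cls)
        self._listing.add(attr)
        return cls
```

A class missing from the (incomplete) listing is still returned on first access and then cached, so the import system never has to discover it through find_spec.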

Oddly, when I search for "/org/apache", it does the right thing and returns two directories. So I need to investigate further.

   777 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWeight$TermMatch.class
  8993 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWeight.class
  2391 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWithinQuery$SpanWithinWeight$1.class
  3261 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWithinQuery$SpanWithinWeight.class
  3102 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWithinQuery.class
  1851 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/Spans.class
  4004 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/TermSpans.class
   136 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/package-info.class
     0 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/
  1455 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/BooleanScorer$TailPriorityQueue.class
  3432 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/PointInSetQuery$SinglePointVisitor.class
  6931 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/PointRangeQuery$1.class
 14775 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/TopFieldCollector.class

Thrameos commented 3 years ago

Okay, I believe I found a workaround that will fix this behavior in versions going forward. The bug was absolutely obnoxious, as there is nothing that would indicate that MRJAR files would do something like this. I looked into this several times, but reading the code and docs gave me no clues. Your example eventually led me to unpack the jar file, which showed me that the directory entries are being misreported by Java.

Thanks again for the bug report and sorry it took so long to find a resolution.

afalquina commented 3 years ago

Thank you!