tnine closed this issue 13 years ago.
Hey Jake, I've investigated this further and identified the issue. The LucandraTermEnum does not match the spec when enumerating numeric trie terms. I added some debug output, used the default RAMDirectory on version 2.9.3, and ran the TestNumericRangeQuery32 tests. This is the enumeration order I get when the term() method is invoked on Lucene's SegmentTermEnum class:
Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 68037f7f00
Returning term for field 'field8' hex value is : 68037f7f00
Returning term for field 'field8' hex value is : 68037f7f00
Returning term for field 'field8' hex value is : 68037f7f00
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 6804000002
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 70017f7f
Returning term for field 'field8' hex value is : 70017f7f
Returning term for field 'field8' hex value is : 70017f7f
Returning term for field 'field8' hex value is : 70017f7f
Returning term for field 'field8' hex value is : 78007f
Returning term for field 'field8' hex value is : 78007f
Returning term for field 'field8' hex value is : 78007f
Returning term for field 'field8' hex value is : 78007f
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
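To make the dumps easier to read, here is a minimal sketch that decodes them, assuming they are Lucene 2.9 NumericUtils prefix-coded int terms: the first char is 0x60 + shift (so the 0x60/0x68/0x70/0x78 lead bytes would be shifts 0/8/16/24, consistent with a precision step of 8 on 'field8'), and each following char carries 7 bits of the sign-flipped value. The class and helper names are hypothetical, written for illustration; they are not part of Lucene or Lucandra.

```java
// Hypothetical helper (not Lucene or Lucandra code): decodes the hex dumps
// above, assuming they are Lucene 2.9 NumericUtils prefix-coded int terms.
public class PrefixCodedIntDecoder {
    // Assumed marker: int terms start with a char equal to 0x60 + shift.
    static final int SHIFT_START_INT = 0x60;

    /** Converts a dump such as "68037f7f34" back into the term's chars. */
    static String hexToTerm(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    /** The shift (number of dropped low-order bits) stored in the first char. */
    static int shift(String term) {
        return term.charAt(0) - SHIFT_START_INT;
    }

    /** Same idea as prefixCodedToInt: 7 bits per char, then undo shift and sign flip. */
    static int value(String term) {
        int sortableBits = 0;
        for (int i = 1; i < term.length(); i++) {
            sortableBits = (sortableBits << 7) | term.charAt(i);
        }
        return (sortableBits << shift(term)) ^ 0x80000000;
    }

    public static void main(String[] args) {
        String[] dumps = {"60077f7e6814", "68037f7f00", "70017f7f", "78007f", "780100"};
        for (String hex : dumps) {
            String term = hexToTerm(hex);
            System.out.printf("%s -> shift=%d value=%d%n", hex, shift(term), value(term));
        }
        // 60077f7e6814 -> shift=0 value=-19436
        // 68037f7f00 -> shift=8 value=-32768
        // 70017f7f -> shift=16 value=-65536
        // 78007f -> shift=24 value=-16777216
        // 780100 -> shift=24 value=0
    }
}
```

For shift > 0 the decoded value is the lower bound of the trie cell the term covers, because the low-order bits were dropped when the term was indexed.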
These are the results with LucandraTermEnum:
Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 600809433244
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 6804000002
Returning term for field 'field8' hex value is : 6804046008
Returning term for field 'field8' hex value is : 6804046008
As you can see, the terms are not enumerated properly; for example, 68037f7f00, 70017f7f, 78007f and 780100 never appear in the Lucandra output. Given that you're using a tree for the cached terms, they should already be in order after insertion, so this may be an issue with the way loadTerms is invoked.
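The point about the tree can be checked in isolation. This is a minimal sketch, assuming the terms are Lucene 2.9-style prefix-coded ints with a precision step of 8; the encoder is a re-implementation modeled on NumericUtils.intToPrefixCoded and written for illustration, not the library call itself, and the class name is hypothetical. Each value is indexed at shifts 0, 8, 16 and 24, the terms go into a TreeSet<String>, and String's natural char-by-char order groups them by shift and then by value, which is the order a sorted term cache should hand back.

```java
import java.util.TreeSet;

// Illustrative sketch (not Lucandra's code): prefix-coded int trie terms kept
// in a TreeSet come out grouped by shift, then ordered by value.
public class SortedTrieTerms {
    static final int SHIFT_START_INT = 0x60;
    static final int PRECISION_STEP = 8; // assumed, matching 'field8'

    /** Re-implementation modeled on Lucene 2.9's NumericUtils.intToPrefixCoded. */
    static String intToPrefixCoded(int val, int shift) {
        int nChars = (31 - shift) / 7 + 1;               // 7 payload bits per char
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START_INT + shift);       // shift marker comes first
        int sortableBits = (val ^ 0x80000000) >>> shift; // flip sign bit so negatives sort first
        for (int i = nChars; i >= 1; i--) {              // right-justify, 7 bits per char
            buf[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buf);
    }

    /** Formats a term the same way as the debug output above. */
    static String toHex(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            sb.append(String.format("%02x", (int) c));
        }
        return sb.toString();
    }

    /** Every trie term indexed for the given values, in sorted (String) order. */
    static TreeSet<String> trieTerms(int... values) {
        TreeSet<String> terms = new TreeSet<>();
        for (int v : values) {
            for (int shift = 0; shift < 32; shift += PRECISION_STEP) {
                terms.add(intToPrefixCoded(v, shift));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String term : trieTerms(512, -19436, -32768)) {
            System.out.println(toHex(term));
        }
    }
}
```

For these three sample values, the sketch prints 60077f7e0000, 60077f7e6814, 600800000400, 68037f7f00, 68037f7f34, 6804000002, 70017f7f, 70020000, 78007f and 780100, one per line, regardless of the order in which the values were added.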
Hi Jake, I've been digging into this one all day. After searching a bit more, I found an issue in my local copy of the TermEnum and corrected it, which resolves the enumeration problem I described above. However, the documents are not returned in "default" order, i.e. the order in which they were added to the index, which is what the test expects. I'm assuming this is a bug in LucandraTermDocs, but I'm having a hard time locating it. Thoughts?
I've updated the test case on my fork that shows the issue:
http://github.com/tnine/Lucandra/blob/master/test/lucandra/NumericRangeTests.java
It still appears to be related to the term enum: the calls to IndexReader.addDocument happen in a different order than the documents were inserted.
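The symptom can be modeled in a few lines. In this hypothetical sketch, LazyDocNumbering and its addDocument method are invented for illustration and are not Lucandra's IndexReader API. It assumes a reader that hands out doc numbers in the order documents are first reported to it, so when the term enum visits documents out of insertion order, the doc numbers (and therefore the order the test sees) follow the enumeration instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not Lucandra's code): a reader that assigns doc numbers
// lazily, in the order documents are first reported to it.
public class LazyDocNumbering {
    // LinkedHashMap keeps first-seen order, so iteration order == doc number order.
    private final Map<String, Integer> docNumbers = new LinkedHashMap<>();

    /** Hypothetical stand-in for the addDocument call: returns the doc number. */
    int addDocument(String docKey) {
        Integer existing = docNumbers.get(docKey);
        if (existing != null) {
            return existing; // already numbered on an earlier visit
        }
        int next = docNumbers.size();
        docNumbers.put(docKey, next);
        return next;
    }

    /** Document keys ordered by their assigned doc number. */
    List<String> keysByDocNumber() {
        return new ArrayList<>(docNumbers.keySet());
    }

    public static void main(String[] args) {
        // Documents were inserted as doc0, doc1, doc2, but the term enum
        // reports them in a different order.
        String[] enumerationOrder = {"doc2", "doc0", "doc2", "doc1"};
        LazyDocNumbering reader = new LazyDocNumbering();
        for (String key : enumerationOrder) {
            reader.addDocument(key);
        }
        System.out.println(reader.keysByDocNumber()); // prints [doc2, doc0, doc1]
    }
}
```

Under this model, the doc numbering only matches insertion order if documents are first reported in insertion order, which is why a term enum problem can surface as a document order problem.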
Fixed.
Hi Jake, take a look at my fork; I've added tests taken from Uwe's numeric tests in Lucene core. Only a handful of them appear to pass. I'll be correcting this in my fork and will let you know when I'm done.