Open mikemccand opened 6 years ago
Here's a patch with a failing test case; it creates a Field with an integer value and with the TextField.TYPE_STORED field type.
The test fails on the assertEquals because the token "17" was in fact created; this happens because Field.stringValue calls .toString on numeric values ... I think this is somewhat dangerously lenient but maybe it's not often hit because nobody would normally try to analyze an int?
[Legacy Jira: Michael McCandless (@mikemccand) on Dec 26 2017]
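For readers without the patch at hand, here is a rough sketch of the leniency described in the comment above. It is a simplified stand-in, not Lucene's source: the class `LenientFieldSketch`, its constructor, and the whitespace splitting that plays the role of an analysis chain are all hypothetical. Only the idea that `stringValue` falls back to `toString` for numeric values comes from the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the behavior described above; NOT Lucene code.
final class LenientFieldSketch {
    private final Object fieldsData;  // may hold text or a Number
    private final boolean tokenized;  // stands in for the field type's tokenized flag

    LenientFieldSketch(Object fieldsData, boolean tokenized) {
        this.fieldsData = fieldsData;
        this.tokenized = tokenized;
    }

    // Lenient conversion: numeric values are silently rendered as text.
    String stringValue() {
        if (fieldsData instanceof CharSequence || fieldsData instanceof Number) {
            return fieldsData.toString();
        }
        return null;
    }

    // Stand-in for an analysis chain: split the string value on whitespace.
    List<String> tokens() {
        List<String> out = new ArrayList<>();
        String text = stringValue();
        if (tokenized && text != null) {
            for (String token : text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.add(token);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // An Integer value combined with a tokenized field type.
        LenientFieldSketch field = new LenientFieldSketch(Integer.valueOf(17), true);
        System.out.println(field.tokens()); // prints [17]
    }
}
```

In this model the integer 17 reaches the tokenizer as the text "17" without any error, which is the surprise the failing test is meant to expose.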
+1 to be less lenient. If someone needs to do something like this, I'd rather like the toString conversion to be performed by the user before creating the Field instance.
[Legacy Jira: Adrien Grand (@jpountz) on Dec 28 2017]
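One way to picture the stricter behavior suggested above is sketched below. This is only an illustration under assumptions: `StrictFieldSketch` and `textToAnalyze` are hypothetical names, and nothing here reflects how Lucene actually implements or resolved this. The point is that a tokenized field rejects a numeric value, so the caller has to perform the `toString` conversion explicitly.

```java
// Hypothetical sketch of a stricter check; NOT Lucene code.
final class StrictFieldSketch {

    // Returns the text to analyze, refusing non-text values for tokenized fields.
    static String textToAnalyze(Object value, boolean tokenized) {
        if (tokenized && !(value instanceof CharSequence)) {
            String kind = (value == null) ? "null" : value.getClass().getSimpleName();
            throw new IllegalArgumentException(
                "tokenized field requires a text value, got " + kind);
        }
        return (value == null) ? null : value.toString();
    }

    public static void main(String[] args) {
        try {
            // Passing the Integer directly is rejected instead of silently converted.
            textToAnalyze(Integer.valueOf(17), true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // The caller opts in by converting to text before building the field.
        System.out.println(textToAnalyze(Integer.toString(17), true));
    }
}
```

With a check like this, analyzing a number becomes a deliberate choice made in user code rather than a side effect of the field's value type.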
I stumbled on this by accident, by creating a Field instance with an Integer value for its fieldsData and then setting tokenized = true in its FieldType.
If you do this then Lucene silently converts the int to a string and then tokenizes it, e.g. applying synonyms, etc., if that's what your analysis chain does.
Legacy Jira details
LUCENE-8108 by Michael McCandless (@mikemccand) on Dec 26 2017, updated Dec 28 2017 Attachments: LUCENE-8108.patch