mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

Field class should not let you analyze int values? [LUCENE-8108] #109

Open mikemccand opened 6 years ago

mikemccand commented 6 years ago

I stumbled on this by accident, by creating a Field instance with an Integer value for its fieldsData and then setting tokenized = true in its FieldType.

If you do this then Lucene silently converts the int to a string and then tokenizes it, e.g. applying synonyms, etc., if that's what your analysis chain does.


Legacy Jira details

LUCENE-8108 by Michael McCandless (@mikemccand) on Dec 26 2017, updated Dec 28 2017 Attachments: LUCENE-8108.patch

mikemccand commented 6 years ago

Here's a patch with a failing test case; it creates a Field with an integer value and with TextField.TYPE_STORED field type.

The test fails on the assertEquals because the token "17" was in fact created. This happens because Field.stringValue calls .toString on numeric values. I think this is dangerously lenient, but maybe it's not often hit because nobody would normally try to analyze an int?

[Legacy Jira: Michael McCandless (@mikemccand) on Dec 26 2017]
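
For readers without the patch at hand, the leniency described in this comment can be modeled with a small self-contained sketch. `ToyField` and `LenientFieldDemo` are hypothetical stand-ins written for illustration, not Lucene classes, and the whitespace splitter only approximates a real analysis chain; the one behavior taken from the report is that `stringValue` calls `toString` on numeric values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT Lucene's Field class.
final class ToyField {
    private final Object fieldsData;   // may hold a String or a Number
    private final boolean tokenized;

    ToyField(Object fieldsData, boolean tokenized) {
        this.fieldsData = fieldsData;
        this.tokenized = tokenized;
    }

    // Lenient: numeric values are silently converted to their string form.
    String stringValue() {
        if (fieldsData instanceof String || fieldsData instanceof Number) {
            return fieldsData.toString();
        }
        return null;
    }

    // Trivial whitespace splitter standing in for a real analysis chain.
    List<String> tokens() {
        List<String> out = new ArrayList<>();
        String value = stringValue();
        if (tokenized && value != null) {
            for (String t : value.trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    out.add(t);
                }
            }
        }
        return out;
    }
}

final class LenientFieldDemo {
    public static void main(String[] args) {
        // An Integer value combined with tokenized = true yields a text token.
        ToyField field = new ToyField(Integer.valueOf(17), true);
        System.out.println(field.tokens()); // prints [17]
    }
}
```

In this toy model nothing stops the integer from reaching the tokenizer, which mirrors why the token "17" shows up in the failing test.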

mikemccand commented 6 years ago

+1 to be less lenient. If someone needs to do something like this, I'd rather the toString conversion be performed by the user before creating the Field instance.

[Legacy Jira: Adrien Grand (@jpountz) on Dec 28 2017]
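
One way to picture the stricter behavior suggested here is the sketch below. It is an assumption about what "less lenient" could look like, not the committed fix: `StrictToyField` and `StrictFieldDemo` are hypothetical names, and the choice of `IllegalArgumentException` is illustrative. The field rejects non-text values when tokenization is requested, so the caller has to convert the number explicitly.

```java
// Hypothetical stand-in for illustration only; this is NOT Lucene's Field class.
final class StrictToyField {
    private final Object fieldsData;
    private final boolean tokenized;

    StrictToyField(Object fieldsData, boolean tokenized) {
        // Assumed policy: a tokenized field must carry text, never a number.
        if (tokenized && !(fieldsData instanceof CharSequence)) {
            String type = (fieldsData == null) ? "null" : fieldsData.getClass().getSimpleName();
            throw new IllegalArgumentException(
                "cannot tokenize a non-text value of type " + type);
        }
        this.fieldsData = fieldsData;
        this.tokenized = tokenized;
    }

    boolean isTokenized() {
        return tokenized;
    }

    // No implicit number-to-string conversion here.
    String stringValue() {
        return (fieldsData instanceof CharSequence) ? fieldsData.toString() : null;
    }
}

final class StrictFieldDemo {
    public static void main(String[] args) {
        try {
            new StrictToyField(Integer.valueOf(17), true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // The caller performs the conversion explicitly before creating the field.
        StrictToyField ok = new StrictToyField(Integer.toString(17), true);
        System.out.println(ok.stringValue());
    }
}
```

With a check like this, analyzing a number becomes a deliberate choice at the call site rather than a silent side effect.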