Closed mikemccand closed 5 years ago
I've searched a bit of details of the parameter naming.
The factories' "name" parameter should come from ICU4J Normalizer2 factory class method parameter.
data - the binary, big-endian normalization (.nrm file) data, or null for ICU data name - "nfc" or "nfkc" or "nfkc_cf" or name of custom data file
Strictly speaking, the ICU4J normalizer's "name" seems not to be equal to the "Unicode normalization form" (it has wider meaning than "normalization form"). Nonetheless "data" is always null when Lucene ICU factories instantiate it so it looks okay to me to change the parameter to "form" from the standpoint of understandability.
Just in case, @thetaphi: does that make sense to you?
[Legacy Jira: Tomoko Uchida (@mocobeta) on Aug 11 2019]
OK, in the ICU factory documentation, it's explicitly documented as follows:
name: A Unicode Normalization Form, one of 'nfc','nfkc', 'nfkc_cf'. Default is nfkc_cf.
So seems there is no need to worry about changing the parameter to "form" :)
Here is the patch that also includes tests and Javadoc changes: LUCENE-8948.patch
[Legacy Jira: Tomoko Uchida (@mocobeta) on Aug 11 2019]
Commit 407ba89aad028a37cf5ec0f131b8773d394177c2 in lucene-solr's branch refs/heads/master from Tomoko Uchida https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=407ba89
LUCENE-8948: Change 'name' argument in ICU factories to 'form'.
[Legacy Jira: ASF subversion and git services on Aug 11 2019]
Thanks for adding the nfkc test :-) Nice!
[Legacy Jira: Uwe Schindler (@uschindler) on Aug 11 2019]
Closing after the 9.0.0 release
[Legacy Jira: Adrien Grand (@jpountz) on Dec 08 2021]
o.a.l.a.icu.ICUNormalizer2CharFilterFactory
ando.a.l.a.icu.ICUNormalizer2FilterFactory
have "name" arguments to specify Unicode Normalization Form. The "name" is vague and it causes problem with SOLR-13593."form" would be suitable here instead of "name".
Legacy Jira details
LUCENE-8948 by Tomoko Uchida (@mocobeta) on Aug 10 2019, resolved Aug 11 2019 Attachments: LUCENE-8948.patch Linked issues: