mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration
0 stars 0 forks source link

Change "name" argument in ICU factories to "form" [LUCENE-8948] #945

Closed mikemccand closed 5 years ago

mikemccand commented 5 years ago

o.a.l.a.icu.ICUNormalizer2CharFilterFactory and o.a.l.a.icu.ICUNormalizer2FilterFactory have "name" arguments to specify Unicode Normalization Form. The "name" is vague and it causes problem with SOLR-13593.

"form" would be suitable here instead of "name".


Legacy Jira details

LUCENE-8948 by Tomoko Uchida (@mocobeta) on Aug 10 2019, resolved Aug 11 2019 Attachments: LUCENE-8948.patch Linked issues:

mikemccand commented 5 years ago

I've searched a bit of details of the parameter naming.

The factories' "name" parameter should come from ICU4J Normalizer2 factory class method parameter.

http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/Normalizer2.html#getInstance-java.io.InputStream-java.lang.String-com.ibm.icu.text.Normalizer2.Mode-

data - the binary, big-endian normalization (.nrm file) data, or null for ICU data name - "nfc" or "nfkc" or "nfkc_cf" or name of custom data file

Strictly speaking, the ICU4J normalizer's "name" seems not to be equal to the "Unicode normalization form" (it has wider meaning than "normalization form"). Nonetheless "data" is always null when Lucene ICU factories instantiate it so it looks okay to me to change the parameter to "form" from the standpoint of understandability.

Just in case, @thetaphi: does that make sense to you?

[Legacy Jira: Tomoko Uchida (@mocobeta) on Aug 11 2019]

mikemccand commented 5 years ago

OK, in the ICU factory documentation, it's explicitly documented as follows:

name: A Unicode Normalization Form, one of 'nfc','nfkc', 'nfkc_cf'. Default is nfkc_cf.

So seems there is no need to worry about changing the parameter to "form" :)

Here is the patch that also includes tests and Javadoc changes: LUCENE-8948.patch

[Legacy Jira: Tomoko Uchida (@mocobeta) on Aug 11 2019]

mikemccand commented 5 years ago

Commit 407ba89aad028a37cf5ec0f131b8773d394177c2 in lucene-solr's branch refs/heads/master from Tomoko Uchida https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=407ba89

LUCENE-8948: Change 'name' argument in ICU factories to 'form'.

[Legacy Jira: ASF subversion and git services on Aug 11 2019]

mikemccand commented 5 years ago

Thanks for adding the nfkc test :-) Nice!

[Legacy Jira: Uwe Schindler (@uschindler) on Aug 11 2019]

mikemccand commented 2 years ago

Closing after the 9.0.0 release

[Legacy Jira: Adrien Grand (@jpountz) on Dec 08 2021]