mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

Remove ICU dependency of kuromoji tools/test-tools [LUCENE-8866] #863

Closed by mikemccand 5 years ago

mikemccand commented 5 years ago

The dictionary tooling has an off-by-default option to normalize entries, which currently uses the ICU API.

But since it's off by default, and it only does NFKC normalization at dictionary-build time, I think it's a better tradeoff to use the JDK here.
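For illustration, NFKC normalization with only the JDK can be done through `java.text.Normalizer`. The sketch below is not taken from the attached patch; the class name `NfkcExample` and the helper `normalizeEntry` are hypothetical, and it only shows the kind of call that could stand in for the ICU one.

```java
import java.text.Normalizer;

public class NfkcExample {
    // Hypothetical helper: applies NFKC to a single dictionary entry
    // using only the JDK (no ICU dependency).
    static String normalizeEntry(String entry) {
        return Normalizer.normalize(entry, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Full-width Latin letters "ABC" written with Unicode escapes.
        String fullWidthLatin = "\uFF21\uFF22\uFF23";
        // Half-width katakana KA followed by the half-width voiced sound mark.
        String halfWidthKana = "\uFF76\uFF9E";

        // NFKC folds compatibility forms into their standard equivalents.
        System.out.println(normalizeEntry(fullWidthLatin));
        System.out.println(normalizeEntry(halfWidthKana));
    }
}
```

Because the option is off by default and only runs while the dictionary is built, any small differences between the Unicode versions supported by the JDK and by ICU would only affect users who opt in.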

I would rather remove the ICU dependency for the tooling and look at simplifying the build to have fewer modules (e.g. investigate moving the tooling and tests into src/java and src/tools, so that Michael Sokolov's new tests in LUCENE-8863 run by default, the dictionary tool is shipped as a command-line tool in the JAR, etc.).

"ant regenerate" should be enough to prevent any chicken-and-egg problems in the dictionary construction code, so I don't think we need separate modules to enforce it.


Legacy Jira details

LUCENE-8866 by Robert Muir (@rmuir) on Jun 18 2019, resolved Jun 21 2019 Attachments: LUCENE-8866.patch

mikemccand commented 5 years ago

Simple patch: I didn't move any code around, just removed the external dependency.

[Legacy Jira: Robert Muir (@rmuir) on Jun 18 2019]

mikemccand commented 5 years ago

+1. If people have more precise normalization requirements, they can encode them in their dictionary. I think we can presume this is not noisy user data and has already been cleaned.

[Legacy Jira: Michael Sokolov (@msokolov) on Jun 18 2019]

mikemccand commented 5 years ago

If there are no objections, I will wait until LUCENE-8863 is merged. The patch here poached some build changes from Mike S's PR for LUCENE-8863 because I needed to run test-tools.

[Legacy Jira: Robert Muir (@rmuir) on Jun 20 2019]

mikemccand commented 5 years ago

Commit 91331d1a891d76173f6854287f11821e6ab41fae in lucene-solr's branch refs/heads/master from Robert Muir https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=91331d1

LUCENE-8866: remove kuromoji/tools dependency on ICU

[Legacy Jira: ASF subversion and git services on Jun 21 2019]

mikemccand commented 5 years ago

Commit 2adc8c6c13d1a74c3a371c2341a05507e893dabf in lucene-solr's branch refs/heads/branch_8x from Robert Muir https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2adc8c6

LUCENE-8866: remove kuromoji/tools dependency on ICU

[Legacy Jira: ASF subversion and git services on Jun 21 2019]

mikemccand commented 5 years ago

Closing after the 8.2.0 release

[Legacy Jira: Ignacio Vera (@iverase) on Jul 26 2019]