mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

Add SpanishMinimalStemFilter [LUCENE-8936] #933

Closed mikemccand closed 5 years ago

mikemccand commented 5 years ago

SpanishMinimalStemFilter is a less aggressive stemmer than SpanishLightStemFilter.

Ex:

input tokens -> output tokens

  1. camiseta niños -> camiseta and nino
  2. camisas -> camisa

camisetas and camisas mean t-shirts and shirts, respectively. Stemming both tokens to camis makes them match each other, so a query for camisas (shirts) returns both t-shirts and shirts. SpanishMinimalStemFilter helps handle these cases.

Importantly, it also preserves the gender of tokens.

Ex: niños, niñas, chicos and chicas are stemmed to nino, nina, chico and chica.
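
The behaviour described above can be sketched in a few lines of Java. This is only an illustration, not the code in LUCENE-8936.patch: the class name MinimalSpanishStemSketch, the methods stem and stemAll, the minimum token length of 4, and the two rules (fold "ñ" to "n", drop a final "s" that follows "a" or "o") are all assumptions inferred from the examples in this issue. It works on plain strings rather than Lucene token streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a minimal Spanish plural stemmer; not the patch itself. */
final class MinimalSpanishStemSketch {
    private MinimalSpanishStemSketch() {}

    /**
     * Lowercases the token, folds 'ñ' to 'n', and removes a trailing 's'
     * when it follows 'a' or 'o'. The gender-bearing vowel is kept.
     */
    static String stem(String token) {
        // '\u00F1' is 'ñ'; the escape avoids source-encoding problems.
        String folded = token.toLowerCase(Locale.ROOT).replace('\u00F1', 'n');
        int len = folded.length();
        // Assumed guard: leave very short tokens such as "las" untouched.
        if (len >= 4 && folded.charAt(len - 1) == 's') {
            char prev = folded.charAt(len - 2);
            if (prev == 'a' || prev == 'o') {
                return folded.substring(0, len - 1);
            }
        }
        return folded;
    }

    /** Splits on whitespace and stems each token. */
    static List<String> stemAll(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(stem(token));
            }
        }
        return out;
    }
}
```

Under these assumed rules, camisetas and camisas keep distinct stems (camiseta and camisa), and niños and niñas stay distinguishable as nino and nina.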


Legacy Jira details

LUCENE-8936 by vinod kumar on Jul 26 2019, resolved Jul 28 2019 Attachments: LUCENE-8936.patch (versions: 2)

mikemccand commented 5 years ago

@atris, can you please help me with this? I have finished the development, but I am denied access when I try to raise a pull request.

[Legacy Jira: vinod kumar on Jul 27 2019]

mikemccand commented 5 years ago

Hello Vinod!

Welcome to the community. Thank you for your contribution.

I would suggest one of two approaches: 1) attach a patch to this Jira issue, or 2) open a pull request on the Lucene-Solr GitHub repository. Somebody will review your contribution soon and provide feedback.

[Legacy Jira: Atri Sharma (@atris) on Jul 27 2019]

mikemccand commented 5 years ago

Thank you, @atris. I have attached the patch file LUCENE-8936.patch.

[Legacy Jira: vinod kumar on Jul 27 2019]

mikemccand commented 5 years ago

@atris, GitHub reports "Permission to apache/lucene-solr.git denied" for me. Can you also suggest how I can get permission? Thanks for your time.

[Legacy Jira: vinod kumar on Jul 27 2019]

mikemccand commented 5 years ago

Hi @vinod1812,

the patch looks fine. I cannot review the SpanishMinimalStemmer class itself (I don't understand Spanish), but the other parts look okay to me. It also passed ant precommit (thanks!).

I will commit it after waiting 24 hours if there are no other comments.

About the GitHub PR: only Lucene/Solr committers have write permission to the apache/lucene-solr repo, so you have to fork the repo and open a pull request from your fork. This time a patch has already been provided, so you do not need to do so.

[Legacy Jira: Tomoko Uchida (@mocobeta) on Jul 27 2019]

mikemccand commented 5 years ago

@vinod1812: I noticed your name is credited in the SpanishMinimalStemmer Javadocs. Lucene/Solr source code does not carry @author tags or the names of people who donated code. Credits appear only in the commit log and CHANGES. Can you please remove it?

[Legacy Jira: Tomoko Uchida (@mocobeta) on Jul 27 2019]

mikemccand commented 5 years ago

@tomoko Thank you. I have removed it and uploaded the latest patch. Thanks for your time.

[Legacy Jira: vinod kumar on Jul 27 2019]

mikemccand commented 5 years ago

+1 to the patch.

Let us wait a day or so, then commit the changes to the master and 8.x branches.

[Legacy Jira: Tomoko Uchida (@mocobeta) on Jul 27 2019]

mikemccand commented 5 years ago

Okay.

[Legacy Jira: vinod kumar on Jul 27 2019]

mikemccand commented 5 years ago

+1 overall

| Vote | Subsystem | Runtime | Comment |
|------|-----------|---------|---------|
| | Prechecks | | |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| | master Compile Tests | | |
| +1 | compile | 1m 25s | master passed |
| | Patch Compile Tests | | |
| +1 | compile | 1m 16s | the patch passed |
| +1 | javac | 1m 16s | the patch passed |
| +1 | Release audit (RAT) | 1m 16s | the patch passed |
| +1 | Check forbidden APIs | 1m 16s | the patch passed |
| +1 | Validate source patterns | 1m 16s | the patch passed |
| | Other Tests | | |
| +1 | unit | 7m 48s | common in the patch passed. |
| | | 13m 4s | |

| Subsystem | Report/Notes |
|-----------|--------------|
| JIRA Issue | LUCENE-8936 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12976054/LUCENE-8936.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 4050ddc |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/199/testReport/ |
| modules | C: lucene lucene/analysis/common U: lucene |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/199/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.

[Legacy Jira: Lucene/Solr QA on Jul 27 2019]

mikemccand commented 5 years ago

Hi @vinod1812,

Would you tell me your e-mail address so that I can credit you, with name and e-mail, as the author of the commit? (I cannot find it on the mailing list or in Jira.)

[Legacy Jira: Tomoko Uchida (@mocobeta) on Jul 28 2019]

mikemccand commented 5 years ago

Hi @mocobeta

vinod.nandikolmath@yahoo.com

[Legacy Jira: vinod kumar on Jul 28 2019]

mikemccand commented 5 years ago

Commit 8c8d8abddc9f5f8c92943e50d6169882e7188c44 in lucene-solr's branch refs/heads/master from vinod kumar https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8c8d8ab

LUCENE-8936: Add SpanishMinimalStemFilter

Signed-off-by: Tomoko Uchida <tomoko@apache.org>

[Legacy Jira: ASF subversion and git services on Jul 28 2019]

mikemccand commented 5 years ago

Commit a229e711cabb6027eecd06e2c9ec92002d2b6949 in lucene-solr's branch refs/heads/branch_8x from vinod kumar https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a229e71

LUCENE-8936: Add SpanishMinimalStemFilter

Signed-off-by: Tomoko Uchida <tomoko@apache.org>

[Legacy Jira: ASF subversion and git services on Jul 28 2019]

mikemccand commented 5 years ago

I moved the change log entry to the 8.3.0 section, since this will ship with the upcoming 8.3.0 release.

Thanks @vinod1812!

[Legacy Jira: Tomoko Uchida (@mocobeta) on Jul 28 2019]

mikemccand commented 5 years ago

Okay, thank you @tomoko.

[Legacy Jira: vinod kumar on Jul 28 2019]

mikemccand commented 2 years ago

Closing after the 9.0.0 release

[Legacy Jira: Adrien Grand (@jpountz) on Dec 08 2021]