nicholasding / solr-lemmatizer

A TokenFilter that lemmatizes English words.
MIT License

StringIndexOutOfBoundsException #2

Closed shamik closed 7 years ago

shamik commented 7 years ago

I'm seeing a series of exceptions when trying to use the lemmatizer plugin. The error is coming from TernarySearchTree.java. It looks like a plural suffix in parentheses breaks the code. For example:

Display Force When on, force exerted on the delegate(s) by the Path Follow behavior is drawn in the viewports as a vector during the simulation solution.

delegate(s) here is causing the exception. If I replace delegate(s) with delegate(helpers), everything works.

Below is the stack trace:

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.lang.String.charAt(String.java:658)
    at com.nicholasding.search.util.TernarySearchTree.get(TernarySearchTree.java:50)
    at com.nicholasding.search.util.TernarySearchTree.get(TernarySearchTree.java:42)
    at com.nicholasding.search.util.TernarySearchTree.contains(TernarySearchTree.java:66)
    at com.nicholasding.search.lemmatization.impl.WordNetLemmatizer.stem(WordNetLemmatizer.java:51)
    at com.nicholasding.search.solr.LemmatizerFilter.incrementToken(LemmatizerFilter.java:29)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:746)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:447)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:403)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1571)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:924)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:913)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)

A couple of other patterns trigger the same exception:

"If the method succeeds, the return value is S" --- Here the letter "S" is causing the issue.

"For example, if the object’s option is set to Active, only those constraints applied on the active objects will be changed." --- Here the word "object’s" is causing the problem.

I've tried this on Solr 6.5.1 and 6.6, and both consistently throw this exception. Does it have anything to do with the version of Solr?
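
The message "String index out of range: 0" from String.charAt means the string was empty, so an empty key reached TernarySearchTree.get. One plausible reading, which this thread does not confirm, is that each failing input yields a standalone "s" token ("(s)", "S", "’s") and that suffix stripping reduces it to "". The sketch below is not the plugin's code: TernarySearchTreeSketch and its methods are hypothetical names, used only to show how an empty-key guard in contains avoids the charAt(0) failure.

```java
// Hypothetical sketch, NOT the plugin's actual TernarySearchTree.
// It illustrates why an empty key breaks charAt(0) and how a guard avoids it.
class TernarySearchTreeSketch {
    private static final class Node {
        final char c;
        Node left, mid, right;
        boolean isEnd; // true if a stored key ends at this node
        Node(char c) { this.c = c; }
    }

    private Node root;

    public void put(String key) {
        if (key == null || key.isEmpty()) return; // nothing to store
        root = put(root, key, 0);
    }

    private Node put(Node node, String key, int pos) {
        char c = key.charAt(pos);
        if (node == null) node = new Node(c);
        if (c < node.c) node.left = put(node.left, key, pos);
        else if (c > node.c) node.right = put(node.right, key, pos);
        else if (pos < key.length() - 1) node.mid = put(node.mid, key, pos + 1);
        else node.isEnd = true;
        return node;
    }

    public boolean contains(String key) {
        // Guard: without this check, key.charAt(0) in get() throws
        // StringIndexOutOfBoundsException for "" once the tree is non-empty.
        if (key == null || key.isEmpty()) return false;
        Node node = get(root, key, 0);
        return node != null && node.isEnd;
    }

    private Node get(Node node, String key, int pos) {
        if (node == null) return null;
        char c = key.charAt(pos);
        if (c < node.c) return get(node.left, key, pos);
        if (c > node.c) return get(node.right, key, pos);
        if (pos < key.length() - 1) return get(node.mid, key, pos + 1);
        return node;
    }

    public static void main(String[] args) {
        TernarySearchTreeSketch tree = new TernarySearchTreeSketch();
        tree.put("delegate");
        System.out.println(tree.contains("delegate"));
        System.out.println(tree.contains("")); // no exception thanks to the guard
    }
}
```

An equivalent guard could sit one level up instead, for example by skipping empty candidates before the dictionary lookup.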

nicholasding commented 7 years ago

Hi @shamik, thanks for the detailed information. I will test it on Solr 6 as soon as possible.

nicholasding commented 7 years ago

Hi @shamik, I've found the cause and fixed the issue. You can now check out the code from the master branch and build a working version.

bandops commented 7 years ago

Thanks a lot nicholas, works great. Appreciate your quick help.