snowballstem / snowball

Snowball compiler and stemming algorithms
https://snowballstem.org/
BSD 3-Clause "New" or "Revised" License

Java : generated stemmer code variable cannot instantiate abstract grandparent class `SnowballProgram`. #72

Closed by GerritDeMeulder 6 years ago

GerritDeMeulder commented 6 years ago

Java: the generated stemmer code has a problem when it uses a stemmer variable that references the abstract grandparent class SnowballProgram.

I think I have the solution (see below), but would like to have some feedback.

For example, this happens when compiling the Schinke Latin stemmer (latin.sbl.txt); see http://snowball.tartarus.org/otherapps/schinke/intro.html

The generated concrete Java class LatinStemmer extends SnowballStemmer, which in turn extends the abstract SnowballProgram. In this class, the $noun_form and $verb_form variables are copies of the LatinStemmer instance.

Generated code example for LatinStemmer, from lines 48 and 57:

 48: SnowballProgram v_2 = new SnowballProgram(this);
      ... 
 57: SnowballProgram v_4 = new SnowballProgram(this);     

The issue is that SnowballProgram is an abstract class, and thus cannot be instantiated.

My proposed solution: change the SnowballStemmer copy constructor to call the SnowballProgram superclass constructor:

package org.tartarus.snowball;
import java.lang.reflect.InvocationTargetException;

public abstract class SnowballStemmer extends SnowballProgram {

  public abstract boolean stem();

  public SnowballStemmer(SnowballStemmer other) {
    super(other);
  }

  static final long serialVersionUID = 2016072500L;
}

Then also generate the copy constructor in the generated class:

public LatinStemmer(SnowballStemmer other) {
    super(other);
}

Then the generated code can be changed as follows for "LatinStemmer":

   LatinStemmer v_2 = new LatinStemmer(this);
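The constructor chaining proposed above can be sketched in a self-contained form. This is only an illustration: ProgramSketch, StemmerSketch and LatinSketch are hypothetical stand-ins, not the real Snowball classes, and the fields and the stem() body are assumptions made up to show that the copy is independent of the original.

```java
// Hypothetical stand-ins for SnowballProgram / SnowballStemmer / LatinStemmer.
// Only the copy-constructor chaining mirrors the proposal in this thread.
abstract class ProgramSketch {
    protected StringBuilder current;
    protected int cursor;

    protected ProgramSketch() {
        current = new StringBuilder();
    }

    // Copy constructor: duplicates the state of another program.
    protected ProgramSketch(ProgramSketch other) {
        current = new StringBuilder(other.current);
        cursor = other.cursor;
    }
}

abstract class StemmerSketch extends ProgramSketch {
    protected StemmerSketch() {
        super();
    }

    // Forwards to the grandparent's copy constructor.
    protected StemmerSketch(StemmerSketch other) {
        super(other);
    }

    public abstract boolean stem();
}

class LatinSketch extends StemmerSketch {
    LatinSketch(String word) {
        current.append(word);
    }

    // Copy constructor on the concrete class, as proposed for LatinStemmer.
    LatinSketch(StemmerSketch other) {
        super(other);
    }

    @Override
    public boolean stem() {
        // "new ProgramSketch(this)" would not compile: the class is abstract.
        LatinSketch copy = new LatinSketch(this);
        copy.current.append("!"); // mutate only the copy
        return !copy.current.toString().equals(current.toString());
    }

    String getCurrent() {
        return current.toString();
    }
}

class CopyConstructorDemo {
    public static void main(String[] args) {
        LatinSketch s = new LatinSketch("aquila");
        System.out.println(s.stem());       // the copy diverged from the original
        System.out.println(s.getCurrent()); // the original text is untouched
    }
}
```

The point of the sketch is that the concrete class is the only one that can appear after `new`; the abstract parents merely pass the state along.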

I can't find a better solution on Stack Overflow, but maybe I'm missing something here? Note that LatinStemmer is one of the few Snowball algorithms that use a stemmer variable. Is there a way to eliminate such variables?

NB: the import java.lang.reflect.InvocationTargetException; seems to be no longer used, since there is no try/catch for InvocationTargetException in SnowballStemmer?

GerritDeMeulder commented 6 years ago

My mistake: while optimizing the classes for Android, I had somehow changed SnowballProgram to an abstract class as well. Correcting this solves the problem.
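A quick way to catch this kind of accidental change is to check the class modifiers at runtime. The following is a minimal sketch using java.lang.reflect.Modifier; the two classes and the isInstantiable helper are hypothetical names introduced only for the illustration.

```java
import java.lang.reflect.Modifier;

// Hypothetical classes, used only to exercise the check.
abstract class AbstractProgramSketch {}

class ConcreteProgramSketch {}

class AbstractCheckDemo {
    // False for abstract classes and interfaces, which "new" cannot instantiate.
    static boolean isInstantiable(Class<?> cls) {
        int mods = cls.getModifiers();
        return !Modifier.isAbstract(mods) && !Modifier.isInterface(mods);
    }

    public static void main(String[] args) {
        System.out.println(isInstantiable(AbstractProgramSketch.class));
        System.out.println(isInstantiable(ConcreteProgramSketch.class));
    }
}
```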

ojwb commented 6 years ago

> Note that LatinStemmer is one of the few snowball algorithms, that uses a stemmer variable. Is there a way to eliminate such variables?

I think the notable thing there is actually the use of $ on a string.

> NB: import java.lang.reflect.InvocationTargetException; this import seems not (longer) used, since there is no try/catch method for InvocationTargetException in SnowballStemmer ?

It looks like it's never been used. Thanks for noticing - I'll remove it.

> While optimizing the classes for Android [...]

If you have improvements, please contribute them.

GerritDeMeulder commented 6 years ago

> It looks like it's never been used.

I checked again and saw InvocationTargetException is in the parent class.

> While optimizing the classes for Android [...] If you have improvements, please contribute them.

Maybe it's "lost in translation" again; what I meant probably sounds like more than it really is. It's mostly (auto)correcting the indentation of the generated Java code to the Google style and adding annotations for Lint checks. Here is an example of these Lint check annotations for the generated Dutch stemmer class, though they are valid for all stemmers:

@SuppressWarnings("unused") public class DutchStemmer extends org.tartarus.snowball.SnowballStemmer ...

So: I added @SuppressWarnings("unused"), since the class is only referenced in the example app via reflection.
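To illustrate why static analysis can flag such a class as unused, here is a minimal sketch of loading a stemmer by name through reflection. StemmerBase, DutchSketch and the load helper are hypothetical and do not come from the Snowball sources; the demo passes DutchSketch.class.getName() only to stay self-contained, whereas an app would typically build the name from a string.

```java
import java.util.Locale;

// Hypothetical base class standing in for a stemmer supertype.
abstract class StemmerBase {
    public abstract String stem(String word);
}

// Never named directly by calling code in a real app, hence the annotation.
@SuppressWarnings("unused")
class DutchSketch extends StemmerBase {
    @Override
    public String stem(String word) {
        // Placeholder behaviour; a real stemmer does far more than this.
        return word.toLowerCase(Locale.ROOT);
    }
}

class ReflectionDemo {
    // Instantiates a stemmer from its class name via the no-argument constructor.
    static StemmerBase load(String className) {
        try {
            Class<? extends StemmerBase> cls =
                Class.forName(className).asSubclass(StemmerBase.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers ClassNotFoundException, InvocationTargetException, etc.
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        StemmerBase stemmer = load(DutchSketch.class.getName());
        System.out.println(stemmer.stem("Fietsen"));
    }
}
```

Because the only reference to the class is a string, a checker that follows direct references cannot see the usage, which is what the annotation silences.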

Last two generated methods in the class :

   ...
  @Override
  public boolean equals(Object o) {
    return o instanceof DutchStemmer;
  }

  @Override
  public int hashCode() {
    return DutchStemmer.class.getName().hashCode();
  }

So: I added @Override, since these methods override the standard java.lang.Object methods.
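The effect of these two generated methods can be shown with a small sketch: all instances of one class compare equal and share a hash code, so a HashSet keeps only one per class. DutchLike and OtherLike are hypothetical classes that copy the pattern above; they are not generated Snowball code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical classes that copy the equals/hashCode pattern shown above.
class DutchLike {
    @Override
    public boolean equals(Object o) {
        return o instanceof DutchLike;
    }

    @Override
    public int hashCode() {
        return DutchLike.class.getName().hashCode();
    }
}

class OtherLike {
    @Override
    public boolean equals(Object o) {
        return o instanceof OtherLike;
    }

    @Override
    public int hashCode() {
        return OtherLike.class.getName().hashCode();
    }
}

class EqualsDemo {
    // Adds two instances of one class and one of another to a set.
    static int distinctCount() {
        Set<Object> set = new HashSet<>();
        set.add(new DutchLike());
        set.add(new DutchLike()); // equal to the first, so not added again
        set.add(new OtherLike());
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount());
    }
}
```

With @Override in place, the compiler rejects a method that does not actually override anything, for example a mistyped equals(DutchLike o) overload.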

ojwb commented 6 years ago

I've pushed both your proposed changes in 2df7a37763d87846425f80ca750b87e149a7a67e.

> [...] (auto)correcting the indentation of generated java code to the google style [...]

I'm not sure I want to get involved in Java code-style wars, but if there are commonly accepted indentation conventions for Java code I'm happy to take a patch to make the generated code follow them.