snowballstem / snowball

Snowball compiler and stemming algorithms
https://snowballstem.org/
BSD 3-Clause "New" or "Revised" License
757 stars 173 forks source link

Missing return statement in generated Java and Rust code #82

Closed dscorbett closed 6 years ago

dscorbett commented 6 years ago

The following Snowball code:

externals (stem)
define stem as (false or true)

produces invalid Java and Rust code: the return statements of stem are missing.

The Java error is:

org/tartarus/snowball/ext/testStemmer.java:27: error: missing return statement
}
^
1 error

The Rust error is:

error[E0308]: mismatched types
  --> /Users/dcorbett/snowball/rust/target/debug/build/testapp-ca6d6d95758b7568/out/test_stemmer.rs:25:9
   |
25 |         break 'lab0;
   |         ^^^^^^^^^^^ expected (), found bool
   |
   = note: expected type `()`
              found type `bool`
ojwb commented 6 years ago

It's almost certainly a bug in the reachability tracking. In Java simple cases of unreachable code are a compile-time error, which is a PITA for code generators like Snowball and means we need to track reachability and omit code that can't be reached.

Not sure if the Rust language has the same annoying feature, or if the Rust backend just copied the Java one.

Having a look - it's probably something simple. Thanks for the minimal testcase (was there a more complex one you reduced this from that I should also check gets fixed?)

dscorbett commented 6 years ago

I noticed this while working on the French stemmer but I didn’t save the original full case.

Rust does check for unreachable code. By default, it just prints a warning.

ojwb commented 6 years ago

Rust does check for unreachable code. By default, it just prints a warning.

OK, that's probably true of a lot of languages and so it's more widely useful to omit unreachable code, and for interpreted languages it may speed up runtime too, but Java is unusual in it actually being defined to be a compile time error in the language spec, and so actually a hard requirement.

I have a fix for the missing return (which seems to affect all the languages except C and C#, which are the two which don't try to track reachability), but the same testcase currently causes the Python and Javascript backends to segfault, which I ought to address first so I can test the fix for the missing return for those two languages.

ojwb commented 6 years ago

The segfault was just lack of a default for the parent class name for those two languages, which I've also pushed a fix for.