snowballstem / snowball

Snowball compiler and stemming algorithms
https://snowballstem.org/
BSD 3-Clause "New" or "Revised" License

$ on string doesn't work properly with most backends #62

Closed by ojwb 6 years ago

ojwb commented 7 years ago

This came to light while working on getting the Go backend merged in #57, but also breaks the Latin stemmer (#58).

I believe it affects all languages except C.

Creating a ticket for this so it doesn't get forgotten.

ojwb commented 7 years ago

It was actually the Rust backend in #51:

https://github.com/snowballstem/snowball/pull/51#issuecomment-281535497

ojwb commented 7 years ago

Some work on fixing this here:

https://github.com/snowballstem/snowball/tree/fix-generate-dollar

So far there are potential fixes for Java and Python, though they've not seen much testing.

ojwb commented 6 years ago

The (newly merged) csharp backend entirely lacks the machinery to support $ on a string. I merged it despite this, since $ on a string doesn't actually work in most existing backends either, and for csharp you'll currently get a compilation failure rather than incorrect behaviour.

ojwb commented 6 years ago

C# sorted, plus issues with other backends. Will clean up and merge shortly.

ojwb commented 6 years ago

Now fixed in git master by commits leading up to 7291da8f69304e3dbd546db01a6006b833a9701b.

I tested by adding the Latin stemming algorithm (which uses $ on string several times) and fixed multiple issues with various language backends which this uncovered.
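
For anyone landing here without context: as far as I understand the Snowball manual, $ applied to a string variable runs a command with that variable temporarily acting as the current string, and then restores the original string. Below is a rough Python sketch of that save/swap/restore idea. The `Env` class, `dollar_on_string` and `strip_us` are hypothetical names made up for illustration; this is not the code any backend actually generates, and the exact cursor/limit handling is an assumption.

```python
from typing import Callable, Tuple


class Env:
    """Hypothetical stemmer state: current string, cursor and limit."""

    def __init__(self, current: str) -> None:
        self.current = current
        self.cursor = 0
        self.limit = len(current)


def dollar_on_string(
    env: Env, value: str, command: Callable[[Env], bool]
) -> Tuple[bool, str]:
    """Run `command` with `value` as the current string, then restore `env`.

    Returns the command's signal and the (possibly modified) string, which
    the caller would store back into the string variable.
    """
    saved = (env.current, env.cursor, env.limit)
    # Assumption: the command starts with the cursor at 0 and the limit at
    # the end of the substituted string.
    env.current, env.cursor, env.limit = value, 0, len(value)
    try:
        signal = command(env)
        return signal, env.current
    finally:
        # Restore the original string and positions whatever happened.
        env.current, env.cursor, env.limit = saved


def strip_us(env: Env) -> bool:
    """Hypothetical command: delete a trailing 'us' from the current string."""
    if env.current.endswith("us"):
        env.current = env.current[:-2]
        env.limit = len(env.current)
        return True
    return False
```

The point of the sketch is that a backend has to save and restore the whole string state around the command, not just the cursor. Going by the comments above, that machinery is what was missing or incomplete in several backends.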