snowballstem / snowball

Snowball compiler and stemming algorithms
https://snowballstem.org/
BSD 3-Clause "New" or "Revised" License

Esperanto stemmer #185

Open ojwb opened 1 year ago

ojwb commented 1 year ago

I noticed that Pygments' Snowball support includes a test file which seems to be a Snowball stemmer for Esperanto:

https://github.com/pygments/pygments/blob/87bb672d4788186bc1d52faa8f5a7ccfb21c96ef/tests/examplefiles/snowball/example.sbl

I've not really investigated it, but it seems plausible from an initial look.

@dscorbett It looks like you contributed the Pygments support for Snowball (thanks for that), and your name is on the commit that added this test file. Is this Esperanto stemmer yours?

If so, is it something we should be looking at merging?

One issue is that the code looks like it has been deliberately tweaked to serve better as a test of the highlighting, e.g. define short_word as not (loop (maxint * 0 + 4 / 2) gopast vowel) and the use of 3 different stringescapes settings. Is there a "clean" version? I didn't turn up anything by searching the web.
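For reference, the obfuscated loop count reduces to a constant: maxint * 0 is 0 and 4 / 2 is 2 under integer division, so the test is equivalent to not (loop 2 gopast vowel). Assuming it is called with the cursor at the start of the word, it succeeds exactly when the word contains fewer than two vowels. The Python sketch below models that reading. The vowel set a, e, i, o, u and the helper names (gopast_vowel, short_word, LOOP_COUNT) are assumptions for illustration, not taken from the stemmer itself.

```python
# Hypothetical model of the Snowball test
#   define short_word as not (loop (maxint * 0 + 4 / 2) gopast vowel)
# Assumption: the `vowel` grouping is the five Esperanto vowels a, e, i, o, u.
import sys
from typing import Optional

VOWELS = frozenset("aeiou")  # assumption: the real grouping may differ

# Snowball integer arithmetic: maxint * 0 discards maxint, 4 / 2 is integer division.
MAXINT = sys.maxsize  # stand-in for Snowball's maxint
LOOP_COUNT = MAXINT * 0 + 4 // 2  # evaluates to 2


def gopast_vowel(word: str, cursor: int) -> Optional[int]:
    """Return the position just after the next vowel at or after `cursor`, or None."""
    for i in range(cursor, len(word)):
        if word[i] in VOWELS:
            return i + 1
    return None


def short_word(word: str) -> bool:
    """True when `loop LOOP_COUNT gopast vowel` fails, i.e. fewer than 2 vowels."""
    cursor: Optional[int] = 0
    for _ in range(LOOP_COUNT):
        cursor = gopast_vowel(word, cursor)
        if cursor is None:
            return True  # the loop failed, so `not (...)` succeeds
    return False  # the loop succeeded, so `not (...)` fails


for w in ["la", "kaj", "domo", "esperanto"]:
    print(w, short_word(w))
```

With these assumptions, "la" and "kaj" count as short words (one vowel each), while "domo" and "esperanto" do not.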

dscorbett commented 1 year ago

Yes, I wrote that. I don’t know how much use an Esperanto stemmer would get, but it does work and it handles a lot of non-obvious cases. I designed it from the start to be a Pygments example file, so there is no clean version. Should I clean it up and make PRs for this repo, snowball-data, and snowball-website?