mysqludf / lib_mysqludf_stem

MySQL UDF library providing stemming capability for a variety of languages
http://www.mysqludf.org/lib_mysqludf_stem/
GNU Lesser General Public License v2.1

twice single quotes considered as single quotes #1

Closed niravshah2705 closed 10 years ago

niravshah2705 commented 10 years ago

The stem_word function should remove all single quotes, since it makes no sense to keep them in a stemmed word.

Here is an example where it keeps one single quote when the input string ends in two single quotes:

select stem_word('en','hello'''''),'hello''''';
+-----------------------------+---------+
| stem_word('en','hello''''') | hello'' |
+-----------------------------+---------+
| hello'                      | hello'' |
+-----------------------------+---------+
1 row in set (0.02 sec)

jasny commented 10 years ago

The stemmers are not defined by the UDF; it uses the Snowball library instead. For English, the Porter stemmer is used. Have a look at the instructions on how to modify and rebuild lib_snowball.
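
The result in the report looks consistent with the apostrophe step in the published description of Snowball's English algorithm, which, as far as that description goes, removes only the longest trailing match among `'`, `'s` and `'s'`, and does so once. The sketch below is a rough illustration of that single step, not the library's code; the helper names `strip_apostrophe_suffix` and `strip_all_apostrophes` are hypothetical. The second helper shows one possible workaround, assuming the caller is free to clean the word before it reaches the stemmer.

```python
# Illustrative only: mimics one apostrophe-suffix step as described for
# Snowball's English stemmer. It is NOT the Snowball implementation.
APOSTROPHE_SUFFIXES = ("'s'", "'s", "'")  # ordered longest first


def strip_apostrophe_suffix(word: str) -> str:
    """Remove at most one trailing apostrophe suffix (hypothetical helper)."""
    for suffix in APOSTROPHE_SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def strip_all_apostrophes(word: str) -> str:
    """Drop every apostrophe before stemming (hypothetical workaround)."""
    return word.replace("'", "")


# The SQL literal 'hello''''' denotes the string hello'' (two quotes).
print(strip_apostrophe_suffix("hello''"))  # hello'
print(strip_all_apostrophes("hello''"))    # hello
```

The same pre-cleaning could presumably be done on the SQL side by removing quotes from the argument before it is passed to stem_word, which avoids rebuilding the library.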