savrus / uguu

Automatically exported from code.google.com/p/uguu

Do not break short hwords #48

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Right now we keep only [a-zA-Z0-9_] characters in a string passed to
to_tsvector (anything matching [^a-zA-Z0-9_] is filtered out, and _ is
further removed by to_tsvector). Maybe we should keep the hyphen in short
sequences so that names like 'k-on' are not broken apart.
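
A rough sketch of what such a filter could look like, in Python. This is not the current uguu code: the function name sanitize, the MAX_HWORD_LEN threshold of 8 and the choice to replace filtered characters with spaces are all assumptions made for illustration. It only prepares the string; how to_tsvector then tokenizes the surviving hyphen is a separate question.

```python
import re

# Hypothetical threshold: hyphenated sequences up to this many characters
# (hyphens included) keep their hyphens. The value 8 is an assumption.
MAX_HWORD_LEN = 8

# Runs of ASCII letters/digits joined by single hyphens, e.g. "k-on".
_HWORD = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+")
# Everything the filter described above would discard.
_NON_WORD = re.compile(r"[^a-zA-Z0-9_]+")


def sanitize(text, max_len=MAX_HWORD_LEN):
    """Replace non-word characters with spaces, except hyphens that sit
    inside a short hyphenated sequence."""
    out = []
    pos = 0
    for m in _HWORD.finditer(text):
        # Filter the text between the previous match and this one.
        out.append(_NON_WORD.sub(" ", text[pos:m.start()]))
        token = m.group(0)
        if len(token) <= max_len:
            out.append(token)                    # short: keep as is
        else:
            out.append(token.replace("-", " "))  # long: split on hyphens
        pos = m.end()
    out.append(_NON_WORD.sub(" ", text[pos:]))
    # Collapse the runs of spaces produced by the substitutions.
    return " ".join("".join(out).split())


print(sanitize("k-on [720p].mkv"))
print(sanitize("state-of-the-art-report.pdf"))
```

With this threshold 'k-on' survives intact, while a long hyphenated name is still split into separate words.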

Original issue reported on code.google.com by ruslan.savchenko on 22 Apr 2010 at 7:44

GoogleCodeExporter commented 9 years ago

Original comment by ruslan.savchenko on 22 Apr 2010 at 7:44

GoogleCodeExporter commented 9 years ago
It depends on how tsvector divides the string into words.
If it omits hyphens, then there is nothing for us to do.

Original comment by radist...@gmail.com on 22 Apr 2010 at 7:47

GoogleCodeExporter commented 9 years ago
http://www.postgresql.org/docs/8.4/static/textsearch-parsers.html

Original comment by ruslan.savchenko on 22 Apr 2010 at 7:55