joshaven / string_score

JavaScript string ranking: 0 for no match, up to 1 for a perfect match... "String".score("str"); //=> 0.825
MIT License

Hello World and jello #5

Closed pschwede closed 13 years ago

pschwede commented 13 years ago

"Hello World" and "jello" should score higher than 0 with a fuzziness of 0.5, says your test.

frewsxcv commented 13 years ago

To add onto that:

I guess the way the algorithm works, it just stops caring after a missed character.

leeoniya commented 13 years ago

Other than speed, what is the benefit of using string_score over proven, existing similarity algorithms such as the Jaro-Winkler distance, which I've been using for a long time with great success?

http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
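
For readers unfamiliar with the metric, here is a minimal sketch of Jaro-Winkler in JavaScript. It is not code from string_score or from anyone in this thread. The function names `jaro` and `jaroWinkler`, the prefix weight `p = 0.1`, and the prefix cap of 4 characters are assumptions taken from the conventional textbook definition.

```javascript
// Jaro similarity (dj): counts characters that match within a sliding
// range and penalizes matched characters that appear in a different order.
function jaro(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;

  // Characters only count as matching if they are at most `range` apart.
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);

  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(b.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both strings' matched characters in order and count mismatches;
  // half of that count is the number of transpositions.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2;

  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Jaro-Winkler (dw): the Jaro score plus a bonus for a shared prefix
// of up to 4 characters, weighted by p (assumed default 0.1).
function jaroWinkler(a, b, p = 0.1) {
  const dj = jaro(a, b);
  const limit = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < limit && a[prefix] === b[prefix]) prefix++;
  return dj + prefix * p * (1 - dj);
}
```

With the usual example pair, `jaro("MARTHA", "MARHTA")` comes out at about 0.944 and `jaroWinkler("MARTHA", "MARHTA")` at about 0.961. The only difference between the two is the shared-prefix bonus.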

joshaven commented 13 years ago

I am not familiar with Jaro-Winkler so I cannot answer.

Thanks for the link; I'll study up and try to improve my string_score.

The fuzziness is something I just added, so the 'stop caring' behavior is not something I've really worked on much. I hope to get some time soon to go back over the project and fix up minor issues like this.

Wikipedia says that Jaro-Winkler is meant for short strings like names. My string score works fine with longer strings (500 characters and more), so this may be one benefit. The issue with string length is actually why I wrote string score.

joshaven commented 13 years ago

issue resolved

joshaven commented 13 years ago

Regarding Jaro-Winkler distance: I added a Jaro-Winkler comparison. I think looking at this method will help me improve my method a bit. However, Jaro-Winkler does less and is slower (in JavaScript), which may be due only to the way I have implemented it. I may be able to squeeze a few more milliseconds out. The speed difference is very minor compared to the other options I have looked at. The Jaro (dj) method is great, but the Winkler (dw) adjustment is only a beginning-of-string bonus, which is not really enough in my estimation. I give bonuses for beginning of string, beginning of word, consecutive characters, and proper case.
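
To make the four bonuses concrete, here is a rough sketch of how such a scorer could be put together. It is not the string_score implementation. The name `toyScore`, the `BONUS` weights, and the base credit of 0.5 are all hypothetical and chosen only for illustration. It has no fuzziness, so a single missing character yields 0, which is the 'stop caring' behavior discussed earlier in the thread.

```javascript
// Hypothetical weights, picked only to illustrate the idea.
const BONUS = {
  stringStart: 0.2,  // match at the very beginning of the string
  wordStart: 0.15,   // match right after a space
  consecutive: 0.1,  // match directly follows the previous match
  exactCase: 0.05    // upper/lower case matches exactly
};

function toyScore(text, query) {
  if (query.length === 0) return 0;

  const lowerText = text.toLowerCase();
  const lowerQuery = query.toLowerCase();

  let total = 0;
  let from = 0;       // search position in text
  let previous = -2;  // index of the previous match (none yet)

  for (let q = 0; q < query.length; q++) {
    const at = lowerText.indexOf(lowerQuery[q], from);
    if (at === -1) return 0; // no fuzziness: any missing character means no match

    let points = 0.5; // base credit for finding the character at all
    if (at === 0) points += BONUS.stringStart;
    else if (text[at - 1] === " ") points += BONUS.wordStart;
    if (at === previous + 1) points += BONUS.consecutive;
    if (text[at] === query[q]) points += BONUS.exactCase;

    total += points;
    previous = at;
    from = at + 1;
  }

  // Average per query character so longer queries are not favored.
  return total / query.length;
}
```

In this sketch, `toyScore("Hello World", "He")` ranks above `toyScore("Hello World", "lo")` because of the beginning-of-string and consecutive-character bonuses, and `toyScore("Hello World", "jello")` returns 0 because "j" never appears.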