joshaven / string_score

JavaScript string ranking: 0 for no match, up to 1 for a perfect match... "String".score("str"); //=> 0.825
MIT License

"you;S".score("you10") => 0 #4

Closed: gangster closed this issue 13 years ago

gangster commented 13 years ago

"you;S".score("you10") yields a score of 0. Seems like it should produce a score higher than that, no?

joshaven commented 13 years ago

I have considered having this yield a score, but the way things currently work it does not: as soon as the function finds a non-matching character, it returns 0. This is more of a feature request than an issue. I am thinking about dividing the score in half for every unmatched character, so if the match would have been 0.8, then with the two mismatches it would end up as 0.2. I will probably implement this as an optional setting, something like {fuzzy: true}.
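To make the arithmetic in the comment above concrete, here is a minimal sketch of the halving idea. The helper name `applyMismatchPenalty` is hypothetical and is not part of string_score; it only illustrates how a score of 0.8 with two mismatches would end up as 0.2.

```javascript
// Hypothetical helper, not part of string_score:
// halve the score once for every unmatched character.
function applyMismatchPenalty(score, mismatches) {
  return score / Math.pow(2, mismatches);
}

console.log(applyMismatchPenalty(0.8, 0)); // 0.8 (no mismatches, score unchanged)
console.log(applyMismatchPenalty(0.8, 2)); // 0.2 (halved twice)
```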

joshaven commented 13 years ago

One of the reasons I don't support this type of mismatch by default is that two things you would not expect to match would match. For example, "Barack Obama" would match "George Bush" to some degree because they share some of the same letters, even though they are very different.

I think it has merit as an option though.
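As a rough illustration of that concern, the sketch below counts how many characters of one string also occur somewhere in another, ignoring case. The function `sharedCharacterCount` is a hypothetical helper and is much cruder than the library's scoring; it only shows that unrelated names still overlap.

```javascript
// Hypothetical helper, not part of string_score:
// count the characters of `candidate` that occur anywhere in `target`.
function sharedCharacterCount(target, candidate) {
  const haystack = target.toLowerCase();
  let shared = 0;
  for (const ch of candidate.toLowerCase()) {
    if (haystack.indexOf(ch) !== -1) {
      shared += 1;
    }
  }
  return shared;
}

// "o", "r", the space and "b" all occur in "Barack Obama".
console.log(sharedCharacterCount("Barack Obama", "George Bush")); // 4
```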

joshaven commented 13 years ago

The original purpose was to be able to abbreviate strings...

Typing "YMCA" should match "Young Men's Christian Association" better than "Young Women's Christian Association", which would be matched better by "YWCA".
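One way to see why the abbreviation favours one string over the other is to count how many abbreviation characters land on the start of a word. The sketch below is only an assumption about how such a preference could be measured; `wordStartHits` is a hypothetical helper, not the library's actual scoring.

```javascript
// Hypothetical helper, not part of string_score:
// walk the abbreviation in order and count how many of its characters
// match the first letter of a word in `string`.
function wordStartHits(string, abbreviation) {
  const s = string.toLowerCase();
  let from = 0;
  let hits = 0;
  for (const ch of abbreviation.toLowerCase()) {
    const idx = s.indexOf(ch, from);
    if (idx === -1) {
      return 0; // a character is missing, so there is no match at all
    }
    if (idx === 0 || s[idx - 1] === " ") {
      hits += 1; // the match sits at the start of a word
    }
    from = idx + 1;
  }
  return hits;
}

console.log(wordStartHits("Young Men's Christian Association", "YMCA"));   // 3
console.log(wordStartHits("Young Women's Christian Association", "YMCA")); // 2
console.log(wordStartHits("Young Women's Christian Association", "YWCA")); // 3
```

In this sketch the final "a" is found inside "Christian" first, so it never counts as a word start; a real scorer would need to handle that case more carefully.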

gangster commented 13 years ago

Ah, I see. An option would be great. I'm trying to sift through Twitter spam in a node project I am working on. Often times, you'll see shit like this:

"ZOMFG I LOVE YOU @JUSTINBIEBER;s"
"ZOMFG I LOVE YOU @JUSTINBIEBER10"
"ZOMFG I LOVE YOU @JUSTINBIEBER34,4"

And so on. So being able to score strings like that would be extremely useful in my case. Can you perhaps recommend another library or method for comparing these types of strings?

Cheers

joshaven commented 13 years ago

I don't have time to make the change and test things right now, but I think you could just modify lines 55-58 of string_score.js to not bail when a mismatch is found:

    if (index_in_string === -1) {
        // Bail out if abbr[i] is not found in string
        return 0;
    }

It could be something like:

    if (index_in_string === -1) {
        character_score = character_score / 2;
    }

You would also want to move this if statement down below line 61 to ensure that character_score is not null and is non-zero.
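For anyone who wants to experiment, here is a self-contained sketch of the idea. It is a simplified stand-in, not the actual string_score.js source: the function name `sketchScore`, the flat credit of 1 per matched character, and the `{fuzzy: true}` option handling are all assumptions made for illustration. Instead of halving a per-character score, it counts mismatches and halves the final score once per mismatch.

```javascript
// Simplified stand-in for the scoring loop, NOT the real string_score.js code.
// `sketchScore` and its flat per-character credit are assumptions.
function sketchScore(string, abbreviation, options) {
  const fuzzy = Boolean(options && options.fuzzy);
  let remaining = string.toLowerCase();
  let matched = 0;
  let mismatches = 0;

  for (const ch of abbreviation.toLowerCase()) {
    const indexInString = remaining.indexOf(ch);
    if (indexInString === -1) {
      if (!fuzzy) {
        return 0; // default behaviour: bail out on the first mismatch
      }
      mismatches += 1; // fuzzy behaviour: remember the mismatch and keep going
      continue;
    }
    matched += 1;
    // Only search the part of the string after the matched character.
    remaining = remaining.substring(indexInString + 1);
  }

  const base = matched / Math.max(string.length, abbreviation.length);
  return base / Math.pow(2, mismatches); // halve once per mismatch
}

console.log(sketchScore("you;S", "you10"));                  // 0
console.log(sketchScore("you;S", "you10", { fuzzy: true })); // a small positive score
```

With the fuzzy option, "you" matches three of five characters and the two mismatches ("1" and "0") each halve the result, so the pair scores above 0 rather than being rejected outright.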

If you have the time to write tests and test the project I'd gladly implement it... otherwise it will likely be a while before I get the time to implement this. I am currently working 10 to 16 hour days.

As for another recommendation, you could see if the other projects that I have included in my tests for comparison meet your needs: quicksilver.js and liquidmetal.js (fuzzy_string.js does what you want but has major issues with larger string comparisons).