kpdecker / jsdiff

A javascript text differencing implementation.
BSD 3-Clause "New" or "Revised" License

% Match between 2 strings #26

Closed jlc467 closed 10 years ago

jlc467 commented 10 years ago

It would be really useful to have a function that calculates the percentage match between two given strings, telling the user how similar the strings are as a percentage.

matanox commented 10 years ago

There is no single measure of "string similarity" that is intuitively nice for all cases and all minds. This is actually a nice and boring topic in computer science and genomics. You could implement it yourself, in your own code, with the measure of your choice, e.g. Levenshtein distance.
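As a rough illustration of the suggestion above, here is a minimal sketch of a Levenshtein-based percentage. The function names `levenshtein` and `percentMatch` are hypothetical and not part of jsdiff, and dividing by the length of the longer string is only one of several reasonable normalizations, assumed here for simplicity.

```javascript
// Levenshtein distance: the minimum number of single-character
// insertions, deletions, and substitutions that turn `a` into `b`.
// Keeps only two rows of the dynamic-programming table.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: maps the distance to a 0..100 score.
// Assumption: normalize by the longer string's length, so identical
// strings score 100 and strings with nothing in common score 0.
function percentMatch(a, b) {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 100; // two empty strings are identical
  return (1 - levenshtein(a, b) / longest) * 100;
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(percentMatch("abc", "abc"));       // 100
```

Note that this compares UTF-16 code units, so characters outside the Basic Multilingual Plane count as two units each; whether that matters depends on the input.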

leeoniya commented 10 years ago

Or Damerau-Levenshtein distance, which counts a transposition of two adjacent characters as a single edit: http://jsperf.com/damerau-levenshtein-distance. Or see http://www.joyofdata.de/blog/comparison-of-string-distance-algorithms/; from the article:

So from a top down perspective a good string metric would consider two strings very close if the first and last letter are matching and the letters in between are just permuted. You don’t have to be a genius to tell from the above given descriptions of the algos that none will perform exceptionally well and the ones that do are probably just immune to permutations on a whole – but what the heck – I got curious how the metrics respond to permutations. Okay one further aspect – given that even though human reading seems to be unimpressed by framed permutations ambiguous cases might arise – “ecxept”/“except” and “expcet”/“expect” – then the hamming distance would (maybe) determine the interpretation – which is why I chose it for the coloring in the following plot:

have fun!
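To make the measures mentioned in this comment concrete, here is a small sketch. It uses the restricted "optimal string alignment" variant of Damerau-Levenshtein (a swap of two adjacent characters costs 1, but no substring is edited more than once), which is simpler than the unrestricted algorithm and can give a larger distance for some inputs. All function names are hypothetical, and `isFramedPermutation` is just one assumed reading of the article's "first and last letter match, middle permuted" idea.

```javascript
// Optimal string alignment distance (restricted Damerau-Levenshtein).
function osaDistance(a, b) {
  // d[i][j] = distance between the first i chars of a and the first j chars of b
  const d = [];
  for (let i = 0; i <= a.length; i++) d[i] = [i];
  for (let j = 0; j <= b.length; j++) d[0][j] = j;

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // deletion
        d[i][j - 1] + 1,       // insertion
        d[i - 1][j - 1] + cost // substitution (or match)
      );
      // Adjacent transposition, counted as a single edit.
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

// Hamming distance: number of positions at which two equal-length strings differ.
function hamming(a, b) {
  if (a.length !== b.length) {
    throw new RangeError("Hamming distance needs strings of equal length");
  }
  let mismatches = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) mismatches++;
  }
  return mismatches;
}

// Hypothetical check: same first and last letter, and the letters in
// between are a permutation of each other.
function isFramedPermutation(a, b) {
  if (a.length !== b.length || a.length < 2) return a === b;
  if (a[0] !== b[0] || a[a.length - 1] !== b[b.length - 1]) return false;
  const sortedMiddle = (s) => s.slice(1, -1).split("").sort().join("");
  return sortedMiddle(a) === sortedMiddle(b);
}

console.log(osaDistance("ecxept", "except"));         // 1 (one adjacent swap)
console.log(isFramedPermutation("ecxept", "except")); // true
console.log(isFramedPermutation("ecxept", "expect")); // true (the ambiguous case)
console.log(hamming("ecxept", "except"));             // 2
console.log(hamming("ecxept", "expect"));             // 3
```

On the article's example, "ecxept" is a framed permutation of both "except" and "expect", and the smaller Hamming distance points to "except", which matches the tie-breaking role the quoted passage suggests.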

jlc467 commented 10 years ago

Thanks for the response -- I have successfully implemented a JS Levenshtein distance function that suits my needs!