tylerjensen / FuzzyStrings

Fuzzy String Algorithms for .NET
http://www.duovia.net

FuzzyEquals not working for Hebrew characters #5

Open yosimaurer opened 6 years ago

yosimaurer commented 6 years ago

FuzzyEquals and FuzzyMatch do not work with Hebrew characters and seem to ignore them.

However, other fuzzy methods such as DiceCoefficient work well.

Sample code:

    string str1 = "אבג";
    string str2 = str1;
    Console.WriteLine(str1.FuzzyEquals(str2));
    Console.WriteLine(str1.FuzzyMatch(str2));

    str1 = "abc";
    str2 = str1;
    Console.WriteLine(str1.FuzzyEquals(str2));
    Console.WriteLine(str1.FuzzyMatch(str2));

Results:

    False
    -0.0625
    True
    0.999999

tylerje commented 4 years ago

FuzzyMatch is based on the Latin character set. See https://github.com/tylerjensen/FuzzyStrings/blob/master/src/DuoVia.FuzzyStrings/DuoVia.FuzzyStrings/StringExtensions.cs#L53

I'm open to a pull request to resolve that for other character sets.
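
Until such a pull request exists, a caller could check the input before relying on FuzzyEquals or FuzzyMatch. The following is only a minimal sketch: `LatinGuard` and `IsBasicLatin` are hypothetical names that are not part of FuzzyStrings, and the sketch assumes that "Latin" means the unaccented ASCII letters A-Z and a-z, which may be narrower or broader than what the linked code actually accepts.

```csharp
using System;

public static class LatinGuard
{
    // Hypothetical helper, not part of FuzzyStrings.
    // Returns true when every letter in the string is an ASCII letter (A-Z, a-z).
    // Assumption: accented Latin letters such as 'é' are treated as non-Latin here.
    public static bool IsBasicLatin(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;

        foreach (char c in s)
        {
            // Digits, whitespace, and punctuation do not affect the result.
            if (!char.IsLetter(c)) continue;

            bool asciiLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!asciiLetter) return false;
        }

        return true;
    }
}
```

With this helper, `LatinGuard.IsBasicLatin("abc")` is true and `LatinGuard.IsBasicLatin("אבג")` is false, so the Hebrew sample from the report could be routed to a different comparison instead of producing a misleading score.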

tylerje commented 4 years ago

@yosimaurer the primary problem would be the DoubleMetaphone algorithm, which relies on the Latin character set and generic English pronunciation. To support another language or character set, you would need to extend that algorithm and modify the FuzzyMatch mashup algorithm, which uses all four of the base algorithms.
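
As a stopgap for non-Latin text, a comparison that works purely on characters avoids the phonetic step altogether, which matches the report that DiceCoefficient behaves well with Hebrew. The sketch below is a standalone bigram Dice coefficient written for illustration. It is not the library's DiceCoefficient implementation, and the names `BigramDice` and `Score` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class BigramDice
{
    // Collects the distinct two-character substrings of s.
    private static HashSet<string> Bigrams(string s)
    {
        var set = new HashSet<string>();
        for (int i = 0; i < s.Length - 1; i++)
        {
            set.Add(s.Substring(i, 2));
        }
        return set;
    }

    // Dice coefficient over bigram sets: 2 * |X ∩ Y| / (|X| + |Y|).
    // Returns 0.0 when neither string is long enough to contain a bigram.
    public static double Score(string a, string b)
    {
        var x = Bigrams(a ?? string.Empty);
        var y = Bigrams(b ?? string.Empty);

        int total = x.Count + y.Count;
        if (total == 0) return 0.0;

        x.IntersectWith(y); // x now holds only the shared bigrams
        return 2.0 * x.Count / total;
    }
}
```

Because the score depends only on shared character pairs, `BigramDice.Score("אבג", "אבג")` evaluates to 1.0, the same as it does for `"abc"` compared with itself. It carries no phonetic information, so it is a fallback, not a replacement for the four-algorithm mashup.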