wollmers / Text-Levenshtein-Uni

Text-Levenshtein-Uni - calculate Levenshtein distance for Unicode (UTF-8 or U32) strings

Make test - for Hindi #3

Open Shreeshrii opened 2 years ago

Shreeshrii commented 2 years ago

I expected the test I added to fail, since the two words are very different. However, it still passed.

[ 'राम', 'विष्णु', ],

Transliterated phonetically, this is [ 'raama', 'viShNu', ],

So no consonant or vowel mark is shared between the two words, yet the test passes. I assume that Example1 is meant to contain strings that are expected to be similar.

So, maybe the following could pass:

[ 'राम', 'रम', ],

[ 'raama', 'rama', ],

or

[ 'राम', 'रमा', ],

[ 'raama', 'ramaa', ],

or

[ 'राम', 'रामा', ],

[ 'raama', 'raamaa', ],

Shreeshrii commented 2 years ago

HINDI for testing combining characters

    if (1) {
        my $string1 = 'राज्य';
        my $string2 = 'उसकी';

Please let me know what is being tested here.

wollmers commented 2 years ago

@Shreeshrii

The tests operate at the character level. Each test compares the edit distance (an integer) computed by this implementation against the edit distance computed by another implementation. A test passes if the two distances are equal. If they differ, the test fails and the implementation is incorrect.
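The idea can be sketched as follows. This is only an illustration in Python, not the module's actual Perl test code: `levenshtein_dp`, `levenshtein_reference`, and `check_pairs` are hypothetical names, and "character" is assumed here to mean a Unicode code point.

```python
from functools import lru_cache


def levenshtein_dp(a: str, b: str) -> int:
    """Iterative two-row Levenshtein distance over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_reference(a: str, b: str) -> int:
    """Independent recursive implementation, used only as the comparison oracle."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
    return d(len(a), len(b))


# Word pairs taken from this thread.
PAIRS = [
    ('राम', 'विष्णु'),
    ('राम', 'रम'),
    ('राम', 'रमा'),
    ('राम', 'रामा'),
    ('राज्य', 'उसकी'),
]


def check_pairs(pairs):
    """Return (a, b, distance, reference_distance) for each pair."""
    return [(a, b, levenshtein_dp(a, b), levenshtein_reference(a, b))
            for a, b in pairs]


for a, b, got, expected in check_pairs(PAIRS):
    # A pair passes when both implementations agree, whatever the distance is.
    status = "ok" if got == expected else "not ok"
    print(f"{status} - [{a}, {b}] distance {got}, reference {expected}")
```

Under this reading, a pair of very different words still passes: 'राम' has three code points, 'विष्णु' has six, and they share none, so both implementations report a distance of 6 and the comparison succeeds. The test checks agreement between implementations, not similarity between the words.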