jimregan / foma

Automatically exported from code.google.com/p/foma

med does not recognize non-ASCII input correctly #2

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
See the following foma script:

echo BUGCOMMENT: Everything works fine for ASCII networks and ASCII input
echo
echo read regex n ;
read regex n ;

echo
echo BUGCOMMENT: Computing med n
med n

echo BUGCOMMENT: BUG with non-ASCII input and non-ASCII network
echo BUGCOMMENT: ñ is not recognized by med ñ
echo
echo read regex ñ ;
read regex ñ ;

echo
echo BUGCOMMENT: Computing med ñ
med ñ

What is the expected output? What do you see instead?
I would expect to see the same costs for n and ñ. Instead, the exact match (Cost[f]: 0) is missing for ñ:

BUGCOMMENT: Everything works fine for ASCII networks and ASCII input

read regex n ;
194 bytes. 2 states, 1 arcs, 1 path.

BUGCOMMENT: Computing med n
Calculating heuristic [h]
Using Levenshtein distance.

n
n
Cost[f]: 0

n*
*n
Cost[f]: 2

*n
n*
Cost[f]: 2

BUGCOMMENT: BUG with non-ASCII input and non-ASCII network
BUGCOMMENT: Cost[f]: 0 is missing

read regex ñ ;
195 bytes. 2 states, 1 arcs, 1 path.

BUGCOMMENT: Computing med ñ
Calculating heuristic [h]
Using Levenshtein distance.

*ñ
ñ
Cost[f]: 2

ñ*
ñ
Cost[f]: 2

*ñ*
###
Cost[f]: 3

What version of the product are you using? On what operating system?
0.9.15 on Linux and macOS

Please provide any additional information below.

Original issue reported on code.google.com by simon.cl...@gmail.com on 23 Nov 2011 at 12:22

GoogleCodeExporter commented 8 years ago
Indeed, non-ASCII input wasn't handled correctly by the edit distance functions.
This should now be fixed in the svn, along with other changes to the MED
functionality. Thanks.
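
For readers unfamiliar with this class of bug, here is a minimal Python sketch of one common way edit distance goes wrong on non-ASCII input: comparing UTF-8 bytes instead of whole symbols. This is an illustration only. The thread does not say what the actual cause in foma was, and the `levenshtein` helper below is hypothetical, not part of foma.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance over any two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

word = "\u00f1"   # precomposed "ñ", a single code point
other = "n"

# Symbol view: one symbol on each side, so a single substitution.
by_symbol = levenshtein(word, other)

# Byte view: "ñ" is two UTF-8 bytes (0xC3 0xB1), so the same pair
# now needs one substitution plus one deletion.
by_byte = levenshtein(word.encode("utf-8"), other.encode("utf-8"))

print(by_symbol, by_byte)
```

With symbol-level comparison, n and ñ behave alike, which is the behaviour the report asks for. Byte-level comparison inflates the cost of any multi-byte character.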

Original comment by mans.hul...@gmail.com on 23 Nov 2011 at 9:19

GoogleCodeExporter commented 8 years ago
Yes, works as expected now, great. Thanks for the quick fix.

Original comment by simon.cl...@gmail.com on 23 Nov 2011 at 9:28