onetrueawk / awk

One true awk

Incorrect value for RSTART on UTF-8 string #188

Closed arnoldrobbins closed 1 year ago

arnoldrobbins commented 1 year ago

Hi. The program below incorrectly sets RSTART in the second call to match(). This is a slightly modified version of a program submitted for a similar bug in gawk.

BEGIN {
    str="\342\200\257"
    print length(str)
    match(str,/.+/)
    print RSTART, RLENGTH
    match(str,/$/)
    print RSTART, RLENGTH
}

When I run it, I get this:

$ ./a.out -f /tmp/morton.awk 
1
1 1
-1 0

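In the last line, the start value should be 2: the string is a single three-byte UTF-8 character, so /$/ matches the empty string just past it, at character position 2. FWIW, the gawk bug was that RSTART was 4; it was using byte counts instead of character counts.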
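To make the byte-versus-character distinction concrete, here is a minimal sketch in Python of how a byte offset maps to a 1-based character position in UTF-8. It is only an illustration of the arithmetic, not the code in awk or gawk; the helper name `char_rstart` is made up for this example.

```python
def char_rstart(data: bytes, byte_offset: int) -> int:
    """Convert a 0-based byte offset into a 1-based character position.

    Counts the bytes before byte_offset that start a character
    (anything not of the continuation form 10xxxxxx), then adds 1
    because awk string positions are 1-based.
    """
    chars_before = sum(1 for b in data[:byte_offset] if (b & 0xC0) != 0x80)
    return chars_before + 1


s = b"\342\200\257"        # same string as the test program: 1 character, 3 bytes
end = len(s)               # /$/ matches the empty string at byte offset 3

byte_rstart = end + 1      # what a byte-counting implementation reports
print(byte_rstart, char_rstart(s, end))   # prints "4 2"
```

The first number is the byte-based value seen in the gawk bug, the second is the character-based value expected here.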

plan9 commented 1 year ago

[oof] thanks arnold!

plan9 commented 1 year ago

Found the bug; it came in with the Unicode changes.