raygard / wak

wak -- an awk implementation for toybox and standalone
BSD Zero Clause License

UTF-8 Test Cases for length(), substr(), index(), and match() #6

Closed: oliverkwebb closed this issue 2 months ago

oliverkwebb commented 2 months ago

Sorry, this apparently got auto-closed when I closed another issue, the one for the gawk bitwise operations.

A note on combining characters: we don't have to worry about them at all. UTF-8-safe awks count codepoints, not display columns, so there are no font-metrics nightmares. (goawk not being UTF-8 safe was a bit of a shock to me.) Here is what each awk reports for the same file:

$ for awk in gawk nawk mawk goawk tbawk muwak; do echo -n $awk " "; $awk '{print length()}' tests/files/utf8/test1.txt; done
gawk  115
nawk  115
mawk  207
goawk  207
tbawk  115
muwak  115
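The 115 vs. 207 split is consistent with the distinction above: the UTF-8-safe awks count codepoints, while mawk and goawk appear to be counting bytes. The minimal POSIX sh sketch below shows the two measures side by side, assuming valid UTF-8 input. `byte_len` and `codepoint_len` are hypothetical helper names, not part of wak or any awk. The codepoint count works by deleting UTF-8 continuation bytes (0x80 to 0xBF) and counting the lead bytes that remain. The test string uses a combining character to show that codepoints, not display columns, are what get counted.

```shell
#!/bin/sh
# Sketch only: byte length vs. codepoint length of a UTF-8 string.
# Assumes valid UTF-8 input; byte_len/codepoint_len are hypothetical names.

# Byte count: every octet counts (what a non-UTF-8-aware length() reports).
byte_len() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Codepoint count: drop continuation bytes 0x80-0xBF (octal 200-277),
# leaving exactly one lead byte per encoded codepoint.
codepoint_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

# "e" + U+0301 COMBINING ACUTE ACCENT (bytes CC 81): one display column,
# but two codepoints and three bytes.
s=$(printf 'e\314\201')
echo "bytes: $(byte_len "$s")  codepoints: $(codepoint_len "$s")"
```

Under these assumptions it prints `bytes: 3  codepoints: 2`. The accented "e" occupies a single column on screen, but a codepoint-counting length() sees two characters, which is why column widths never enter the picture.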