onetrueawk / awk

One true awk

Disable utf-8 for non-multibyte locales, such as C or POSIX. #195

Closed: millert closed this 11 months ago

millert commented 1 year ago

This makes it possible to get the old awk behavior (where characters are bytes) by setting LC_CTYPE to C or POSIX. The value of MB_CUR_MAX is cached, since in many C libraries it is actually a function call rather than a constant.
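A minimal sketch of the caching idea, assuming the ctype locale is taken from the environment once at startup. The names `awk_mb_cur_max` and `init_ctype` are hypothetical and do not come from the onetrueawk source; the point is only that `MB_CUR_MAX` is read once after `setlocale()` and the plain variable is used afterwards.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical name: cached copy of MB_CUR_MAX.
 * In many C libraries MB_CUR_MAX is a macro that expands to a function
 * call, so reading it for every character would add avoidable overhead. */
static size_t awk_mb_cur_max = 1;

/* Hypothetical helper: set LC_CTYPE from the environment (LC_ALL,
 * LC_CTYPE, LANG) and cache MB_CUR_MAX once. It would need to be called
 * again only if the ctype locale were changed later. */
static void init_ctype(void)
{
	setlocale(LC_CTYPE, "");
	awk_mb_cur_max = MB_CUR_MAX;
}
```

With `LC_CTYPE=C` or `LC_CTYPE=POSIX` the cached value is 1 on common C libraries, which is what lets the byte-oriented behavior be selected.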

No attempt is made to support arbitrary locales. If the locale can support multibyte characters, UTF-8 is used; if not, 8-bit characters are used instead.
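The selection rule above can be sketched as follows, assuming that "can support multibyte characters" is tested as `MB_CUR_MAX > 1`. All names (`use_utf8`, `select_encoding`, `char_len`, `char_count`) are hypothetical and not taken from the onetrueawk source. Note that, as the PR describes, any multibyte locale is treated as UTF-8 rather than decoded with its own encoding.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical flag: nonzero when strings are treated as UTF-8. */
static int use_utf8 = 0;

/* Multibyte locale -> UTF-8; C, POSIX and other single-byte locales -> bytes. */
static void select_encoding(void)
{
	use_utf8 = (MB_CUR_MAX > 1);
}

/* Length in bytes of the character starting at s. In byte mode every
 * character is one byte. In UTF-8 mode the lead byte gives the length;
 * an invalid lead byte is counted as a single byte. */
static int char_len(const unsigned char *s)
{
	if (!use_utf8 || s[0] < 0x80)
		return 1;
	if ((s[0] & 0xE0) == 0xC0)
		return 2;
	if ((s[0] & 0xF0) == 0xE0)
		return 3;
	if ((s[0] & 0xF8) == 0xF0)
		return 4;
	return 1;
}

/* Count characters (not bytes) in a NUL-terminated string. */
static size_t char_count(const char *str)
{
	const unsigned char *s = (const unsigned char *)str;
	size_t n = 0;

	while (*s != '\0') {
		int len = char_len(s);
		/* A truncated sequence must not step past the terminating NUL. */
		for (int i = 1; i < len; i++) {
			if (s[i] == '\0') {
				len = i;
				break;
			}
		}
		s += len;
		n++;
	}
	return n;
}
```

Under this sketch, the string `"h\xc3\xa9"` (h followed by a UTF-8 é) counts as 3 characters in the C locale and as 2 characters when UTF-8 mode is selected.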

This also addresses #190.

plan9 commented 1 year ago

thanks Todd.

plan9 commented 1 year ago

sorry Todd, I knew there were conflicts but wanted to push the important fnematch fixes first.