onetrueawk / awk

FS="\0" differences #165

Closed sdaoden closed 1 year ago

sdaoden commented 1 year ago

Hello. I do not know how portable or desired this is, but I stumbled upon the following differences:

?0|kent:src$ printf 'a\0b\0c\0' | gawk 'BEGIN{FS="\0"} {for(i=0; i < NF; ++i) print i, $i}'

0 abc
1 a
2 b
3 c

?0|kent:src$ printf 'a\0b\0c\0' | mawk 'BEGIN{FS="\0"} {for(i=0; i < NF; ++i) print i, $i}'

0 abc
1 a
2 b
3 c

?0|kent:src$ printf 'a\0b\0c\0' | nawk 'BEGIN{FS="\0"} {for(i=0; i < NF; ++i) print i, $i}'

0 a

?0|kent:src$ printf 'a\0b\0c\0' | busybox.static awk 'BEGIN{FS="\0"} {for(i=0; i < NF; ++i) print i, $i}'

0 a
0 b
0 c

plan9 commented 1 year ago

As per the Open Group Base Specifications (2018):

If FS is a null string, the behavior is unspecified.

I'll see if we can make this compatible with other awks in an upcoming release.

arnoldrobbins commented 1 year ago

The One True Awk uses C strings, which are zero terminated, for just about everything. Thus a record of "a\0b\0c\0" looks the same as if all it had was "a". Gawk uses pointer + length for all strings, so it can handle something like FS = "\0". In any case, putting NUL bytes into data isn't portable and is also outside the scope of POSIX, which expects data to be text, not binary.