onetrueawk / awk

One true awk

awk -F '' and awk -v FS='' treated differently #127

Closed bsdimp closed 3 years ago

bsdimp commented 3 years ago

While the POSIX 1003.1-2008 standard leaves the behavior of FS='' unspecified, one true awk treats each character as its own $n field in that case. This is documented in awk(1):

If FS is null, the input line is split into one field per character.
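As a quick illustration of the documented behavior, the sketch below prints each field with its index. It assumes the installed awk splits on a null FS (the thread reports this for one true awk, gawk and mawk); split_chars is a hypothetical helper name, not part of any awk distribution.

```shell
#!/bin/sh
# split_chars is a hypothetical helper used only for this illustration.
# It assumes the installed awk honors a null FS by splitting the record
# into one field per character.
split_chars() {
    # $1: the input line to split
    printf '%s\n' "$1" | awk -v FS='' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

split_chars foo
```

On an awk that follows the man page, this should emit one line per character of the input.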

The man page does not mention that the separator string must be non-null in its description of -F.

This means that echo foo | awk -v FS='' '{ print $1; }' prints 'f', whereas echo foo | awk -F '' '{ print $1;}' prints a warning instead: awk: field separator FS is empty.
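The two invocations can be compared side by side with a small script. This is only a sketch: AWK is a hypothetical environment override for pointing at a specific binary, and stderr is folded into the captured output so that a warning, if any, shows up next to the result.

```shell
#!/bin/sh
# AWK is a hypothetical override; it defaults to whatever "awk" is on PATH.
AWK=${AWK:-awk}

# Capture stdout and stderr together so a warning is visible in the result.
via_v=$(echo foo | "$AWK" -v FS='' '{ print $1 }' 2>&1)
via_F=$(echo foo | "$AWK" -F '' '{ print $1 }' 2>&1)

printf '%s\n' "-v FS='': $via_v"
printf '%s\n' "-F '':    $via_F"
```

If the two lines differ, the awk under test has the inconsistency described here.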

POSIX 1003.1-2008 also states that -F sepstring and -v FS=sepstring are equivalent:

-F sepstring Define the input field separator. This option shall be equivalent to: -v FS=sepstring

This suggests to me that the warning is invalid and that -F '' should set FS to the empty string, given the behavior documented in the manual. This would be the most conservative interpretation of the standard, and it would also give useful behavior. A patch looks relatively straightforward, but I thought I'd file an issue first to make sure there's consensus on this detail before producing one.

bsdimp commented 3 years ago

I'll note both gawk and mawk also honor the FS='' convention of BWK. Both treat -F '' and -v FS="" consistently.

plan9 commented 3 years ago

you're correct, this inconsistency will need to be fixed.

bsdimp commented 3 years ago

I believe https://github.com/onetrueawk/awk/pull/128 addresses this, though I'm unsure how to write a regression test for it in the fixed-bugs directory.

plan9 commented 3 years ago

thanks for the fix. both regression tests and the general tests are going to be overhauled, i made a note to add a test for the -F / -v FS consistency.
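For illustration only, a consistency check for -F / -v FS might look like the sketch below. It does not follow the project's actual test layout, which the thread does not describe; check_fs_consistency and the AWK override are hypothetical names.

```shell
#!/bin/sh
# Hypothetical consistency check; not the project's real test format.
AWK=${AWK:-awk}

check_fs_consistency() {
    # $1: input line fed to both invocations
    input=$1
    expect=$(printf '%s\n' "$input" | "$AWK" -v FS='' '{ print NF ":" $1 }' 2>&1)
    actual=$(printf '%s\n' "$input" | "$AWK" -F '' '{ print NF ":" $1 }' 2>&1)
    if [ "$expect" = "$actual" ]; then
        echo "PASS"
    else
        echo "FAIL: -v FS='' gave '$expect', -F '' gave '$actual'"
        return 1
    fi
}

# Record failures instead of exiting so every case is reported.
status=0
check_fs_consistency foo || status=1
check_fs_consistency 'a b' || status=1
echo "status=$status"
```

The check compares field count and first field rather than hard-coding expected values, so it tests only that the two options agree, which is the property the standard requires.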

bsdimp commented 3 years ago

Thanks!