onetrueawk / awk

One true awk
Other
1.98k stars 159 forks source link

fix edge case where FS is changed on commandline #163

Closed ghshephard closed 12 months ago

ghshephard commented 1 year ago

FS is not set when it's a value assigned on the command line:

$ cat > t1
1,2 3
4,5 6

shephard@Gordons-Air:~
$ awk '{printf "x"FS"x  "; print $1":"$2}' t1 FS="[ ,]" t1 t1
x x  1,2:3
x x  4,5:6
x[ ,]x  1,2:3
x[ ,]x  4:5
x[ ,]x  1:2
x[ ,]x  4:5
$ awk --version
awk version 20200816`
ghshephard commented 1 year ago

Fix for issue https://github.com/onetrueawk/awk/issues/162

mpinjr commented 1 year ago

Hi, ghshephard:

Since savefs() exists only to preserve the value of FS at the time a record is set, it should only be called when a value is assigned to $0. The logic is there in setsval (and setfval) to handle the side-effects, but getrec bypasses it to avoid copying (sub, gsub, and perhaps others would also benefit from a nocopy-capable setsval).

Calling savefs at other times can clobber inputFS while its record is valid and unsplit. For example, getline var will trigger the FS command line assignment if it needs to go to the next file for the next record, but it shouldn't affect an unsplit $0. Similarly, an FS command-line assignment before entering an END action should not affect an unsplit $0.

In the following examples, your modification discards inputFS before the record has been split, so that when the field splitting eventually happens it's done with the updated and incorrect FS.

$ # Correct output for both examples is a:b
$ echo a:b > f1
$ echo c:d > f2
$ ./shephard163.out '{getline x; print $1} ' f1 FS=: f2 
a
$ ./shephard163.out 'END {print $1} ' f1 FS=:
a

The simplest fix, unless I've missed something, is to postpone the side-effect handling in getrec until the record's been read.

diff --git a/lib.c b/lib.c
index ebe296f..2da6dfe 100644
--- a/lib.c
+++ b/lib.c
@@ -150,11 +150,6 @@ int getrec(char **pbuf, int *pbufsize, bool isrecord)  /* get next input record *
    }
    DPRINTF("RS=<%s>, FS=<%s>, ARGC=%g, FILENAME=%s\n",
        *RS, *FS, *ARGC, *FILENAME);
-   if (isrecord) {
-       donefld = false;
-       donerec = true;
-       savefs();
-   }
    saveb0 = buf[0];
    buf[0] = 0;
    while (argno < *ARGC || infile == stdin) {
@@ -194,6 +189,9 @@ int getrec(char **pbuf, int *pbufsize, bool isrecord)   /* get next input record *
                    fldtab[0]->fval = result;
                    fldtab[0]->tval |= NUM;
                }
+               donefld = false;
+               donerec = true;
+               savefs();
            }
            setfval(nrloc, nrloc->fval+1);
            setfval(fnrloc, fnrloc->fval+1);

That passes the test suite, fixes the issue you discovered, and shouldn't call savefs unnecessarily.

Regards, Miguel

plan9 commented 1 year ago

hi miguel, thanks for the discussion of the issue and the fix.

plan9 commented 12 months ago

latest release includes @mpinjr fix.