dspinellis opened 4 years ago
From my understanding, some unsafe operations (e.g., assigning to an accumulator) can be detected by examining the syntax of the awk program.
But I reckon that some checks cannot be performed statically; e.g., you need dynamic analysis for the following case, right?
"Reading a variable's value, for an operation other than accumulation, results in a fatal error if the variable's associated file id/offset is different from the current file id/offset".
The problem with dynamic analysis is that it depends on the current execution: you cannot guarantee that all executions of the program are error-free. For example, executions with different thread schedules may lead to different results.
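To make the quoted rule concrete, here is a minimal Python model of the bookkeeping it implies. It is not the actual awk implementation. Each variable remembers the (file id, offset) of its last update, and a plain read at any other position is rejected. The names TrackedVar and UnsafeReadError are hypothetical. Letting a never-updated variable be read anywhere is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Position = Tuple[int, int]  # (file id, offset of the current record)


class UnsafeReadError(RuntimeError):
    """Raised when a variable is read at a position other than its last update."""


@dataclass
class TrackedVar:
    """Hypothetical runtime shadow for one awk variable."""
    name: str
    value: float = 0
    pos: Optional[Position] = None  # position of the last update, if any

    def update(self, value: float, pos: Position) -> None:
        # Any write tags the variable with the current file id/offset.
        self.value = value
        self.pos = pos

    def read(self, pos: Position) -> float:
        # A read for anything other than accumulation must happen at the
        # same file id/offset as the last update.
        # Assumption: a variable that was never updated may be read anywhere.
        if self.pos is not None and self.pos != pos:
            raise UnsafeReadError(
                f"{self.name}: last updated at {self.pos}, read at {pos}")
        return self.value
```

For example, after a.update(1, (0, 1)), a.read((0, 1)) returns 1, while a.read((0, 3)) raises UnsafeReadError.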
I assume that allowing variable reads and writes only within the same block would be too restrictive, wouldn't it?
@theosotr Actually, allowing variable reads and writes only in the same block, although restrictive, is the only way to guarantee that no edge cases slip past the dynamic checks.
@theosotr Good point! If an error does not occur in a particular execution, then the result should be correct, but there is no guarantee that another execution will not fail. Static analysis could offer that guarantee, but the case you mention cannot, in general, be checked statically. Consider the following (correct) example, which could be checked with sophisticated static analysis.
NR == 1 { a = 1 }
NR + 100 == 101 { if (a) k++ }
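The Python sketch below is a toy illustration of the reasoning such an analysis would need. It handles only predicates of the linear form coef * NR + const == rhs. It solves each predicate for NR and checks that both pin down the same single record. The helpers selected_record and same_single_record are hypothetical, and a real analysis would have to cover far more predicate shapes.

```python
from fractions import Fraction
from typing import Optional, Tuple

Predicate = Tuple[int, int, int]  # (coef, const, rhs) for coef * NR + const == rhs


def selected_record(coef: int, const: int, rhs: int) -> Optional[int]:
    """Return the only NR satisfying coef * NR + const == rhs, or None."""
    if coef == 0:
        return None  # the predicate does not pin down a single record
    nr = Fraction(rhs - const, coef)
    if nr.denominator != 1 or nr < 1:
        return None  # no valid (positive, integral) record number
    return int(nr)


def same_single_record(p1: Predicate, p2: Predicate) -> bool:
    """True if both predicates select exactly one record, and it is the same one."""
    r1, r2 = selected_record(*p1), selected_record(*p2)
    return r1 is not None and r1 == r2
```

For the example above, NR == 1 is (1, 0, 1) and NR + 100 == 101 is (1, 100, 101). Both select record 1, so the read of a in the second block is provably on the record where a was assigned.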
Now, consider a predicate that cannot be statically checked, such as a regular expression match.
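# Should succeed: /hello/ matches only record 1, where a was updated
printf 'hello\nworld\n' |
awk 'NR == 1 {a++} /hello/ && a {b++}'
# Should fail: record 3 matches /hello/ and reads a, last updated on record 1
printf 'hello\nworld\nhello\n' |
awk 'NR == 1 {a++} /hello/ && a {b++}'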
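The following Python model replays that program under the proposed rule. It is a hypothetical simulation, not awk itself. It models a single input file, so positions reduce to record numbers. It shows that the verdict depends on the input rather than on the program text alone.

```python
import re
from typing import List, Optional


def run_example(lines: List[str]) -> Optional[str]:
    """Simulate `NR == 1 {a++} /hello/ && a {b++}` under the proposed rule.

    Returns None if the run completes, or an error message if a plain read
    of a happens on a record other than the one where a was last updated.
    """
    a, a_record = 0, None  # value of a and the record of its last update
    b = 0
    for nr, line in enumerate(lines, start=1):
        if nr == 1:
            a += 1          # accumulation: always allowed
            a_record = nr
        if re.search(r"hello", line):
            # `&& a` is evaluated only when the regex matches (short-circuit),
            # and it is a plain read of a, so it must be on a's record.
            if a_record is not None and a_record != nr:
                return f"record {nr}: read of a, last updated on record {a_record}"
            if a:
                b += 1
    return None
```

run_example(["hello", "world"]) returns None. run_example(["hello", "world", "hello"]) returns "record 3: read of a, last updated on record 1".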
@gthd What edge cases do you have in mind? I'm OK with (more restrictive) static checking, because I think that in practice it won't be too restrictive, but I expect it to be more difficult to implement.
I added two more reduction operators, min and max, and updated the description accordingly.
@dspinellis Yes, static analysis does not work in this case.
Dynamic analysis can indeed detect whether a violation occurs in a given run, but its results are not deterministic: in your example above, the awk program stayed the same and only the input changed.
Typically, such guarantees (e.g., that all programs are free of data races) are provided by design, for instance by introducing dedicated language constructs or a type system.
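One way to read "by design" here is to give every variable a single kind, either plain assignment or accumulation, and to reject any program that mixes the two. The Python sketch below is a toy check in that spirit, loosely modelled on a type system. It works over a pre-extracted list of uses. The Use tuple format and check_kinds are hypothetical, and extracting uses from real awk source is left out.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical use record: (line number, variable name, kind), where kind is
# "assign" for a plain assignment and "accumulate" for an accumulation form.
Use = Tuple[int, str, str]


def check_kinds(uses: Iterable[Use]) -> List[str]:
    """Report uses that contradict the first kind seen for each variable."""
    first_kind: Dict[str, str] = {}
    errors: List[str] = []
    for line, var, kind in uses:
        expected = first_kind.setdefault(var, kind)
        if kind != expected:
            errors.append(f"Line {line}: {var} is used for both assignment and accumulation")
    return errors
```

check_kinds([(1, "n", "accumulate"), (2, "n", "accumulate"), (4, "n", "assign")]) reports "Line 4: n is used for both assignment and accumulation". The check needs no input data, so its verdict holds for every execution.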
x++, ++x, x--, --x, x += value, x -= value
a = min(a, expr), a = max(a, expr), a = and(a, expr), a = or(a, expr), a = xor(a, expr)
The accumulator can be a scalar variable, e.g. a, or an associative array cell, e.g. a["apple"].
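Because these forms are purely syntactic, recognizing them is a static check. The Python sketch below matches a single awk statement against the forms listed above using regular expressions. The following are all assumptions of this sketch, not part of the proposal:

- the names LV and classify
- the "sum" label for the ++/--/+=/-= forms
- the restriction that subscripts contain no "]"

```python
import re
from typing import Optional, Tuple

# An lvalue: a scalar name, or an array cell such as a["apple"] or a[4].
# Simplification: subscripts may not contain "]".
LV = r'[A-Za-z_]\w*(?:\[[^\]]*\])?'

_INCDEC = re.compile(rf'^(?:(?P<v1>{LV})\s*(?:\+\+|--)|(?:\+\+|--)\s*(?P<v2>{LV}))$')
_ADDSUB = re.compile(rf'^(?P<v>{LV})\s*[-+]=\s*\S.*$')
_REDUCE = re.compile(
    rf'^(?P<v>{LV})\s*=\s*(?P<f>min|max|and|or|xor)\(\s*(?P<arg>{LV})\s*,\s*\S.*\)$')


def classify(stmt: str) -> Optional[Tuple[str, str]]:
    """Return (variable, reduction) if stmt is an accumulation form, else None."""
    s = stmt.strip()
    m = _INCDEC.match(s)
    if m:
        # Assumption: ++ and -- accumulators are summed across threads.
        return (m.group('v1') or m.group('v2'), 'sum')
    m = _ADDSUB.match(s)
    if m:
        return (m.group('v'), 'sum')
    m = _REDUCE.match(s)
    # a = min(a, expr) only counts if both sides name the same variable.
    if m and m.group('v') == m.group('arg'):
        return (m.group('v'), m.group('f'))
    return None
```

Anything classify rejects (e.g. a = 1, or a = min(b, expr)) is a plain assignment. If the same variable also appears in an accumulation form, that is the "assignment and accumulation" error.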
In the END block, the following reduction takes place: an accumulator used with any of min, max, and, or, xor has that function applied to its values across threads, resulting in a single value with the same name.
Line 4, record 54: assignment to an accumulator variable
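The END-block reduction can be modelled as a left fold over each accumulator's per-thread values. The Python sketch below assumes two things: that and, or and xor are bitwise integer operations (as with gawk's and(), or() and xor()), and that the ++/--/+=/-= accumulators are summed. COMBINE and reduce_at_end are hypothetical names.

```python
import operator
from functools import reduce
from typing import Callable, Dict, List

# Combining function for each reduction.
# Assumption: and/or/xor are bitwise on integers; "sum" covers ++/--/+=/-=.
COMBINE: Dict[str, Callable[[int, int], int]] = {
    "min": min,
    "max": max,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
    "sum": operator.add,
}


def reduce_at_end(per_thread: List[int], op: str) -> int:
    """Fold one accumulator's per-thread values into a single value."""
    return reduce(COMBINE[op], per_thread)
```

All six combining functions are associative and commutative. The result therefore does not depend on how records were divided among threads or on the order in which the threads finish.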
Error examples and test cases
All examples should also work for an array cell, e.g. a[4].
Variable used in different blocks
Variable used for both assignment and accumulation
Reading from a different record
Correct examples and test cases
Variable only used for assignment
Variable only used for accumulation
Variable used only within the same record
Using a variable on the same record
Motivating example: descriptive statistics
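The hypothetical Python sketch below illustrates how these accumulators could support descriptive statistics. It splits records round-robin among simulated threads. Each thread keeps counters (n++, s += x, ss += x * x) and extremes (min, max). The sketch merges them as an END block would, then derives the mean and population variance. The names Partial, merge and describe are assumptions of this sketch. Starting the extremes at plus or minus infinity sidesteps awk's uninitialized value (0 in numeric context).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partial:
    """Per-thread accumulators, mirroring awk statements such as
    n++; s += $1; ss += $1 * $1; lo = min(lo, $1); hi = max(hi, $1)."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    def add(self, x: float) -> None:
        self.n += 1
        self.s += x
        self.ss += x * x
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)


def merge(parts: List[Partial]) -> Partial:
    """END-block reduction: sum the counters, min/max the extremes."""
    out = Partial()
    for p in parts:
        out.n += p.n
        out.s += p.s
        out.ss += p.ss
        out.lo = min(out.lo, p.lo)
        out.hi = max(out.hi, p.hi)
    return out


def describe(values: List[float], threads: int = 2) -> dict:
    # Split the records round-robin among hypothetical threads.
    parts = [Partial() for _ in range(threads)]
    for i, x in enumerate(values):
        parts[i % threads].add(x)
    total = merge(parts)
    mean = total.s / total.n
    # Population variance from the sums; simple, but numerically weaker
    # than Welford's method on large data.
    var = total.ss / total.n - mean * mean
    return {"n": total.n, "mean": mean, "var": var, "min": total.lo, "max": total.hi}
```

describe([2, 4, 4, 4, 5, 5, 7, 9]) gives n = 8, mean 5.0, variance 4.0, min 2 and max 9. The result is the same with threads=3.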