Closed hallenstal closed 1 year ago
Nice catch, can confirm (5e49ea4).
awk purposely does not define the order in which a for (i in array) loop goes through the array. You cannot depend on it to be "sequential", and different implementations will go through the loop in different orders. If you require sequential traversal, do it like so:

    n = length(array)
    for (i = 1; i <= n; i++)
        do something with array[i]

This should only be used when you know for sure that the indices are sequential (such as with split()) since indices can be strings, or even be missing.
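As an illustration of the sequential loop described above, here is a minimal sketch. It takes n from the return value of split() rather than from length(array), on the assumption that this is the more portable choice for older awk builds. The sample input and the shell variable name out are made up for the example.

```shell
# Sequential traversal of an array filled by split().
# split() returns the number of pieces, which are stored under indices 1..n.
out=$(echo "one;three;54;3;86;seven" | awk '{
    n = split($0, a, ";")
    for (i = 1; i <= n; i++)
        printf "a[%d]=%s\n", i, a[i]
}')
echo "$out"
```

Because the loop counts from 1 to n itself, the result does not depend on how the implementation orders for (i in array).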
Closing this issue.
Well, you could of course have a different opinion on this. When an array is indexed by an integer sequence, a good design would take the elements in order. Of course, there are always workarounds.
BR, Magnus
(In reply to Arnold Robbins's comment of 10 Feb 2023, 15:19.)
Possible to revisit the decision here?
I'd argue for several points:
- Given that awk arrays are actually associative, like maps, the keys could be either numbers or strings, or even a series of numbers with skipped values (holes); therefore it's preferable to use for(var in array) to loop over an array.
- Making things worse, the original awk doesn't even provide a builtin array length function. To be able to iterate through a properly indexed array incrementally, one has to first loop through the array using for(var in array) to count the array length, then loop over the array again with for(i=0;i<length;i++) to get the order right. This also applies to some other awk distributions.
- Even worse, if the array contains string keys, then array[pos] would NOT work, because the key at position pos could be a string instead of a natural number, causing pos to be an invalid index.

    $ echo "one;three;54;3;86;seven" | /usr/bin/awk '{split($0,a,";");a["k"]="v";len=length(a); print "length:"len; for(i=1;i<len;i++) {print "a:"i, a[i]} }'
    length:7 # due to the extra entry a["k"]="v"
    a:1 one
    a:2 three
    a:3 54
    a:4 3
    a:5 86
    a:6 seven
    # a:k v is missed
- With for (var in array), the array is iterated almost in sequential order, except that the first element is always iterated last. Doesn't that look like a suspicious off-by-one bug somewhere?

    $ echo "one;three;54;3;86;seven" | awk '{split($0,a,";");for(i in a){print "a[" i "]=" a[i] }}'
    a[2]=three
    a[3]=54
    a[4]=3
    a[5]=86
    a[6]=seven
    a[1]=one
Hello.
> Possible to revisit the decision here?
Not really, no. The array management isn't going to change.
> I'd argue for several points:
> - Given that awk arrays are actually associative, like maps, the keys could be either numbers or strings, or even a series of numbers with skipped values (holes); therefore it's preferable to use for(var in array) to loop over an array.
So this is arguing against ordered traversal of the array.
> - Making things worse, the original awk doesn't even provide a builtin array length function.
If by "original" you mean this version, you are incorrect. It has supported length(array) since January of 2002, over 20 years.
> To be able to iterate through a properly indexed array incrementally, one has to first loop through the array using for(var in array) to count the array length, then loop over the array again with for(i=0;i<length;i++) to get the order right. This also applies to some other awk distributions.
This isn't necessary. If you know that an array is indexed from 1 to N, you can do this:
for (i = 1; i in array; i++) ...
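A minimal sketch of that `i in array` loop, with made-up sample data and a hypothetical extra string key "k" added to show what the loop ignores. The test `i in a` only checks for membership and does not create the element, so the loop ends at the first missing index.

```shell
# Walk indices 1, 2, 3, ... until the first one that is not present.
# The string key "k" is never visited because the loop only probes integers.
out=$(echo "one;three;54" | awk '{
    split($0, a, ";")
    a["k"] = "v"
    for (i = 1; i in a; i++)
        print i, a[i]
}')
echo "$out"
```

One consequence worth keeping in mind: if the indices have a hole (say 1, 2, 4), this loop stops after 2, so it fits only arrays known to be indexed from 1 to N without gaps.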
> - Even worse, if the array contains string keys, then array[pos] would NOT work, because the key at position pos could be a string instead of a natural number, causing pos to be an invalid index.
So this also argues against trying to provide ordered traversal of arrays.
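For arrays that mix a 1..n sequence with string keys, one possible workaround (an illustrative sketch, not something proposed in this thread) is to walk the numeric part in order first and then pick up whatever is left with for (key in array). The helper array seen and the sample data are assumptions made for the example.

```shell
# Pass 1: ordered walk over the indices produced by split().
# Pass 2: visit every remaining key (here only "k") in whatever order awk uses.
out=$(echo "one;three;54" | awk '{
    n = split($0, a, ";")
    a["k"] = "v"
    for (i = 1; i <= n; i++) {
        print "a:" i, a[i]
        seen[i] = 1                 # remember which keys were already printed
    }
    for (key in a)
        if (!(key in seen))
            print "a:" key, a[key]
}')
echo "$out"
```

With more than one leftover key, the order of the second pass is still unspecified, which is the point the reply makes about mixed keys.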
> - With for (var in array), the array is iterated almost in sequential order, except that the first element is always iterated last. Doesn't that look like a suspicious off-by-one bug somewhere?
>
>     $ echo "one;three;54;3;86;seven" | awk '{split($0,a,";");for(i in a){print "a[" i "]=" a[i] }}'
>     a[2]=three
>     a[3]=54
>     a[4]=3
>     a[5]=86
>     a[6]=seven
>     a[1]=one
Arrays are implemented using hash tables. What you're seeing is how things hash. Since the number of items in the array is small, it looks like it's sequential, but if you put in a lot of elements (say 100), you'll see that the order isn't sequential at all. In short, there's no bug here.
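To check this on a particular awk, a small sketch like the following fills an array with 100 integer keys and reports whether for (k in a) happened to visit them in ascending order. The variable names are made up, and the second word of the output depends on the implementation, so no particular result is claimed here.

```shell
# Fill a[1..100], then record whether the for-in order was ascending.
out=$(awk 'BEGIN {
    for (i = 1; i <= 100; i++)
        a[i] = i
    prev = 0; ascending = 1; count = 0
    for (k in a) {
        count++
        if (k + 0 < prev)           # k + 0 forces a numeric comparison
            ascending = 0
        prev = k + 0
    }
    print count, (ascending ? "ascending" : "not ascending")
}')
echo "$out"
```

Every key is visited exactly once either way; only the order differs between implementations.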
As described, ordered traversal isn't so simple. Gawk provides ways to do it. It isn't the default in awk both because it's difficult to define what the ordering should be when numbers and strings are mixed, and also because it adds an extra expensive step to the process: sorting. The cost for setting up an ordered traversal through a hash table, particularly when there are lots of elements, can be measured and it can be expensive. Making ordered traversal the default means that users are paying for a feature they rarely need, and that's not a nice way to write software.
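When ordered output is needed and the keys are known to be numeric, one portable option (a sketch, not the gawk mechanism mentioned above) is to let awk emit index/value pairs in whatever order it likes and do the sorting as an explicit extra step with sort -n. In gawk specifically, setting PROCINFO["sorted_in"] to a value such as "@ind_num_asc" controls the for-in order instead. The tab-separated output format here is an assumption made for the example.

```shell
# awk prints "index<TAB>value" in unspecified order; sort -n orders by index.
out=$(echo "one;three;54;3;86;seven" | awk '{
    split($0, a, ";")
    for (i in a)
        print i "\t" a[i]
}' | sort -n)
echo "$out"
```

The sort is paid for only by the scripts that ask for it, which matches the cost argument made in the reply.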
I hope all this helps. Thanks.
On macOS, awk version 20200816:

    $ echo "one;three;54;3;86;seven" | awk '{split($0,a,";");for(i in a){print "a[" i "]=" a[i] }}'
    a[2]=three
    a[3]=54
    a[4]=3
    a[5]=86
    a[6]=seven
    a[1]=one