itchyny / gojq

Pure Go implementation of jq
MIT License

Curious bug when reading from /dev/urandom or /dev/random #240

Closed pkoppstein closed 9 months ago

pkoppstein commented 9 months ago

I have a couple of jq programs that read from /dev/urandom or /dev/random without problems, but something weird happens when running the same programs with gojq.

I believe the underlying problem can be demonstrated with the following simple program, which emits a line whenever a number in a stream of numbers is repeated more than 4 times in a row (that is, whenever the same value appears at least 6 times consecutively):

foreach inputs as $n ({};
  .i += 1
  | if .last == $n then .count += 1
    else {i, last: $n}
    end)
| select(.count > 4)
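
For readers less familiar with foreach, the state update in the filter above can be ported to Python roughly as follows. This is only a sketch: `runs` is a hypothetical helper name that exists in neither jq nor gojq, and jq's null is modeled with None. It shows that .count counts repeats after the first occurrence, so select(.count > 4) first fires on the sixth equal value in a row.

```python
def runs(values, threshold=4):
    """Yield the state each time a value has repeated more than `threshold` times.

    Hypothetical Python port of the jq filter above (illustration only).
    """
    state = {}
    for n in values:
        i = state.get("i", 0) + 1                  # .i += 1 (null + 1 is 1 in jq)
        if state.get("last") == n:
            # .count += 1, keeping .last; a missing count behaves like null
            state = {**state, "i": i, "count": state.get("count", 0) + 1}
        else:
            # {i, last: $n} rebuilds the object, which drops .count
            state = {"i": i, "last": n}
        if state.get("count", 0) > threshold:      # select(.count > 4)
            yield dict(state)


if __name__ == "__main__":
    # Six equal values in a row are needed before anything is emitted.
    print(list(runs([111, 0, 0, 0, 0, 0, 111])))      # five 0s: []
    print(list(runs([111, 0, 0, 0, 0, 0, 0, 111])))   # six 0s: one hit at i == 7
```

With five zeros nothing is emitted; with six zeros the single result is {"i": 7, "last": 0, "count": 5}.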

This command illustrates the problem:

< /dev/urandom tr -cd '0-9' | fold -w 3 | $JQ -cnr -f nonrandom.jq

When run with JQ=jq for a while, the program emits nothing, as expected, but when run with JQ=gojq, it shows that gojq sometimes encounters improbable runs of consecutive 0s:

{"count":5,"i":550667,"last":"0"}
{"count":5,"i":930865,"last":"0"}
{"count":5,"i":2047031,"last":"0"}
{"count":5,"i":3014836,"last":"0"}
{"count":5,"i":4259184,"last":"0"}
{"count":5,"i":4428268,"last":"0"}
{"count":5,"i":5294159,"last":"0"}
{"count":5,"i":6927194,"last":"0"}
{"count":5,"i":7147711,"last":"0"}
{"count":5,"i":7816024,"last":"0"}
{"count":5,"i":7820993,"last":"0"}
{"count":5,"i":11333096,"last":"0"}
{"count":5,"i":11533778,"last":"0"}
{"count":5,"i":11878371,"last":"0"}
{"count":5,"i":13456829,"last":"0"}
{"count":5,"i":14668008,"last":"0"}
{"count":5,"i":19402541,"last":"0"}
...

One point to note is that the reported count always seems to be 5. Another is that the problem arises only with 0.

One clue is that the problem goes away if a call to 'jq .' is inserted immediately before the call to $JQ. This suggests to me that there is some kind of race condition involved.

itchyny commented 9 months ago

The difference is that jq reads 000 as a single 0, but gojq reads it as three 0s (since 000 is not valid JSON). So for jq to emit anything, there would have to be 18 consecutive zeros (six lines of 000 in a row), which is extremely rare.

 $ printf '111\n000\n000\n000\n000\n000\n000\n111\n' | jq -cnr -f nonrandom.jq
{"i":7,"last":0,"count":5}

On the other hand, gojq emits the output as soon as there are six consecutive zeros, which happens from time to time when reading from a random stream.

 $ printf '111\n000\n000\n111' | gojq -cnr -f nonrandom.jq
{"count":5,"i":7,"last":0}
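
The two readings can be modeled with a small sketch. The functions below are hypothetical stand-ins written for illustration, not the actual parsers: `jq_style` assumes each line becomes one number (as jq did with 000 in the example above), and `gojq_style` assumes strict JSON integers, where a leading 0 is a complete value by itself.

```python
import re
from itertools import groupby


def jq_style(text):
    """Assumed jq reading: one number per line, leading zeros tolerated."""
    return [int(line) for line in text.split()]


def gojq_style(text):
    """Assumed gojq reading: a JSON integer is either 0 or starts with 1-9."""
    return [int(tok) for tok in re.findall(r"0|[1-9][0-9]*", text)]


def longest_zero_run(values):
    """Length of the longest run of consecutive 0 values (0 if there is none)."""
    return max((len(list(g)) for v, g in groupby(values) if v == 0), default=0)


if __name__ == "__main__":
    text = "111\n000\n000\n111\n"
    print(jq_style(text), longest_zero_run(jq_style(text)))
    print(gojq_style(text), longest_zero_run(gojq_style(text)))
```

Under these assumptions the same two lines of 000 give a run of two zeros in the first reading and a run of six zeros in the second, which is exactly the length the filter needs before it emits.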

The count might look as if it is always 5, but that is a matter of probability. If you run the program for a long time, or limit the digits to 0-5 for example, you'll also see 6 or 7.
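
A back-of-the-envelope estimate makes the probability argument concrete. The sketch assumes independent, uniformly distributed digits folded into 3-digit lines and the two readings described above; `p_line_all_zero` is a hypothetical helper name, and the figures are rough per-line chances rather than an exact analysis.

```python
from fractions import Fraction


def p_line_all_zero(digits=10, width=3):
    """Chance that one folded line consists only of the digit 0."""
    return Fraction(1, digits) ** width


p000 = p_line_all_zero()

# Assumed gojq reading: two consecutive 000 lines already give six 0 values.
p_gojq = p000 ** 2

# Assumed jq reading: each 000 line is one 0, so six such lines are needed.
p_jq = p000 ** 6

if __name__ == "__main__":
    lines = 20_000_000
    print("gojq-style hits expected:", float(p_gojq * lines))
    print("jq-style hits expected:  ", float(p_jq * lines))
    # Restricting the digits to 0-5 makes a 000 line far more likely.
    print("two 000 lines, digits 0-5:", p_line_all_zero(digits=6) ** 2)
```

Under these assumptions a given pair of lines is 000, 000 about once in a million, so some twenty hits per 20 million lines is plausible for the gojq reading, while the jq reading would need about 10^18 lines per hit. A count of 6 additionally requires the next line to start with 0, which is why higher counts show up only rarely unless the digit set is reduced.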

pkoppstein commented 9 months ago

@itchyny - Of course! Thanks.