Closed caffe3 closed 6 months ago
Thanks for reviewing, @millert - let me know if any other changes should be made.
@mpinjr could you have a look too :+1:
I discovered another existing use-after-free in the same function with patbeg
and added another commit to resolve that.
Another option would be to add "int patoff = 0" and just set "patoff = i" instead of "pb = buf+ i". Either way you need to update patbeg at the end. I think using offsets consistently may be easier to follow.
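For illustration, the offset approach can be sketched in isolation as below. This is a minimal sketch under assumptions, not the awk source: `struct gbuf`, `gbuf_grow`, and `fill_and_mark` are hypothetical names, and only `patoff` mirrors the suggestion above. The point is that an offset stays valid however often the buffer moves, so the pointer only has to be derived once, at the end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; a stand-in for illustration, not awk's adjbuf. */
struct gbuf {
	char *data;
	size_t size;
};

/* Grow the buffer to at least `need` bytes. Returns 0 on allocation failure. */
static int gbuf_grow(struct gbuf *b, size_t need)
{
	char *p;

	if (need <= b->size)
		return 1;
	p = realloc(b->data, need);	/* may move the block */
	if (p == NULL)
		return 0;
	b->data = p;
	b->size = need;
	return 1;
}

/*
 * Copy `src` into the buffer byte by byte, growing as needed, and return
 * the offset of the first `mark` byte (-1 if absent, -2 on allocation failure).
 */
static long fill_and_mark(struct gbuf *b, const char *src, char mark)
{
	long patoff = -1;
	size_t i;

	if (!gbuf_grow(b, 1))
		return -2;
	for (i = 0; src[i] != '\0'; i++) {
		if (i + 2 > b->size && !gbuf_grow(b, (i + 2) * 2))
			return -2;
		assert(i + 1 < b->size);
		b->data[i] = src[i];
		if (patoff < 0 && src[i] == mark)
			patoff = (long)i;	/* an offset, so a later realloc cannot invalidate it */
	}
	b->data[i] = '\0';
	return patoff;
}
```

A caller would compute the pointer only after the last possible reallocation, for example `patbeg = b.data + patoff` once `fill_and_mark` has returned a non-negative offset.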
@millert - I've updated as you suggested, and I agree: patoff is easier to read and follow.
@caffe3 Looks good to me.
@caffe3
> The new `fnematch` simplifies the code base by using i, j, k pointers to track the parsing. Unfortunately, when `adjbuf` is called, the pointers can be invalidated because `adjbuf` uses `realloc` internally and can move the passed `buf` to a new memory address.
That was certainly a facepalming blunder. Thank you for catching it.
> This commit switches i, j, k to be indices relative to `buf` that is updated on the fly.
It's not my call, but I prefer to keep using pointers instead of ints. Eventually, it would be nice to not be limited to INT_MAX records. And, either way, a few pointers will need updating after buffer relocation and/or resizing. Here's a diff that moves all of those updates so that they immediately follow the adjbuf that necessitates them. It compiled cleanly, passed the test suite, and your test case under valgrind.
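As a rough illustration of keeping pointers while updating them right after the call that can move the buffer, here is a minimal sketch. It is not the code from the linked diff: `grow_and_rebase` is a hypothetical helper, and the idea it shows is only that the offset must be captured before `realloc`, while the old pointers are still valid, and reapplied to the new base immediately afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not awk's adjbuf): grow *bufp to `newsize` bytes and
 * rebase the cursor *curp so that it points at the same byte in the new block.
 * Returns 0 on allocation failure, leaving *bufp, *sizep and *curp untouched.
 */
static int grow_and_rebase(char **bufp, size_t *sizep, size_t newsize, char **curp)
{
	/* Capture the offset BEFORE realloc; afterwards the old pointers are dead. */
	ptrdiff_t off = *curp - *bufp;
	char *nb;

	assert(off >= 0 && (size_t)off <= *sizep);
	if (newsize <= *sizep)
		return 1;		/* nothing to do, pointers stay valid */
	nb = realloc(*bufp, newsize);	/* may return a different address */
	if (nb == NULL)
		return 0;
	*bufp = nb;
	*sizep = newsize;
	*curp = nb + off;		/* update immediately after the move */
	return 1;
}
```

With several cursors, each one needs the same treatment (offset saved before, pointer rebuilt after), which is the bookkeeping that the index-based version avoids.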
https://github.com/mpinjr/awk/commit/a2453087b5ae7d0a213f9679e14bf634f79ae3b6
> @mpinjr could you have a look too 👍
Sorry it took me a while. Very busy with other work at the moment. Thanks again for catching and reporting this. I appreciate it.
Unrelatedly, we should change MAX_UTF_BYTES to awk_mb_cur_max. For those libcs that can only ungetc one char, when operating under a single-byte character locale, we can avoid reading ahead and unnecessarily triggering multiple ungetcs. This will alleviate at least some of the ungetc issues. A complete fix is more complicated because it requires stream handling changes outside of fnematch (because the next record may not necessarily be read by fnematch if RS changes).
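To make the locale point concrete, here is a minimal sketch of bounding read-ahead by the locale's maximum character length instead of a compile-time worst case. It assumes nothing about how awk_mb_cur_max is actually set or used; `readahead_limit` is a hypothetical name, and the only library facility involved is the standard `MB_CUR_MAX` macro from `<stdlib.h>`.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: the most bytes one character can occupy in the
 * current locale, i.e. an upper bound on how far a reader might have to
 * look ahead (and later push back) to complete a single character.
 * MB_CUR_MAX follows the LC_CTYPE locale, so a program that wants the
 * user's locale must have called setlocale() beforehand.
 */
static size_t readahead_limit(void)
{
	size_t n = MB_CUR_MAX;

	assert(n >= 1);
	return n;
}
```

In a single-byte locale this bound is 1, so at most one byte ever needs to be pushed back, which is within what a one-character ungetc can handle.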
I hope everyone is doing well, Miguel
Thanks for the discussion. I assume the changes are settled, with enough eyes involved. @mpinjr, thanks for the suggestion regarding MAX_UTF_BYTES; will do. Thanks again.
I'm happy with this PR, and I don't have a strong opinion on whether pointers (as in @mpinjr's version) or my PR's indices are used.
Thank you all. @mpinjr's fix has been included. MAX_UTF_BYTES is now awk_mb_cur_max.
The new `fnematch` simplifies the code base by using i, j, k pointers to track the parsing. Unfortunately, when `adjbuf` is called, the pointers can be invalidated because `adjbuf` uses `realloc` internally and can move the passed `buf` to a new memory address.
This commit switches i, j, k to be indices relative to `buf` that is updated on the fly.
Fixes the use-after-free on long input:
printf "%8192s\n" | tr " " "=" | ./a.out 'BEGIN{RS="th[^h]+"}{}'