Closed Randgalt closed 1 month ago
As an update of what we've discussed over Slack: we need to make sure we always read in a way that's not aligned to the chunk boundaries.
We need to "complete" each chunk and inspect the next one so that we never send a payload of the correct size unless we have verified all signatures:
Content-Length
is correct (i.e., we have sent the same number of bytes we said we would)We therefore need to make sure that:
I do not think we can compute the number of bytes to "preload" in our input stream, as even if we were to make it a prime, the offset by which we would be misaligned with the underlying chunk size would move with each read, eventually aligning at some point.
https://github.com/trinodb/aws-proxy/pull/103/commits/d94bc37ee8c1f94db27575335e34b2ca14d09dac
This just reads the header of the next chunk after a read if we reached its boundary.
I think it may be enough:
read
callnextChunk
(at least as of that PR, prior to the nice cleanup you added here) will also immediately validate a chunk of size 0I uncommented the tests that originally failed and they pass now. Wdyt?
This just reads the header of the next chunk after a read if we reached its boundary.
I think it may be enough:
This is good and solves the problem. I've updated this PR and removed the Read head shim and altered the re-written AwsChunkedInputStream
to read the next chunk header. I tested against testAwsChunkedUpload()
and it fixes it.
We can either go with the previous, Apache Commons based implementation or use this one. wdyt?
We can either go with the previous, Apache Commons based implementation or use this one. wdyt?
I think this current implementation is cleaner and more readable, happy to go with it
Rewrite of
AwsChunkedInputStream
for simplicity/clarity~~Added
ReadAheadInputStream
which buffers a portion of a delegate input stream which is assumed to be anAwsChunkedInputStream
. The purpose of this is to catch an edge case with the final chunk. By the time the final chunk signature is verified, the bytes of the chunk will already be sent to the remote server and, thus, it's too late to prevent it. By reading ahead, the final chunk will get validated before the bytes are sent to the remote.~~When the end of a chunk is reached it reads the next chunk's header. This ensures we don't send the final bytes of a request before they've been validated against the signature. i.e. the final chunk's signature cannot be validated until the final chunk has been read. But, without this change, the final chunk's bytes will already have been sent.