trinodb / aws-proxy

Proxy for S3
Apache License 2.0
7 stars 3 forks source link

Rewrite of AWS Chunked support #105

Closed Randgalt closed 1 month ago

Randgalt commented 2 months ago

Rewrite of AwsChunkedInputStream for simplicity/clarity

~~Added ReadAheadInputStream which buffers a portion of a delegate input stream which is assumed to be an AwsChunkedInputStream. The purpose of this is to catch an edge case with the final chunk. By the time the final chunk signature is verified, the bytes of the chunk will already be sent to the remote server and, thus, it's too late to prevent it. By reading ahead, the final chunk will get validated before the bytes are sent to the remote.~~

When the end of a chunk is reached it reads the next chunk's header. This ensures we don't send the final bytes of a request before they've been validated against the signature. i.e. the final chunk's signature cannot be validated until the final chunk has been read. But, without this change, the final chunk's bytes will already have been sent.

vagaerg commented 2 months ago

As an update of what we've discussed over Slack: we need to make sure we always read in a way that's not aligned to the chunk boundaries.

We need to "complete" each chunk and inspect the next one so that we never send a payload of the correct size unless we have verified all signatures:

We therefore need to make sure that:

I do not think we can compute the number of bytes to "preload" in our input stream, as even if we were to make it a prime, the offset by which we would be misaligned with the underlying chunk size would move with each read, eventually aligning at some point.

vagaerg commented 2 months ago

https://github.com/trinodb/aws-proxy/pull/103/commits/d94bc37ee8c1f94db27575335e34b2ca14d09dac

This just reads the header of the next chunk after a read if we reached its boundary.

I think it may be enough:

I uncommented the tests that originally failed and they pass now. Wdyt?

Randgalt commented 2 months ago

d94bc37

This just reads the header of the next chunk after a read if we reached its boundary.

I think it may be enough:

This is good and solves the problem. I've updated this PR and removed the Read head shim and altered the re-written AwsChunkedInputStream to read the next chunk header. I tested against testAwsChunkedUpload() and it fixes it.

We can either go with the previous, Apache Commons based implementation or use this one. wdyt?

vagaerg commented 2 months ago

We can either go with the previous, Apache Commons based implementation or use this one. wdyt?

I think this current implementation is cleaner and more readable, happy to go with it