vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Allow headers to be set for all HTTP based sinks #1673

Open binarylogic opened 4 years ago

binarylogic commented 4 years ago

For any HTTP-based sink, we should allow users to set custom headers. This is particularly important for services that support special headers (like S3). With S3, you can set all kinds of special headers that control the ACL, encryption mechanism, etc. This is already provided in the http sink and I would like the exact same option available to the following sinks:

I don't see any downside to providing this option.


For further context, this issue came out of a meeting with @zcapper. They're using the aws_s3 sink to write objects across accounts. This scenario is perfectly described in this AWS tutorial. To summarize, I'll walk through a simple example.

Given two AWS accounts A and B:

  1. Account A is where Vector is deployed.
  2. Account B owns the S3 bucket.
  3. Account A is granted cross-account access to account B's S3 bucket via S3's cross-account bucket permissions.
  4. When account A writes to the bucket, the S3 object ownership remains under account A.
  5. Users in account B cannot modify the object as a result.

This can be easily solved by supplying the x-amz-grant-full-control header when writing the object.
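
As a rough illustration of what that header looks like on the wire, here is a minimal Rust sketch that builds the name/value pair. The `Grantee` enum and `grant_full_control_header` function are hypothetical names made up for this example and are not part of Vector or Rusoto; the value format (`id="..."`, `uri="..."`, comma-separated) follows the S3 grant header convention.

```rust
/// Hypothetical helper type: the kind of grantee named in an S3 grant header.
pub enum Grantee<'a> {
    /// Canonical user ID of an AWS account (e.g. account B in the example above).
    Id(&'a str),
    /// URI of a predefined S3 group.
    Uri(&'a str),
}

/// Builds the (name, value) pair for the `x-amz-grant-full-control` header.
/// Multiple grantees are joined with ", ".
pub fn grant_full_control_header(grantees: &[Grantee<'_>]) -> (String, String) {
    let value = grantees
        .iter()
        .map(|g| match g {
            Grantee::Id(id) => format!("id=\"{}\"", id),
            Grantee::Uri(uri) => format!("uri=\"{}\"", uri),
        })
        .collect::<Vec<_>>()
        .join(", ");
    ("x-amz-grant-full-control".to_string(), value)
}
```

Account A would send this pair with each PutObject request, passing account B's canonical user ID as the `Grantee::Id` value.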

binarylogic commented 4 years ago

@bruceg could you investigate doing this for the aws_s3 sink first? I assume solving that will also solve the other AWS-based sinks?

zcapper commented 4 years ago

@binarylogic thanks for adding this! The custom http header approach makes a lot of sense for the aws_s3 sink. We use at least a couple of them, and it will likely future-proof against any headers AWS adds in the future.

With regard to the other AWS sinks, from what I can see the non-S3 APIs (e.g. Kinesis streams) may not use optional HTTP headers at all. It might be worth surveying the other sinks to see how many of them could make use of custom headers.

bruceg commented 4 years ago

For the AWS S3 sink, at least, adding arbitrary headers is going to require a lot of duplication of external code. The S3Client::put_object method we use creates and finishes the request object before returning. So to add our own request headers we will need to either duplicate the method or add support to the crate and use our local/custom copy until it gets upstreamed.

The PutObjectRequest does have an option for a canned ACL, as well as for the specific x-amz-grant-full-control header being requested. Can we scope this issue down to just those extra bits (and anything else already part of the PutObjectRequest), since that will satisfy the ownership request?

binarylogic commented 4 years ago

:( That's unfortunate. Would you mind opening an issue in Rusoto requesting this? And yes, in the interim let's just map the grant_* options to our own. There are some other good options in here. Do you think we should open a separate issue for them?
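
A minimal sketch of what that interim mapping could look like, under stated assumptions: `S3Options` is a hypothetical sink-side options struct, and the `PutObjectRequest` below is a local stand-in that mirrors only a subset of the fields on `rusoto_s3::PutObjectRequest` so the example compiles without external crates. It is not the actual sink code.

```rust
/// Hypothetical sink-side options (names are assumptions for illustration).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct S3Options {
    pub acl: Option<String>,
    pub grant_full_control: Option<String>,
    pub grant_read: Option<String>,
    pub grant_read_acp: Option<String>,
    pub grant_write_acp: Option<String>,
}

/// Local stand-in for a subset of `rusoto_s3::PutObjectRequest`.
#[derive(Debug, Default, PartialEq)]
pub struct PutObjectRequest {
    pub bucket: String,
    pub key: String,
    pub acl: Option<String>,
    pub grant_full_control: Option<String>,
    pub grant_read: Option<String>,
    pub grant_read_acp: Option<String>,
    pub grant_write_acp: Option<String>,
}

/// Copies each configured option onto the request; unset options stay `None`,
/// so no corresponding header would be sent for them.
pub fn build_request(bucket: &str, key: &str, options: &S3Options) -> PutObjectRequest {
    PutObjectRequest {
        bucket: bucket.to_string(),
        key: key.to_string(),
        acl: options.acl.clone(),
        grant_full_control: options.grant_full_control.clone(),
        grant_read: options.grant_read.clone(),
        grant_read_acp: options.grant_read_acp.clone(),
        grant_write_acp: options.grant_write_acp.clone(),
    }
}
```

Keeping every field optional means existing configurations that set none of these options would produce the same request as before.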

bruceg commented 4 years ago

I'm not sure if it will be of much value to the rusoto crate. AFAICT all of the options supported when creating S3 objects (S3 PutObject API) are exposed through the rusoto_s3::PutObjectRequest.