vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

The `aws_kinesis_firehose` source `store_access_key` option doesn't appear to actually store the access key #18108

Open jszwedko opened 1 year ago

jszwedko commented 1 year ago

Problem

https://github.com/vectordotdev/vector/commit/2b446f7a434a74bd7d65b930797fc2f518540320 added an option to store the access key used for requests to the AWS Kinesis Firehose source, but this doesn't appear to be working correctly.

With the below config and data file, and running:

curl -i -XPOST -H 'X-Amz-Firehose-Access-Key: access1' -H 'X-Amz-Firehose-Request-Id: 123' -H 'X-Amz-Firehose-Source-Arn: hmm' -H 'X-Amz-Firehose-Protocol-Version: 1.0' --data-binary @/tmp/data.json localhost:8080

The access key is reported as null. This appears to be because the key is never stored from the `X-Amz-Firehose-Access-Key` header. Instead, the source seems to expect to find it in the request body.
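For anyone who wants to script the reproduction rather than use curl, here is a minimal Python sketch that builds the same request. The function name `build_firehose_request` is hypothetical; the header names and values are the ones from the curl command above, and the URL is assumed to point at wherever the source is listening.

```python
import json
import urllib.request


def build_firehose_request(url, access_key, body):
    """Build a POST request that mimics a Firehose HTTP delivery.

    Hypothetical helper for reproducing this issue; the header values
    mirror the curl command in the report.
    """
    payload = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=payload, method="POST")
    # The access key travels in a header, not in the JSON body.
    req.add_header("X-Amz-Firehose-Access-Key", access_key)
    req.add_header("X-Amz-Firehose-Request-Id", "123")
    req.add_header("X-Amz-Firehose-Source-Arn", "hmm")
    req.add_header("X-Amz-Firehose-Protocol-Version", "1.0")
    return req
```

With Vector running, the request can be sent with `urllib.request.urlopen(req)`, where `body` is the parsed contents of data.json.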

Configuration

sources:
  firehose:
    type: aws_kinesis_firehose
    address: 0.0.0.0:80
    access_keys: ["access1", "access2"]
    store_access_key: true
transforms:
  parse_firehose:
    type: remap
    drop_on_abort: true
    drop_on_error: true
    inputs: ["firehose"]
    source: |
      .access_key = get_secret("aws_kinesis_firehose_access_key")
sinks:
  console:
    type: console
    encoding:
      codec: native_json
    inputs:
      - parse_firehose
    target: stdout

Version

vector 0.31.0

Debug Output

{"log":{"access_key":null,"message":"{ \"requestId\": \"ed4acda5-034f-9f42-bba1-f29aea6d7d8f\", \"timestamp\": 1578090901599, \"records\": [ { \"data\": { \"messageType\": \"DATA_MESSAGE\", \"owner\": \"123456789012\", \"logGroup\": \"log_group_name\", \"logStream\": \"log_stream_name\", \"subscriptionFilters\": [ \"subscription_filter_name\" ], \"logEvents\": [ { \"id\": \"0123456789012345678901234567890123456789012345\", \"timestamp\": 1510109208016, \"message\": \"log message 1\" }, { \"id\": \"0123456789012345678901234567890123456789012345\", \"timestamp\": 1510109208017, \"message\": \"log message 2\" } ] } } ] }\n","request_id":"123","source_arn":"hmm","source_type":"aws_kinesis_firehose","timestamp":"2020-01-03T22:35:01.599Z"}}

Example Data

data.json:

{
  "requestId": "ed4acda5-034f-9f42-bba1-f29aea6d7d8f",
  "timestamp": 1578090901599,
  "records": [
    { "data": "H4sICHogxGQAA3JlY29yZC5qc29uALWQS0vEMBSF9/6KkHULSaavuCtYBxeuOjsZym2TlML0YZIqMvjfTdpaFBFXEhLO5eOec3OvCGv5PEtjHwS+RViKCBoBcUgOkQq5ilhY10BDxThISEQqMoUDhG3Xux7oJ9dE4zQj3B0acx54w2bUwjjyhK4IC7DgtFOuxUArT2+T9Fl3+SmvHouyzI+F9xxfB6k9oOwQxUmaOUfmwWVsj3qcfZbXVeuLaoBebrS0WkL/ic1S7dzMtWl0N9luHO67i5V6He0bqNRC1iZ0Xm2LFznY/R/dsiDydbg/1c9VUUIJZyQjNAn2jWyTo61EFKP34D8y098zmctEZ3eX9+YDF8L9xhkCAAA=" }
  ]
}
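Note that the body above contains only `requestId`, `timestamp`, and `records`; there is no access key field in it. To inspect what a record actually carries, a small sketch like the following can help. It assumes each `data` value is base64-encoded gzip, which the `H4sI` prefix (the base64 form of the gzip magic bytes) suggests; the function names are hypothetical.

```python
import base64
import gzip


def decode_record(data_b64):
    """Base64-decode and gunzip one record's `data` field.

    Assumes the payload is gzip-compressed UTF-8 text.
    """
    raw = base64.b64decode(data_b64)
    return gzip.decompress(raw).decode("utf-8")


def decode_firehose_body(body):
    """Return the decoded payload of every record in a Firehose request body."""
    return [decode_record(record["data"]) for record in body.get("records", [])]
```

For example, `decode_firehose_body(json.load(open("/tmp/data.json")))` would return the decoded payload strings for the file above.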

Additional Context

No response

jszwedko commented 1 year ago

@tim-klarna just curious if I'm missing something obvious about how this is supposed to work 😄

objectbased commented 4 days ago

@jszwedko has there been any traction on this? I'm running into the same issue now where I want to collect from multiple firehose streams on a single port and use a transform with the aws_kinesis_firehose_access_key in the payload to parse properly, but am getting a null in the access key variable.

jszwedko commented 4 days ago

Apologies, I haven't been able to investigate this any further than the above. I'm still wondering if @tim-klarna can point out anything I'm missing.