vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.46k stars 1.53k forks source link

Vector should reread AWS_SHARED_CREDENTIALS_FILE on SIGHUP #7013

Open mcgfenwick opened 3 years ago

mcgfenwick commented 3 years ago

Vector Version

vector 0.12.1 (v0.12.1 x86_64-unknown-linux-gnu 2021-03-12)

Vector Configuration File

[sinks.aws_bar_aws_s3]
  type = "aws_s3"
  bucket = "our_bucket"
  compression = "gzip"
  healthcheck = true
  region = "us-west-2"
  batch.max_size = 1049000000
  batch.timeout_secs = 60
  buffer.type = "memory"
  encoding.codec = "ndjson"
  filename_append_uuid = false
  filename_time_format =""
  key_prefix = "bar_foo/%Y/%m/%d/%H/%M/${HOSTNAME}"
  filename_extension = "0.json.gz"
  inputs = ["bar_aws_s3_source"]

Environment="AWS_SHARED_CREDENTIALS_FILE=/etc/vector/creds2"

# cat /etc/vector/creds2
[default]
AWS_SECRET_ACCESS_KEY=ZZZZZZZZZZZZZ
AWS_SESSION_TOKEN=REALLYLONGSESSIONKEYSTRING
AWS_ACCESS_KEY_ID=XXXXXXXXXXXXX
AWS_CREDENTIAL_EXPIRATION=2021-04-04T03:19:21.000Z

Debug Output

Expected Behavior

Actual Behavior

Example Data

Additional Context

References

mcgfenwick commented 3 years ago

Is this issue resolved in: vector 0.13.0 (v0.13.0 x86_64-unknown-linux-gnu 2021-04-21) I tried this version but I still have the problem.

jszwedko commented 3 years ago

Hi @mcgfenwick ! This issue is still outstanding, but we hope to get to it soon.

mcgfenwick commented 3 years ago

Oh great, I saw version 0.13.0 was released and jumped to a conclusion. BTW I'm happy to test a nightly too. Thanks.

ktff commented 3 years ago

@mcgfenwick was the Vector configuration also changed or only AWS_SHARED_CREDENTIALS_FILE and/or the contents of the file it's pointing to?

mcgfenwick commented 3 years ago

Sorry I completely missed this. Just the contents of the file pointed to by AWS_SHARED_CREDENTIALS_FILE is changing.

ktff commented 3 years ago

Then this is a feature. The reload is lazy in a sense that it won't shutdown and rebuild a component whose configuration in config files hasn't changed. It doesn't look for changes in environment variables nor files pointed by either environment variables or config files.

That said, this feature would be useful so I'll investigate what can be done.

mcgfenwick commented 3 years ago

I'm using this file to store AWS session keys, which expire every 8 hours.

ktff commented 3 years ago

Hmm, then is AWS_CREDENTIAL_EXPIRATION configured with that 8h? and when does the file get changed, before or after it expired?

mcgfenwick commented 3 years ago

This is what I have:

[default] AWS_SECRET_ACCESS_KEY=xxxx AWS_SESSION_TOKEN=yyyyyy AWS_ACCESS_KEY_ID=zzzzz AWS_CREDENTIAL_EXPIRATION=2021-05-31T01:51:27.000Z

This gets updated about 30 minutes before these creds expire.

ktff commented 3 years ago

Then SIGHUP/reload is not necessary. aws_s3 sink will, when it tries to use them, reload credentials if they have expired. That includes those in the file. We should mention this on the website but I'm not seeing it.

@mcgfenwick so you shouldn't need to do anything, or is there some other bug like it's using those expired credentials or something else?

jszwedko commented 3 years ago

Assuming they are automatically reloaded when expired, I think we can still leave this issue open, but de-prioritize it. As a user, I'd probably still expect Vector to reload that file on SIGHUP.

ktff commented 3 years ago

As a user, I'd probably still expect Vector to reload that file on SIGHUP.

True.

Then since we already have the concept and structs for expiration we can add a new one for reload. Such struct/wrapper would on each access check if it needs to reload underlying asset.

Now there are various ways of passing reload signal to such structs. The simplest and least intrusive is to have a global atomic counter whose value represents current reload generation starting from 0 and incrementing by one for each reload. Then the struct can have a simple load-eq check to determine should it reload. It can't get performance cheaper than this. And since this field represents the whole program for it's whole duration it shouldn't be problem of it being global so to avoid having to weave it into the function calls.

An alternative to this global field is passing individual fields/channels to components to have a finer control over reload process if we need to have it.

Also, to avoid increasing the complexity of whole reload process, the reload of underlying asset has best effort guarantee in a sense that a failure to reload it won't stop whole reload process and will instead use the old value and emit a warning.

de-prioritize it.

The above proposal should be straightforward and would open a way for partial reload of other things so I can at least push a draft for it.

jszwedko commented 3 years ago

👍 if you think you have a simple implementation path I'm for it. It would set the precedent for reloading other files on SIGHUP.