vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

remap .timezone config is templatable #15925

Open jdef opened 1 year ago

jdef commented 1 year ago

Use Cases

parse_timestamp can interpret timestamps that lack an explicit zone by using the timezone field of the remap transform, falling back to the global timezone option if that field is unset. The log files I need to parse mix several timestamp formats within the same file, and some of them omit the timezone. I'd therefore like to derive the value of the transform's timezone option from (a) a non-standard environment variable, (b) the log file path, and (c) the log event itself.

It's very difficult to change the application in question, and it's even harder to change the environment variables used to spawn SRE-managed processes such as Vector. Vector already supports {{ ... }} templating for other configuration fields, but not for timezone. If I could change the environment in which Vector is spawned, I'd simply use ${MYENVVAR} interpolation, but sadly that is not an option.
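For reference, the static setup described above might look like the following sketch. The source name logfile, the .time field, and the timestamp format are assumptions for illustration; the point is that the remap-level timezone must currently be a literal string.

```toml
# Global fallback, used when a transform does not set its own timezone.
timezone = "UTC"

[transforms.parse_local]
type = "remap"
inputs = ["logfile"]
# Applies only to timestamps parsed without an explicit zone.
# This value must be a literal today; {{ ... }} templating is rejected here.
timezone = "America/Chicago"
source = '''
# Assumes .time holds something like "2023-01-12 09:38:00" (no offset).
.timestamp = parse_timestamp!(string!(.time), format: "%Y-%m-%d %H:%M:%S")
'''
```

With this config, a zoneless value such as 2023-01-12 09:38:00 would be read as Chicago local time for every event passing through the transform, regardless of which file it came from.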

Attempted Solutions

I've tested with 0.25.1 and the nightly build from 2023-01-12; both reject {{...}} templating in the timezone field of the remap transform.

Proposal

Proposal A (most flexible):

[transforms.with_tz]
type = "remap"
inputs = ["logfile"]
source = '''
.tz = ... # parse this from the log line, log file name, or munge some envvar into a form that makes sense for vector
'''

[transforms.enriched]
type = "remap"
inputs = ["with_tz"]
timezone = "{{.tz}}"
source = '''
.timestamp = parse_timestamp!(...) # uses the default timezone (for times without a zone) previously derived in the "with_tz" transform
'''
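As a rough sketch of the first stage of Proposal A, the following derives a numeric offset from the three sources listed under Use Cases, using existing VRL functions (get_env_var, parse_regex, is_string). The variable APP_TZ_OFFSET, the event field .tz_offset, and the _+hhmm.log file-naming convention are all hypothetical. Later sources override earlier ones, so the most specific value wins. On its own this only computes .tz; the "enriched" stage would still need the proposed templating (or a workaround) to make use of it.

```toml
[transforms.with_tz]
type = "remap"
inputs = ["logfile"]
source = '''
# Fallback offset when nothing else provides one.
tz = "+0000"

# (a) Hypothetical environment variable. get_env_var fails when the
#     variable is unset, in which case ?? keeps the fallback.
tz = get_env_var("APP_TZ_OFFSET") ?? tz

# (b) Hypothetical naming convention such as /var/log/app_-0600.log.
#     The file source stores the path in the "file" field.
file = string(.file) ?? ""
parsed, err = parse_regex(file, r'_(?P<offset>[+-]\d{4})\.log$')
if err == null {
  tz = string!(parsed.offset)
}

# (c) Hypothetical field already present on the event.
if is_string(.tz_offset) {
  tz = string!(.tz_offset)
}

.tz = tz
'''
```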

Proposal B (an alternative that would still meet my use case): support Bash-style manipulation of environment variable values, e.g. ${ENVVAR##foo}, ${ENVVAR%%bar}, or ${ENVVAR/foo/bar}, so that simple prefix removal, suffix removal, and replacement could turn environment values into something usable for config options.
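Until something like Proposal B exists, similar manipulation can be done inside a VRL program, although the result is only visible to VRL and not to config fields such as timezone. A minimal sketch, assuming a hypothetical ENVVAR and literal (non-glob) patterns; Bash's ## and %% also accept glob patterns, which this does not replicate.

```toml
[transforms.munge_env]
type = "remap"
inputs = ["logfile"]
source = '''
# Hypothetical variable; use "" if it is unset.
raw = get_env_var("ENVVAR") ?? ""

# Like ${ENVVAR##foo} for a literal "foo": drop it if it is a prefix.
.without_prefix = replace(raw, r'^foo', "")

# Like ${ENVVAR%%bar} for a literal "bar": drop it if it is a suffix.
.without_suffix = replace(raw, r'bar$', "")

# Like ${ENVVAR/foo/bar}: replace only the first "foo" with "bar".
.replaced = replace(raw, "foo", "bar", count: 1)
'''
```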

Version

0.27.0

jszwedko commented 1 year ago

Thanks for this request, @jdef! As an alternative, would it work for you if we added a default_timezone parameter to parse_timestamp, to be used when the timestamp lacks a zone?

jdef commented 1 year ago

Yes, that would work as well.

-- James DeFelice

hhromic commented 1 year ago

This is a nice feature request! We have also run into this limitation with Vector and worked around it like this:

.eventTime = to_unix_timestamp(parse_timestamp!(string!(.time) + " +0000", format:"%Y-%m-%d %H:%M:%S %z"), unit: "milliseconds")

As you can see, we simply hard-code +0000, the UTC offset. That offset could be computed in VRL instead, but the result quickly becomes ugly and hacky.

A solution like a default_timezone parameter would definitely be very elegant and handy!
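A sketch of computing the offset instead of hard-coding it, assuming a hypothetical LOG_TZ_OFFSET variable that holds a numeric offset such as -0600, and assuming .time is either "YYYY-mm-dd HH:MM:SS" or the same followed by a space and an offset. Because %z expects a numeric offset, this workaround cannot take a named zone like America/Chicago, which is part of why a default_timezone parameter would be nicer.

```toml
[transforms.event_time]
type = "remap"
inputs = ["logfile"]
source = '''
# Hypothetical variable holding a numeric offset; default to UTC.
offset = get_env_var("LOG_TZ_OFFSET") ?? "+0000"
ts = string!(.time)

# Append the offset only when the timestamp does not already end in one.
if !match(ts, r'[+-]\d{4}$') {
  ts = ts + " " + offset
}

.eventTime = to_unix_timestamp(parse_timestamp!(ts, format: "%Y-%m-%d %H:%M:%S %z"), unit: "milliseconds")
'''
```

Timestamps that already carry an offset keep it, so files mixing zoned and zoneless formats can go through the same transform.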

jdef commented 1 year ago

To be clear, for our use case an optional default_timezone parameter would need to accept a named zone such as America/Chicago, though I can see how accepting a numeric offset matching [+-]\d{4} could also be useful for others.

hhromic commented 1 year ago

A named timezone would also work fine for us; we just need to ensure that datetimes are UTC-based. We used +0000 only to satisfy the %z format specifier.