vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0
115 stars 52 forks source link

parse_syslog function ignores timezone when rfc3164 #853

Open timansky opened 1 month ago

timansky commented 1 month ago

A note for the community

Problem

Most devices are sending logs in rfc3164 format, which does not contain timezone. When syslog message in rfc3164 format, parse_syslog function always using UTC timezone when parsing timestamp

Configuration

transform:
  adapt_syslog:
    type: remap
    timezone: "Asia/Almaty" # Apply this timezone when parsing timestamp
    inputs:
    - syslog
    source:  |
      d = parse_timestamp!("May 19 2024 12:00:00", format: "%h %d %Y %T")
      log("debug1:" + format_timestamp!(d, format: "%+"))

      d = parse_syslog!(s'<180>May 19 2024 12:00:00 hostname message')
      log("debug2:" + format_timestamp!(d.timestamp, format: "%+"))

Version

0.38.0

Debug Output

INFO transform{component_kind="transform" component_id=adapt_syslog component_type=remap}: vrl::stdlib::log: Internal log [debug1:2024-05-19T07:00:00+00:00] is being suppressed to avoid flooding.

INFO transform{component_kind="transform" component_id=adapt_syslog component_type=remap}: vrl::stdlib::log: Internal log [debug2:2024-05-19T12:00:00+00:00] is being suppressed to avoid flooding.

Example Data

<180>May 19 2024 12:12:12 hostname message ### Additional Context _No response_ ### References _No response_
jszwedko commented 1 month ago

Thanks for this @timansky .

Internally, Vector stores all timestamps as UTC. The timestamp option just configures how the timestamp is interpreted if it is missing a timezone. See that the output:

INFO transform{component_kind="transform" component_id=adapt_syslog component_type=remap}: vrl::stdlib::log: Internal log [debug1:2024-05-19T07:00:00+00:00] is being suppressed to avoid flooding.

Is showing the correct timestamp in UTC for the given timestamp in Asia/Almaty which is +05:00.

https://github.com/vectordotdev/vector/issues/3333 is a ticket that is tracking actually storing the timezone of parsed timestamps rather than always normalizing to UTC.

I'll close this out since there doesn't seem to be a bug here, just a lack of https://github.com/vectordotdev/vector/issues/3333 being implemented, but let me know if I'm missing something!

timansky commented 1 month ago

I know that vector storing timestamp in UTC. But the behavior is different from what is described in the documentation. For example, syslog source parses timestamps with appling timezone if timezone missing in event and then converts to UTC. Also parse_timestamp applies but parse_syslog does not.

All this leads to the fact that we receive different data for the same event.

To reproduce it is just need to have timezone set and also 2 sources: 1 syslog 2 socket, then transform with parse_syslog

Example message: <180>May 19 2024 12:12:12 hostname message

Timestamp will be different

jszwedko commented 1 month ago

. Also parse_timestamp applies but parse_syslog does not.

Ah I see now. Let me reopen this and move it to the VRL repository.