magnusbaeck / logstash-filter-verifier

Apache License 2.0

Rolling year in dates #76

Closed srilumpa closed 4 years ago

srilumpa commented 4 years ago

Hi,

We have a parser that handles logs starting with a timestamp in one of, mainly, two formats: syslog (MMM d HH:mm:ss) and ISO8601 (yyyy-MM-dd'T'HH:mm:ss.SSSZ). We parse those dates rather than using the timestamp set by Logstash upon log reception, since there can be minutes or even hours between log generation and reception on our side. To ensure everything works correctly, we set up test cases like the following in our CI:

{
  [snip]
  "input": [
    "2019-12-12T13:58:25.123+0000 Some fancy log",
    "Dec 12 13:59:34.321+0000 Some other fancy log"
  ],
  "expected": [
    {
      "@timestamp": "2019-12-12T13:58:25.123Z",
      "message": "Some fancy log"
    },
    {
      "@timestamp": "2019-12-12T13:59:34.321Z",
      "message": "Some other fancy log"
    }
  ]
  [snip]
}

This works great, but we had "unexpected" failures in our CI in mid-January when we updated part of the parser: the second log failed because the parsed date "suddenly" became 2020-12-12... instead of 2019-12-12.... That is not wrong, since a date of this kind carries no year and the system interprets it as being in the current year.

We tackled this by replacing every implicit year in our CI test cases with the keyword <current_year>, and we run sed "s/\"@timestamp\":\"<current_year>/\"@timestamp\":\"$(date +'%Y')/g" before each check so that the expected timestamp carries the current year.
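
For readers who would rather not shell out to sed, the same placeholder substitution can be sketched in Python. This is only an illustration of the workaround described above, not something logstash-filter-verifier provides: the function name `fill_current_year` and the inline test case are made up for the example, and only the `<current_year>` keyword comes from this comment.

```python
import json
from datetime import datetime, timezone

PLACEHOLDER = "<current_year>"  # keyword taken from the workaround above


def fill_current_year(testcase_text, now=None):
    """Return the test case text with the placeholder replaced by the year.

    `now` is injectable so the substitution can be checked deterministically;
    by default the current UTC time is used.
    """
    now = now or datetime.now(timezone.utc)
    return testcase_text.replace(PLACEHOLDER, f"{now.year:04d}")


# Hypothetical test case fragment with an implicit-year timestamp.
raw = (
    '{"expected": [{"@timestamp": "<current_year>-12-12T13:59:34.321Z", '
    '"message": "Some other fancy log"}]}'
)

# Pretend the CI job runs in mid-January 2020.
fixed_now = datetime(2020, 1, 15, tzinfo=timezone.utc)
filled = json.loads(fill_current_year(raw, fixed_now))
print(filled["expected"][0]["@timestamp"])  # 2020-12-12T13:59:34.321Z
```

Unlike the sed expression, this replaces the placeholder wherever it occurs in the file, so it does not depend on the exact spacing after "@timestamp".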

So we bypassed the issue and, yes, I know, this happens only once a year. But would it be possible to have that kind of behavior built directly into logstash-filter-verifier?

magnusbaeck commented 4 years ago

This is one reason I hate the classic syslog format. Timestamps without year and timezone are awful. Instead of adding a special case to cater to this exact problem, perhaps the solution I suggested in #75 would do? Then you could express that @timestamp must match the regexp "^202\d-12-12T13:59:34\.321Z$" and assume that the final digit in the year won't get screwed up.
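
To illustrate what such a regexp check would accept, here is a small Python sketch. It only demonstrates the pattern quoted above against sample events. It is not the syntax or API proposed in #75, and the helper name `timestamp_matches` is hypothetical.

```python
import re

# Pattern quoted in the comment above: any year from 2020 to 2029.
PATTERN = re.compile(r"^202\d-12-12T13:59:34\.321Z$")


def timestamp_matches(event):
    """Return True if the event's @timestamp field matches the pattern."""
    return PATTERN.search(event.get("@timestamp", "")) is not None


for year in (2019, 2020, 2021, 2030):
    event = {"@timestamp": f"{year}-12-12T13:59:34.321Z"}
    print(year, timestamp_matches(event))
```

Only the 2020 and 2021 events match. The trade-off is that the check no longer pins the exact year, so an off-by-one-year bug within the same decade would go unnoticed.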

srilumpa commented 4 years ago

I hate it too but, unfortunately, we are forced to cope with it :(.

Yes, using regular expressions to check the value of a field would also be a great (broader) solution! I'm closing this issue in favor of #75.