logstash-plugins / logstash-filter-date


Date filter fails to parse timestamps #95

magnusbaeck opened this issue 7 years ago

magnusbaeck commented 7 years ago

When using the jdbc input to fetch events from a database, timestamp fields end up as timestamp fields in Logstash. One could argue that this is a feature, but it causes confusion since those fields apparently can't be processed by the date filter. Could we either call to_s on the source value or check if the source already is a timestamp and, if so, just copy it to the destination field?

See https://discuss.elastic.co/t/trouble-matching-timestamp/83768 for an example.
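
A minimal configuration that reproduces the problem might look like this (a sketch; the connection details and the created_at column are placeholders, not from the original report):

input {
  jdbc {
    # placeholder connection details
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_user => "logstash"
    statement => "SELECT created_at, message FROM logs"
  }
}

filter {
  # created_at arrives as a Logstash timestamp rather than a string,
  # so this match never succeeds and the event is tagged with
  # _dateparsefailure
  date {
    match => ["created_at", "ISO8601"]
  }
}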

jordansissel commented 7 years ago

Hmm.. If it's already a timestamp type then maybe we should add a way to copy field values around? I'm not sure about having the date filter do this because it would focus on copying only fields with a timestamp type.

Alternately, maybe have the jdbc input allow users to specify what column should provide the @timestamp value?

I agree this is funky behavior and that we should make it possible (and easy) to do what you are describing.

Thoughts?
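
For illustration only, (a) might look like the mutate filter's copy option (present in newer mutate releases), assuming it preserves the timestamp type, and (b) like a hypothetical timestamp_column setting on the jdbc input; created_at is an assumed column name and timestamp_column is not a shipped feature:

filter {
  # (a) copy an existing timestamp field straight into @timestamp,
  # assuming copy preserves the timestamp type
  mutate {
    copy => { "created_at" => "@timestamp" }
  }
}

input {
  jdbc {
    # connection settings as usual
    statement => "SELECT created_at, message FROM logs"
    # (b) hypothetical option; it does not exist in the plugin today
    timestamp_column => "created_at"
  }
}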

magnusbaeck commented 7 years ago

Hmm.. If it's already a timestamp type then maybe we should add a way to copy field values around?

Or not have timestamp types at all? I like to think of Logstash as a JSON processor, and a timestamp type just doesn't fit in. Internally it's of course fine if the values are stored as timestamps, but when that is exposed to users it easily leads to confusion.

magnusbaeck commented 7 years ago

Another user having a bad time: https://discuss.elastic.co/t/multiple-problems-with-logstash/90554

magnusbaeck commented 7 years ago

And again: https://discuss.elastic.co/t/replace-timestamp-by-a-other-field/92488

magnusbaeck commented 7 years ago

https://discuss.elastic.co/t/logstash-date-parse-failure-for-jdbc-input/92713

jordansissel commented 7 years ago

I'm thinking about how we can make this not a problem for users.

My thought is that you shouldn't even want to use the Date filter, you should just be able to have the JDBC plugin store the timestamp in @timestamp -- thoughts?

magnusbaeck commented 7 years ago

My thought is that you shouldn't even want to use the Date filter, you should just be able to have the JDBC plugin store the timestamp in @timestamp -- thoughts?

How would the plugin know which column to store in @timestamp? Should it require the query to return a result set with a column named @timestamp? Such a column name wouldn't be acceptable to PostgreSQL and probably other RDBMSes.

What if the result set contains multiple columns that should be stored as timestamps in ES?

dmarkhas commented 6 years ago

What's wrong with the original suggestion? "check if the source already is a timestamp and, if so, just copy it to the destination field?"

I just hit this myself, and it took me a while to figure out what was going on.

merrillbeth commented 6 years ago

I've hit this and am not able to get my log timestamps to replace the timestamp attribute. https://discuss.elastic.co/t/date-filter/118128/6

guyboertje commented 6 years ago

@magnusbaeck @jordansissel

The JDBC input will convert known Time-ish JDBC datatypes (Timestamp, Date) to Ruby Time objects. When these are added to the Event, they are converted to internal Event::Timestamp instances. One of these may need to become the Event @timestamp.

There is one additional case to be considered: when the DB records times not in UTC. In this case one can set the jdbc_default_timezone to the TZ of the DB, and Sequel then converts any of these known datatypes to UTC as it processes the recordset. Now if the tracking column is a timestamp, Sequel must convert it back to the DB TZ before it can embed the literal in the statement. There is a bug, which I'm fixing right now in the jdbc input, that causes sub-seconds to be dropped in this parameter conversion.

A workaround for the bug is to not set the jdbc_default_timezone and have Sequel work as if these columns are UTC, meaning that the tracking_column parameter is used without TZ conversion. However, this means that any or all Time-ish fields must be shifted from DB TZ to UTC in another filter. We do not have a filter that converts an Event::Timestamp to another TZ, so users have to resort to mutate/convert(string) before using the date filter.

JDBC input users often have to CAST their "timestamp" column to a string so they can use the date filter to set the @timestamp field.
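
For example (a sketch; the table and column names are assumptions), that CAST approach looks like:

input {
  jdbc {
    # connection settings as usual; the CAST makes the column
    # arrive as a string instead of a timestamp
    statement => "SELECT CAST(created_at AS varchar) AS created_at, message FROM logs"
  }
}

filter {
  # adjust the pattern to whatever string form your database's
  # CAST produces
  date {
    match => ["created_at", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601"]
  }
}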

OTOH, we could extend the jdbc input to let the user specify which column goes into the @timestamp field, and with the bug fixed, people should have no reason not to use the jdbc_default_timezone setting.

However, I'm also open to extending the date filter to offer TZ conversion and @timestamp overwrite from existing Event::Timestamp fields.

LucidObscurity commented 6 years ago

In case anyone is struggling with this, the solution is below. The suggested workarounds on discuss.elastic.co don't quite cut it.

filter {
    mutate {
        convert => {"log_date" => "string"}
    }
    date {
        match => ["log_date", "ISO8601"]
    }
}

This converts the date from SQL to a string, then parses it and assigns it to @timestamp. log_date is the column name in your SQL table.
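
If the result set contains more than one timestamp column (a question raised earlier in this thread), the same pattern extends with the date filter's target option so that only one parse overwrites @timestamp; updated_at here is an assumed second column:

filter {
  mutate {
    convert => {
      "log_date" => "string"
      "updated_at" => "string"
    }
  }
  # the first date block writes to @timestamp (the default target)
  date {
    match => ["log_date", "ISO8601"]
  }
  # the second parses into a separate field instead of @timestamp
  date {
    match => ["updated_at", "ISO8601"]
    target => "updated_at"
  }
}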

ip-aa commented 5 years ago

Thank you @LucidObscurity, your workaround worked perfectly. I know this is an old thread, but I just wasted a bunch of time on this; could someone update the plugin documentation to include a note about it?

Constantin07 commented 3 years ago

The above workaround unfortunately doesn't work with JSON log files:

filter {
  mutate {
    convert => { "[json][timestamp]" => "string" }
  }

  date {
    match => [ "[json][timestamp]", "ISO8601" ]
  }
}

Any idea how to do it?

magnusbaeck commented 3 years ago

@Constantin07 This issue only deals with events coming from the jdbc input (or similar inputs that produce timestamp values). Parsing timestamps from JSON files is an entirely different problem. Please ask for help with that at discuss.elastic.co.