webmat opened 6 years ago
Just for reference: this issue happens because `sprintf` time interpolation behaves differently from normal interpolation:
```
2.3.0 :006 > e = LogStash::Event.new; e.sprintf("%{[hello]}")
 => "%{[hello]}"
2.3.0 :007 > e = LogStash::Event.new; e.remove("@timestamp"); e.sprintf("%{+YYYY}")
LogStash::Error: timestamp field is missing
	from org/logstash/ext/JrubyEventExtLibrary.java:168:in `sprintf'
	from (irb):7:in `<eval>'
```
In case anyone is watching this issue, I just posted a massive update to the description of the issue. The focus is now solely on `@timestamp`, to keep the discussion focused on the problem at hand. Mentions of `@version` have been removed.
Also, I brought back my original approach to the matter on the table (trying to accommodate the missing `@timestamp`). After initial feedback, I thought this approach was actually not going to fly with anyone. But after the short discussion in the ingest meeting, I realized that I may not be alone in thinking it's a valid approach after all :-)
In my mind, approaches 1 and 3 are tied as my favourites. I'd start with 1 because it's less work (it's essentially the current state + possible improvements), then if we want, we can continue working towards option 3.
My 3rd favourite approach is #2, but I can live with it. However, if we want to go the opinionated route, we'll have to be clear with our users that use cases removing `@timestamp` will no longer be accepted, and that this behaviour will get more severe in upcoming versions.
Thanks for the excellent writeup @webmat.
Whatever solution we come up with here, we should consider consistency with `sprintf` generically, as @jsvd mentioned.
My suspicion is that there is no one right answer here. What are your thoughts WRT creating two separate `sprintf` impls for plugins to use, `sprintf_strict` and `sprintf_lax`, where `sprintf_strict` raises exceptions on missing fields (at which point the event could be DLQed or whatever the author decides), while `sprintf_lax` always succeeds, but may leave `%{...}` in the output? This would mean that there would be consistency based on the function call.

You'd use `sprintf_strict` for things like the ES index name interpolation, and `sprintf_lax` for everything else. We'd obsolete plain `sprintf` in a major.
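For illustration, a minimal sketch of what the two variants could look like, assuming the current `event.sprintf` behaviour shown at the top of this issue. Only the names come from the proposal above; the helper bodies are invented:

```ruby
# Illustrative only; neither helper exists in Logstash today.
def sprintf_strict(event, template)
  result = event.sprintf(template) # already raises for %{+...} when @timestamp is missing
  # Also treat an unresolved %{...} pattern as a missing field:
  raise LogStash::Error, "unresolved reference in #{template}" if result.include?("%{")
  result
end

def sprintf_lax(event, template)
  event.sprintf(template) # may leave %{...} in the result for missing fields
rescue LogStash::Error
  template # e.g. missing @timestamp: fall back to the template unchanged
end
```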
`@timestamp` is a normal field in all respects (it can be removed), with the exception of having a specific construction & coercion behaviour: upon the creation of a new `Event`, a `@timestamp` is always created, so you can rely on the fact that all new `Event`s always have a `@timestamp`. Some features and plugins expect `@timestamp` to be present and consider `@timestamp` a special field. This is fine because `@timestamp` is usually central to most configurations. At the same time, users should be able to shape the `Event` fields the way they want without exception, while we keep `@timestamp` working as it is now. Users shape the `Event` "as a whole" typically at the outputs, and the intent of removing the `@timestamp` field is that it does not show up when either serializing (`Event.toJson`) or retrieving the event fields/values collection (`Event.toMap`, `Event.getData`), as intended when removing a field.

I suggest we introduce a behavioural exception for the `@timestamp` field, since it IS a special field in a way, for its behaviour at `Event` construction and also for its coercion behaviour. In fact, internally the `@timestamp` value has its own class implementation.
Removing `@timestamp` would work as intended, BUT querying the field value from `sprintf`/interpolation and from a field reference would always return its value, regardless of whether it was removed or not. This would allow users to shape the `Event` the way they want, but would also never break features or plugins that rely on `@timestamp` being present.
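A minimal sketch of the semantics being proposed here, written against the existing `Event` API (this is NOT how `Event` behaves today):

```ruby
# Proposed semantics (illustrative):
event = LogStash::Event.new
event.remove("@timestamp")

event.to_json             # would no longer contain "@timestamp"
event.get("@timestamp")   # would still return the original Timestamp value
event.sprintf("%{+YYYY}") # would interpolate instead of raising
```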
WDYT?
@colinsurprenant I agree that `@timestamp` should be removable.
I'm worried about that approach due to the amount of magic there; it seems like it violates the principle of least surprise if you can remove something but it still exists in some contexts.
What did you think of my proposal to let plugin authors decide what level of strictness is needed for interpolation, on a case-by-case basis?
@andrewvc it seems to add more complexity to me. OTOH, what I propose is basically surfacing a fact that is true today, which is that the timestamp has some special behaviour: we even have an `Event.getTimestamp` method. I think this can be documented, and as for the principle of least surprise, I think we could argue that my proposal does satisfy it, since the intention is to not have the field appear in its serialized form, which is the majority of cases where a field is removed, and no change would be required in plugins.
It seems to me that conceptually it will be easy to explain to users that there is one special field in the `Event`, that it has to do with timestamp handling, and that the reason is that some features and plugins expect a timestamp to be present; regardless, this field does not need to be included in the event's serialized form, by removing it.
@colinsurprenant That's a good point. I see that it does solve the problems you mention. It does have the cost of being yet another 'special' thing to document.
Here's a thought: what if we deprecate `@timestamp` and say that the canonical place for it is `[@metadata][timestamp]`, which doesn't serialize by default? `@metadata` would then be the single place where you put data that doesn't serialize. Thoughts?
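Under that convention, the effect could be sketched with the existing `Event` API (illustrative; it relies only on the fact, stated above, that `@metadata` never serializes):

```ruby
# Move the timestamp into the non-serialized @metadata namespace:
event.set("[@metadata][timestamp]", event.get("@timestamp"))
event.remove("@timestamp")

event.to_json                        # no timestamp in the serialized output
event.get("[@metadata][timestamp]")  # still readable, e.g. for index names
```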
We'd also adjust sprintf for timestamps to treat that like an ordinary field.
To clarify what I mean by 'like an ordinary field', I mean that we'd follow the same interpolation rules. I do think we need a strict interpolation variant regardless of the timestamp decision to let plugin authors decide if interpolation issues are fatal.
An alternative approach would be a rule that any field starting with `@` is not serialized. That would be another way to have simple event behavior without multiple rules.
Having the timestamp in `[@metadata]` is also worth considering, and is basically what I verbally suggested in the last sync. The disadvantage is that it breaks the current behaviour and will require adjustments in various places, which is why, after giving it some thought, I did not suggest it here.
The problem we face is that over time `@timestamp` has become a special-case field, and rightfully so, because we enforce the fact that an `Event` is always initialized with a valid timestamp, and timestamp handling has always been considered "a special case in itself" to deal with. Many plugins reference `@timestamp` in various forms (as a hardcoded reference, as a default configuration option, etc.), and changing the interface to the `Event` timestamp will certainly require lots of code changes and more explanation to users.
This is why I still think that we should not change the interface for getting the `Event` timestamp through the `[@timestamp]` field ref: it should simply always return a value, while providing the ability to exclude it from the serialized form by using `remove_field`. Again, the only exception this introduces is that after doing a `remove_field` on `[@timestamp]` you'd still get a valid value, because we'd now consider `[@timestamp]` the always-working interface to the `Event` timestamp; introducing this single exception would probably not break anything or require changes. For the vast majority of cases this would be seamless for users, because it would satisfy the intent. It may surprise the few users who actually test that `[@timestamp]` is correctly removed after a `remove_field` on it, but frankly I think this will be exceptional, and they'd only need to rework that test to verify the serialized output rather than the field ref for `[@timestamp]`.
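For those few users, the test rework would be small. An RSpec-style sketch of what it could look like (the assertions are illustrative, not taken from any real test suite):

```ruby
event.remove("@timestamp")

# Instead of asserting on the field reference, which would still return a
# value under this proposal...
# expect(event.get("@timestamp")).to be_nil
# ...assert on the serialized form:
expect(JSON.parse(event.to_json)).not_to have_key("@timestamp")
```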
This comment is not a response to the recent comments (that's coming after).
If anyone strongly in favour of Option 2 wants to flesh out this section more, in the issue body, you're welcome to do so. It's never how I used Logstash personally. So while I understand the point of view, I'm probably not making all of the arguments that can be made.
I feel like the current discussion always goes really deep right away.
But I'd like to shed light on one assumption that seems to underlie all of this: that `@timestamp` is special, and that "things will break" if it's not there (these quotes represent a cloud of mystery :-) ).
I'd like to flesh out clearly what it is that breaks. I can start with the obvious one, which triggered all of this.
Breaks:

1. Interpolating `@timestamp` into a timestamped index name will fail.
   1.1. This applies to `output-es`, and possibly other plugins that need `@timestamp` to figure out what to do with the event.

If we fix this (assuming we try the lenient route), what else could this break? Is this it, or are there other things I'm not seeing?
@andrewvc If we were to introduce new `sprintf` implementations, perhaps we could consider only introducing a strict one.

Right now the behaviour is:

- lax for `%{...}` interpolation, aka field interpolation
- strict for `%{+...}` interpolation, aka time format interpolation

The new behaviour, if we take that route, could be:

- `sprintf` becomes lax for both types of interpolation (least breaking change)
- `sprintf_strict` is strict, and can be used for field or time format interpolation as well. Plugin authors may or may not decide to start using it.

----------
Reset.
Now that I phrase it this way, I wonder if multiple implementations of `sprintf` are a solution to this. Part of this discussion is "should these situations crash Logstash or not?" Presumably, Logstash users are the ones who should decide this, rather than us.

Introducing multiple versions of `sprintf` would mean that potentially many plugins need to be updated the right way to actually give Logstash users this choice. A plugin needs to 1) start using both versions of `sprintf`, and 2) give users an option to decide severity.

Perhaps we should instead consider adding a Logstash-level option to decide the behaviour on interpolation failures?
Options:

- `crash`
- `none` (e.g. a missing field `foo` would leave `%{foo}` in the output)

The behaviour we have in current versions of Logstash (at least wrt `output-es`) is:

```yaml
# logstash.yml
interpolation.field: none
interpolation.time: crash
```
Given the different roles of different plugins, `interpolation_field` and `interpolation_time` could also become Common Options, where users can selectively override the behaviour. This way, someone building a really strict pipeline can still get leniency around some parts of their pipeline.
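A sketch of how interpolation code could consult such a setting. The setting names come from the hypothetical `logstash.yml` above; the helper and its dispatch are invented for illustration:

```ruby
# Illustrative only; neither the settings nor this helper exist today.
def interpolate_time(event, template, mode)
  event.sprintf(template)
rescue LogStash::Error => e
  case mode             # value of the hypothetical "interpolation.time" option
  when "crash" then raise e   # today's behaviour for %{+...}
  when "none"  then template  # leave the pattern in the output, keep running
  end
end
```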
I haven't fully thought through the implications here. But I think this more directly addresses "should we crash" or not. With this approach, the answer becomes the beloved "It Depends ™" (but you decide).
We have definitely broadened the discussion away from only the `output-es` problem, but that's a good thing. Maybe we should move the discussion into a Logstash issue.
Generally speaking, when I referenced things that will break in the context of trying to fix the larger issue by getting rid of the `@timestamp` field, I meant that:

- features that rely on `@timestamp` will not work and will need refactoring/updating.
- plugins that use `@timestamp`, either as a hardcoded field reference or as a default config option to get or set the event timestamp, would need to be updated.

This is why I believe that the `@timestamp` field ref, as the interface to read/write the `Event` timestamp, should be kept as-is and always working, regardless of whether `remove_field` was done on it.
One thing's for sure: moving `@timestamp` to `@metadata.timestamp` would be a breaking change that affects a lot of plugins. The sum of plugins that would suffer from this would be a big hindrance for people considering upgrading. We may be diligent in fixing all supported plugins, but a portion of the unsupported plugins would be forever left behind...
@colinsurprenant If we made `@timestamp` resilient to deletion by hiding it, what would the behaviour be, here? ;-)

```
if [@timestamp] {
  # ...
} else {
  # ...
}
```
OK, so, a new thought, having chewed through all the excellent points both @colinsurprenant and @webmat have made.
I particularly like your proposal for per-field support @webmat, but I think doing an extra file is awkward. What if we implemented it at the LS config language level? What if, say, `%{}` acted as it does today, but `#{}` was the strict variant? That puts all the power into the hands of the user. For timestamps, you'd either write `%{+...}` or `#{+...}` to get the lax or strict variants. To solve the serialization issue, we could extend the syntax to support timestamp interpolation of arbitrary fields, so you could do `#{[@metadata][timestamp]+yy.mm.dd}` to get strict interpolation of a non-serialized field, for instance.
So, the guidance for a user not wanting to serialize the timestamp, but wanting to use it, would be to use that sprintf format and to move the timestamp from `@timestamp` to `@metadata`, which is already the place where things are not serialized.
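To make the two variants concrete (the `#{...}` form is hypothetical and not implemented; the strings are single-quoted because in plain Ruby a double-quoted `#{}` would be interpolated by Ruby itself):

```ruby
# %{...} would be the lax variant: a missing field or @timestamp leaves the
# pattern in place instead of raising.
event.sprintf('logstash-%{+YYYY.MM.dd}')

# #{...} would be the strict variant: a missing field raises, so the plugin
# can DLQ or fail the event; the extended form reads a non-serialized field
# as a timestamp.
event.sprintf('logstash-#{[@metadata][timestamp]+yy.MM.dd}')
```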
Yes! I like the idea of having two ways of interpolation. A strict one and a lax one.
Tongue-in-cheek caveat: now Rubyists will get confused by two things rather than one: the hash syntax without the commas, and now the look-alike string interpolation delimiters that actually have no relation to Ruby's 😆
Kidding aside, I like this. It gives maximum granularity (both can even be used in one string template, at the extreme), without introducing anything scattered all over the place (logstash.yml, plugin args). It's just two different delimiters. Love it!
@webmat that's a good point; we already have `%{}` and `${}`, so adding a third character is inherently tricky. I suppose it could be any of `@^&#*~` as a sigil. I don't have a particular preference. It's too bad `$` is taken, because you could remember it as $trict. WDYT?
The strict behavior, BTW, should be to either log and warn, or go to the DLQ IMHO.
@andrewvc @webmat hypothetical question: let's say that `@timestamp` is always readable; would the strict interpolation still be necessary/useful?
@webmat the behavior with

```
if [@timestamp] {
  # ...
} else {
  # ...
}
```

would be to never take the `else` branch, obviously. I'd be extremely surprised to see anything like that in the wild though, because first, `[@timestamp]` always exists at `Event` creation, and second, someone removing `@timestamp` does so for it to not be part of the serialized output, so there is no real point in doing such a check in a config.
@colinsurprenant I think it's a useful feature on its own, yes. It'd actually still be useful for `@timestamp` fields.
Let's imagine a situation where the real timestamp is in `[custom][foo]`. The user might have logic like "overwrite `@timestamp` with the value in `[custom][foo]`". If that is null, however, and we always guarantee a timestamp, what then? Do we just not overwrite it? Log a warning?
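That user logic might look roughly like this, e.g. in a ruby filter body. This is illustrative: `[custom][foo]` is just the example field from above, and `LogStash::Timestamp.coerce` accepting the stored value is an assumption:

```ruby
# Hypothetical user logic: prefer the "real" timestamp field when present.
if (real_ts = event.get("[custom][foo]"))
  event.set("@timestamp", LogStash::Timestamp.coerce(real_ts))
else
  # If a timestamp is always guaranteed, what should happen here?
  # Silently keep the auto-generated one? Log a warning?
end
```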
The exciting thing about the approach of having the user pick the syntax is that it puts the choice of behavior in the hands of the user. It also is a generally useful enhancement, which is great because it reduces the total complexity of the system. Most importantly, however, it keeps the usage of field removal/writing consistent. I'm not against having exceptional cases, but exceptional cases often indicate an impedance mismatch in the modeling of a problem. If we can solve a problem without that and put the user in control, that feels like a win.
@andrewvc let's keep in mind that we already have a different behaviour for `@timestamp` handling:

- a valid `Timestamp` object is created at `Event` creation.
- type coercion is enforced on the `@timestamp` field.
- `@timestamp` IS the interface to an "always valid" `Timestamp` object. Using any other field does not guarantee that its value will be a valid `Timestamp`.
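The first two guarantees are easy to observe against the current `Event` API (a quick IRB check; results shown as comments):

```ruby
e = LogStash::Event.new
e.get("@timestamp")       # present without ever being set explicitly
e.get("@timestamp").class # => LogStash::Timestamp, not a plain String
```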
@colinsurprenant it makes sense to reduce the number of special cases, and the amount of complexity. We should probably remove all those special guarantees. IMHO we should just make `@timestamp` a normal field, and make the auto-population of it configurable.
Let's imagine we're teaching a course on Logstash. We could either explain:

- Logstash automatically populates `@timestamp` for you.
- There is a special field called `@timestamp`. This field is not removable and can only contain a `Timestamp` type; setting it to anything else is an error. If you invoke a remove on the field, it will still be present for reads, but will not be serialized.

Moving to it being a normal field would actually clean up a lot of weirdness in how examples work. In some examples we read a string version of a timestamp into `[timestamp]`, then use the date filter to move it into `[@timestamp]`, then finally delete `[timestamp]`. It'd make more sense to read the string straight into `[@timestamp]` and then immediately convert the type in place with the date filter, IMHO.
One other thought: have you considered that the 'hide-able' approach makes the logical nullity of timestamps weird to express? If you want to express "I don't have a timestamp", then you need to make some other field for that, called `[no_timestamp]`. Imagine a user who expects the origin of requests to always provide the timestamp. Our 'default' behavior then gets in the way, and they need to hack around it.
I guess it is a question of how much we need to change, to solve which scope of problem, in what timeframe, taking into account the BWC issues. The point of my proposed solution is to be simple, applicable today, and backward compatible, without requiring any modifications in the plugins or user configurations. But the tradeoff is an exceptional behaviour when removing the `@timestamp` field (which I tried to show is a minor tradeoff IMO, because the intent of removing that field is preserved, but that's fine if opinions diverge here).
It's all good if we agree to go wide and completely redefine the timestamp handling and behaviour in Logstash, and in plugins too.
Considering Logstash 7.0 is far away, but not too far, this is a great time to be considering breaking changes and the upgrade + deprecation path for them in the 6.4/6.5 minors.
IMO, moving the timestamp field to `@metadata` (or another meta namespace) and making all `sprintf` calls lax would remove surprises in `sprintf` behaviour, solve the default serialization issue, and keep the current timestamp-related APIs in the event class working as before.
BWC-wise, this should break almost no plugins, as they should be using `event.timestamp =` or `event.set_timestamp` anyway. Grepping for `@timestamp` in our default plugins shows a very small list of hits.
One thing we'd have to consider is the default schema we push to Elasticsearch, which currently contains `@timestamp`, but that could be solved at the ES output plugin level.
Tbh I don't see an urgency in solving this in any immediate time frame; this situation pops up once in a while, and we could just improve the wording of the exception we output. Otherwise, we have the breathing room to make "the right decision" without pressure to deliver a fix.
Events without `@timestamp` can be difficult to process in certain situations. According to code that dates back to 2013 (and probably before), its presence should be guaranteed. Logstash has never enforced this by preventing the removal of `@timestamp` per se, however.

So whenever a problem arises from a missing `@timestamp`, people may simply see an innocuous error like the one shown at the top of this issue (`LogStash::Error: timestamp field is missing`) and conclude that there's a bug in Logstash.

One point of view about this is that the event should actually be considered invalid, and whatever led to this situation is likely to cause data loss.
There are two schools of thought, or approaches, we can take for this:
In either case, in order to prevent needless confusion and support requests, we should at the very least output a log message that's a bit more verbose than the one above, explaining what the problem actually is: with this event, `@timestamp` is missing and the customer tries to index it into a timestamped index (like `logstash-%{+YYYY.MM.dd}`).
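As a straw man, the raised error could name the actual cause. The wording below is invented for illustration:

```ruby
# Hypothetical, more actionable message than "timestamp field is missing":
raise LogStash::Error,
      "cannot interpolate '%{+YYYY.MM.dd}' into the index name: " \
      "this event has no @timestamp field (was it removed by a filter?)"
```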
Are non-timestamped events legitimate?
The current behaviour (unless the above situation happens) is to successfully index the event. I would argue there's a legitimate use case for removing `@timestamp` in Logstash pipelines. It's counterintuitive, because Logstash was born to process time-based data. But once someone has a Logstash installation, Logstash is a great way to periodically ingest non time-based data into their Elasticsearch cluster.

Examples of non time-based data to reimport regularly:
Another supporting argument for the legitimacy of non time-based support in Logstash is the fact that our plugins are gradually getting better at supporting non time-sensitive scenarios:
The last point demonstrating the legitimacy of processing non-timestamped data with Logstash is the very first Logstash configuration one sees during the Elasticsearch training. Here we're ingesting a bunch of data, but we're not interested in keeping `@timestamp` (among others).

Approach 1: Support these events as best as possible
The current behaviour is that events with no `@timestamp` work in most cases. People can do this, and are doing it. There are still some improvements we can make:

- Improve the error message shown when interpolating a missing `@timestamp` into a timestamped index.
- Log an `ERROR` for every event in this situation, and keep Logstash running.

Approach 2: Interpret this as a problem with the user's configuration and crash
A lot of the feedback around #777 was not about the PR, but went in the direction that "not having a `@timestamp` on the event" was not officially supported. So this definitely struck a chord with a few people :-)

In any case, nobody denies that we can improve the error message when the timestamp error happens.

Now, do we want to take a harder line, to the effect that we do not want to support a missing `@timestamp` at all? If so, the current behaviour is not strict enough. We may want to take the following additional steps:

- Crash on events that are missing `@timestamp` as soon as we see them (whether or not the timestamp is needed).
- Prevent the removal of `@timestamp`.

1+2=3: Let users decide
Any given user may really want Logstash to behave like 1, or to behave like 2. Presumably, the default could be the friendlier approach 1, and people who take great pains to construct a pipeline that has the least data loss possible (e.g. use only non-lossy transports & plugins, careful use of PQ & DLQ) could optionally decide that such events should indeed cause a hard crash.
Why this discussion?
This issue came out of the discussion in #777, which in turn came from reports such as #739. Requesting feedback on #777 in Slack immediately led to discussions of going towards approach #2.