Hi!
I am having the same error:
Error: TZInfo::AmbiguousTime: 2015-10-25T02:10:03+00:00 is an ambiguous local time
It is quite a problem that this is causing logstash to restart endlessly.
As far as I understand the issue, logstash-input-jdbc would have to let us configure (like tzinfo does) which fallback to use (pick the first or the second point in time) in case of an ambiguous timestamp. As long as it doesn't do this, it is somewhat "correct" to fail fatally.
Here is my understanding of the whole thing - please correct me if I'm wrong:
I know - that's all evil stuff and everyone should only write UTC timestamps to avoid such awkwardness but in reality there are a lot of databases out there logging stuff in local timezones that support daylight saving time shifts - so support for that would be great :-)
I totally agree with you (timezones are just so annoying to handle).
Probably the local_to_utc conversion is global to all logstash plugins and not specific to the jdbc input? I don't know if this needs a fix in this plugin or in logstash itself.
Good point - I don't know either. If one of the contributors of this plugin could confirm or deny that, we could keep this ticket here or move it to the proper elastic project instead.
I guess in the end there should be a global or plugin-specific setting, leading to one of these calls:
tz.local_to_utc(d) { |periods| periods.first }
or
tz.local_to_utc(d) { |periods| periods.last }
as mentioned in the linked tzinfo issue.
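For illustration, here is how those calls behave against tzinfo directly, using the timestamp from the error above (a minimal sketch, assuming tzinfo 1.x behaviour):

require 'tzinfo'

tz = TZInfo::Timezone.get('Europe/Berlin')
# 2015-10-25 02:10:03 happens twice in Europe/Berlin (once in CEST, once in CET),
# so local_to_utc raises TZInfo::AmbiguousTime unless a block picks a period.
local = Time.utc(2015, 10, 25, 2, 10, 3)

tz.local_to_utc(local) { |periods| periods.first } # => 2015-10-25 00:10:03 UTC (CEST, DST)
tz.local_to_utc(local) { |periods| periods.last }  # => 2015-10-25 01:10:03 UTC (CET)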
I have the same issue here. I am importing data from a database with local timestamps. Even handling the exception and returning a _dateparsefailure in logstash would help. Then we could redirect the output to a file and parse accordingly.
Coming back to this issue, I think the problem comes from the named_timezones extension used by Sequel. As you can see here, there is an attribute tzinfo_disambiguator which could be used to remove the ambiguity by choosing which period to use (periods.first or periods.last) in the application.
If I could send a PR I would, but I am really unfamiliar with the Ruby language, so I hope this helps to fix this issue.
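For reference, this is roughly what using that attribute looks like (a sketch of Sequel's global named_timezones configuration; the zone names are examples only):

require 'sequel'

Sequel.extension :named_timezones
Sequel.database_timezone    = 'Europe/Berlin' # how timestamps are stored in the DB (example zone)
Sequel.application_timezone = :utc            # what Sequel hands back to the application

# Called whenever a local time is ambiguous; returning periods.first prefers the DST period.
Sequel.tzinfo_disambiguator = proc { |datetime, periods| periods.first }

Note that this setting is process-wide, which is exactly the limitation discussed further down in this thread.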
Any update or workaround for this? My import job is now stuck with this error...
We are using the following workaround to convert CET dates without timezone information to UTC in the SQL statement (running against an Oracle database):
select from_tz(cast(<column> as timestamp), 'CET') at time zone ('UTC') "@timestamp"
from <table>
where ...
Then ensure that the timezone setting of the local machine or "jdbc_default_timezone" is UTC.
Maybe something similar is possible with MySQL?
Unfortunately, the setting linked for supplying a tzinfo_disambiguator is a global setting for Sequel's named_timezones extension, which means we don't have a way of configuring a particular input's handling.
In a brief spike, I have created a timezone proxy object using SimpleDelegator, whose period_for_local has an implicit preference for DST being either enabled or disabled, which seems to work:
require 'delegate' # provides SimpleDelegator
require 'tzinfo'

# Quacks like a Timezone
class TimezoneWrapper < SimpleDelegator
##
# Wraps a ruby timezone object in an object that has an explicit preference in time conversions
# either for or against having DST enabled.
#
# @param timezone [String,TZInfo::Timezone]
# @param dst_enabled_on_overlap [Boolean] (default: nil) when encountering an ambiguous time,
# declare a preference for selecting the option with DST either enabled or disabled.
def self.wrap(timezone, dst_enabled_on_overlap)
timezone = TZInfo::Timezone.get(timezone) if timezone.kind_of?(String)
new(timezone, dst_enabled_on_overlap)
end
##
# @api private
def initialize(timezone, dst_enabled_on_overlap)
super(timezone) # SimpleDelegator
@dst_enabled_on_overlap = dst_enabled_on_overlap
end
##
# @override `Timezone#period_for_local`
# inject an implicit preference for DST being either enabled or disabled if called
# without an explicit preference
def period_for_local(value, dst_enabled_on_overlap=nil, &global_disambiguator)
dst_enabled_on_overlap = @dst_enabled_on_overlap if dst_enabled_on_overlap.nil?
__getobj__.period_for_local(value, dst_enabled_on_overlap, &global_disambiguator)
end
end
In this way, we could set Database#timezone=(tz) to our proxy object, which would resolve ambiguous timestamps as configured:
if @jdbc_default_timezone
  @database.timezone = TimezoneWrapper.wrap(@jdbc_default_timezone, @jdbc_default_timezone_dst_enabled_on_overlap)
end
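For illustration, the proxy resolves the ambiguous timestamp from the top of this thread like this (a quick sketch, not part of the spike above):

tz = TimezoneWrapper.wrap('Europe/Berlin', true) # prefer DST on overlap
tz.period_for_local(Time.utc(2015, 10, 25, 2, 10, 3)) # => the CEST period; no AmbiguousTime is raised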
A couple of open questions: for one, the proposed setting name jdbc_default_timezone_dst_enabled_on_overlap is a bit cumbersome. (cc: @guyboertje @jakelandis)
I agree with @mbertani: it would help a lot just to get a proper exception and a _dateparsefailure or something similar. Right now, logstash silently crashes and drops events. It took me a long time to figure out what was going on.
I don't really understand why this is not labeled as a bug. If I understand this correctly, the jdbc_default_timezone cannot be used in countries using daylight saving time.
@marbin The bug only surfaces once a year: when jdbc_default_timezone is set and the input is processing database records whose timestamps were recorded during the change out of daylight saving time, in a timezone that produces ambiguous results, i.e. one that observes daylight saving.
As @yaauie points out, the setting that Sequel (the 3rd-party ORM we use) makes available to us is a global one, meaning that all jdbc inputs, jdbc_streaming filters and jdbc_static filters in a Logstash instance (across all pipelines) will share the same disambiguation. Due to this global effect, setting a timezone in one plugin affects all plugins.
Unfortunately, whichever period the disambiguation proc chooses (first or last), some timestamps will overlap - there is no way to avoid this.
@guyboertje You mean that just because the bug only occurs in timezones with daylight saving, i.e. all of Europe, the US, Australia etc., and just once a year, it is not a bug ;-) The millennium bugs were no bugs because they only occurred once in a millennium? ;-)
Joking aside, in my case we had several pipelines that just crashed, and it was pretty tricky to figure out what was going on, since there were no error messages in the logs. I had to turn on debugging to find it. Then I had to hack the last_run_metadata to skip the events that occurred during the hour of the daylight saving time change, losing all those events. And all these pipelines were down until we had found and fixed the issue.
I understand this is a hard one to fix, but I think it would make sense to track it as a bug and maybe add a warning to the docs for the jdbc_default_timezone feature, since using it with a timezone that has daylight saving WILL crash your pipeline.
@marbin There is a bug or three flying about somewhere: the 'enhancement' here, plus the global nature of the Sequel library, plus the scheduler also has problems with TZ bugs (some other plugins have pinned rufus-scheduler at a very old version, and we need a big test to see what will fail if we remove the pinning). I'm not saying it's not a bug.
We do have plans to switch all our jdbc plugins to a common reusable Java framework based around HikariCP, vanilla JDBC and Quartz. This is a lot of work, but we feel there are many niggling faults in the current code across all the jdbc plugins that warrant the effort.
Therefore, fixing these issues and having the current set of tests pass is the goal of the move to Java.
This "bug" doesn't only happen if you have data with a date in this 1 hour interval, it also does it you have a job that runs in that hour using a datefield as the index column.
If this happens, then an ambiguous date is stored in the meta file and that job will fail until you manually change / clear that meta file.
I am having this issue now. Any changes? Any other workarounds?
Same for me. Any update on this ?
Hello, any news?
Same here. Any workaround example for MySQL 5.5?
Hello, new hit on this bug tonight - no progress on the AmbiguousTime problem?
The problem occurs here as well; no logs have been loaded since the hour change. What's the progress on this issue?
I have encountered the same issue, any news on when this is going to be resolved? There is already a PR waiting for merge in the repo of the integration plugin: https://github.com/logstash-plugins/logstash-integration-jdbc/pull/70
As long as jdbc crashes on this, for MySQL you can try to avoid the issue like this:
statement => "SELECT CONVERT_TZ(date_created, 'Europe/Paris', 'UTC') AS date_created FROM your_table"
jdbc_default_timezone => "GMT"
(You'll need to populate the time zone tables of your MySQL DB first, if that's not already done, with: mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql)
CONVERT_TZ should properly handle this annoying AmbiguousTime error.
Any updates on this? Unfortunately it's currently not really feasible for us to convert everything to UTC.
Since v5.4.0 of this plugin (which is now distributed as a part of the JDBC Integration plugin for Logstash), you can specify the behaviour when encountering an ambiguous timestamp:
While it is common to store local times in SQL's timestamp column type, many timezones change their offset during the course of a calendar year and therefore cannot be used with SQL's timestamp type to represent an ordered, continuous timeline. For example, in the America/Chicago zone when daylight saving time (DST) ends in the autumn, the clock rolls from 01:59:59 back to 01:00:00, making any timestamp in the 2-hour period between 01:00:00 CDT and 02:00:00 CST on that day ambiguous. When encountering an ambiguous timestamp caused by a DST transition, the query will fail unless the timezone specified here includes a square-bracketed instruction for how to handle overlapping periods (such as: America/Chicago[dst_enabled_on_overlap:true] or Australia/Melbourne[dst_enabled_on_overlap:false]).
-- Logstash JDBC Input Plugin docs, jdbc_default_timezone / Ambiguous Timestamps
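For example, with the Europe/Berlin zone used earlier in this thread, the setting would look something like this (sketch only; the rest of the jdbc input config is unchanged):

jdbc_default_timezone => "Europe/Berlin[dst_enabled_on_overlap:true]"

dst_enabled_on_overlap:true prefers the DST (CEST) interpretation of an ambiguous local time, while false prefers the standard-time (CET) interpretation.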
Versions of this plugin with this feature were first shipped with Logstash v8.6.0.
Logstash 7.6+ and 8.0+ shipped with the JDBC Integration plugin, so it can be updated with the normal method:
bin/logstash-plugin update logstash-integration-jdbc
Logstash < 7.5 will need to remove the stand-alone JDBC plugins prior to installing the integration plugin, which will look something like:
bin/logstash-plugin remove logstash-input-jdbc
bin/logstash-plugin remove logstash-filter-jdbc_static
bin/logstash-plugin remove logstash-filter-jdbc_streaming
bin/logstash-plugin install logstash-integration-jdbc
@yaauie Thank you so much. That helps a lot, unfortunately because of another issue in Elasticsearch itself, we cannot upgrade to Logstash 8.6.0 just now, but as you pointed out, I can just upgrade the integration-plugin in Logstash 8.3.3
Hi there! First of all - thanks a lot for this plugin - it's great!
However, when using my MySQL DB's timestamp in elasticsearch/kibana I noticed it was off by 1h since the MySQL timestamp is interpreted as UTC but in fact is CET.
So I added: jdbc_default_timezone => "Europe/Berlin"
But now my logstash pipeline crashes fatally, resulting in an endless loop of trying to read data from MySQL with the TZInfo::AmbiguousTime error quoted at the top of this thread.
I'm aware that this is probably more of a tzinfo-related problem (https://github.com/tzinfo/tzinfo/issues/32), but I don't see any other way to make it work correctly.
Exporting a UNIX timestamp in MySQL and using a date filter combined with the "unix" timestamp parser is (for whatever reason) painfully slow, resulting in jammed elasticsearch input queues.
Best - Max