jplitza opened this issue 7 years ago
Hi @jplitza,
thanks for reporting the issue. That sounds like very 'interesting' behavior of InnoDB. I think we should migrate the events table to MyISAM; does anyone know of any downsides?
Wow, that's crazy. You're totally correct. I had no idea. In general, InnoDB is more reliable than MyISAM, and has better transaction support. It'd be a shame to lose that.
It's a hack, but maybe we could check the ID when we add a new Event?
e.g.
def build_event(event)
  event = events.build(event) if event.is_a?(Hash)
  event.agent = self
  event.user = user
  event.expires_at ||= new_event_expiration_date
  event.id = Agent.maximum(:last_checked_event_id) + 1 if Event.exists?
  event
end
@cantino You probably know better, but the code snippet you posted sounds like problems could still occur when the current (originating?) agent's last_checked_event_id is lower than the receiving agent's.
I haven't had a look at the code yet, but couldn't you "simply" add a fake event that will not really be processed by anything ever, just before pruning the events table? It may have an expiration date of +1m or something, but won't be purged by the current run (and the next will add its own fake event).
@jplitza Agent.maximum(:last_checked_event_id) is the largest Event id of all Agents in the database, I think that might work.
@cantino Wouldn't we need to do that especially for new events? As far as I remember, setting the id like that does not work. Not sure if the model only needs to be persisted first or if only update_attribute works, but I remember having issues messing with the id. Probably because one normally should not do it 😄
Another option would be to update the last_checked_event_id column of all Agents after purging the events.
def self.cleanup_expired!
  affected_agents = Event.expired.group("agent_id").pluck(:agent_id)
  Event.transaction do
    Event.expired.delete_all
    max_id = Event.maximum(:id)
    Agent.where(id: affected_agents)
         .update_all ["last_checked_event_id = ?, events_count = (select count(*) from events where agent_id = agents.id)", max_id]
  end
end
@dsander, I think that would only be safe if you were emptying the Event table, right? Otherwise you could cause an Agent to skip over events it hadn't seen yet?
I'm not sure about setting id. I'd hope it would work, as it's valid from the DB's perspective.
Keep only the unprocessed events in the events table. On "propagate!" (or immediate propagation), select all events belonging to the agent from the events table, process them, and then move them to an events_log table. Agents would then not need to track the maximum event id at all.
@cantino Right, the query would have to be specific for every Agent. It could probably be done in SQL as well but would be more complex.
We could also change cleanup_expired! to only expire events that have been processed:

scope :expired, lambda {
  where("expires_at IS NOT NULL AND expires_at < ? AND id <= ?", Time.now, Agent.active.minimum(:last_checked_event_id))
}

But that would change unrelated logic.
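The idea above can be illustrated in plain Ruby (toy data, not Huginn code): an event is only safe to expire once every active agent's last_checked_event_id has moved past it.

```ruby
# Illustrative sketch: an expired event may only be deleted once its id
# is at or below the smallest last_checked_event_id of all active agents.
events = [
  { id: 10, expires_at: Time.now - 60 },  # expired, processed by everyone
  { id: 42, expires_at: Time.now - 60 },  # expired, but one agent hasn't seen it
]
last_checked_ids = [40, 55]               # one per active agent
min_checked = last_checked_ids.min

expirable = events.select { |e| e[:expires_at] < Time.now && e[:id] <= min_checked }
expirable.map { |e| e[:id] }  # => [10]
```

Because event 42 is above the smallest last_checked_event_id (40), it survives the cleanup even though its expiration date has passed.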
@MingShyanWei While that could work in general, it is not really practical for Huginn because it would require a lot of changes.
We'd want to only look at last_checked_event_id on Agents that could receive the Event. It is complex. Maybe we should have a special placeholder event with some flag on it that we never delete or propagate? What a pain.
@cantino I see eye to eye with you.
@cantino We would need to move that event to a new id every time cleanup_expired! is run, right? Feels a bit hacky, but definitely more efficient than updating every Agent's last_checked_event_id.
What do you all think about switching the manual installation guide and the multi-process docker image to postgres?
I'd be totally fine with switching to Postgres.
I also ran Huginn without issue on MySQL for years, so while this is a problem, it's not a frequent one.
What about always keeping the latest event?
diff --git a/app/models/event.rb b/app/models/event.rb
index 410bc8b8..d8809cea 100644
--- a/app/models/event.rb
+++ b/app/models/event.rb
@@ -25,7 +25,7 @@ class Event < ActiveRecord::Base
after_create :possibly_propagate
scope :expired, lambda {
- where("expires_at IS NOT NULL AND expires_at < ?", Time.now)
+ where("expires_at IS NOT NULL AND expires_at < ?", Time.now).where.not(id: maximum(:id))
}
scope :with_location, -> {
I think that might work for the case of automatic cleanup, which would be a big step forward.
There are still other ways that the Event with the highest ID could get deleted, like if the user deletes the Agent that created the highest Event, or if they delete the Event manually in the UI, but your solution would still fix the common case of automatic event deletion.
Should we add this?
OK, so the correct workaround would be to detect, as promptly as possible, when the events table becomes empty and reset last_checked_event_id of all agents. Fortunately, there are only a few places in Huginn where the events table may become empty, so it won't be hard to do that, if I'm not missing something.
diff --git a/app/controllers/agents_controller.rb b/app/controllers/agents_controller.rb
index 1ae90763..8205f06a 100644
--- a/app/controllers/agents_controller.rb
+++ b/app/controllers/agents_controller.rb
@@ -210,6 +210,7 @@ class AgentsController < ApplicationController
def destroy_undefined
current_user.undefined_agents.destroy_all
+ Event.reset_agents_if_empty
redirect_back "All undefined Agents have been deleted."
end
diff --git a/app/models/agent.rb b/app/models/agent.rb
index aea2e11b..bdfd58c5 100644
--- a/app/models/agent.rb
+++ b/app/models/agent.rb
@@ -42,6 +42,7 @@ class Agent < ActiveRecord::Base
before_save :unschedule_if_cannot_schedule
before_create :set_last_checked_event_id
after_save :possibly_update_event_expirations
+ after_destroy { Event.reset_agents_if_empty }
belongs_to :user, :inverse_of => :agents
belongs_to :service, :inverse_of => :agents, optional: true
diff --git a/app/models/event.rb b/app/models/event.rb
index 410bc8b8..ee19c9a9 100644
--- a/app/models/event.rb
+++ b/app/models/event.rb
@@ -23,6 +23,7 @@ class Event < ActiveRecord::Base
after_create :update_agent_last_event_at
after_create :possibly_propagate
+ after_destroy { Event.reset_agents_if_empty }
scope :expired, lambda {
where("expires_at IS NOT NULL AND expires_at < ?", Time.now)
@@ -74,9 +75,16 @@ class Event < ActiveRecord::Base
def self.cleanup_expired!
affected_agents = Event.expired.group("agent_id").pluck(:agent_id)
Event.expired.delete_all
+ Event.reset_agents_if_empty
Agent.where(:id => affected_agents).update_all "events_count = (select count(*) from events where agent_id = agents.id)"
end
+ def self.reset_agents_if_empty
+ if !exists?
+ Agent.where.not(last_checked_event_id: nil).update_all(last_checked_event_id: nil)
+ end
+ end
+
protected
def update_agent_last_event_at
diff --git a/lib/agent_runner.rb b/lib/agent_runner.rb
index 815f0692..bac84ee8 100644
--- a/lib/agent_runner.rb
+++ b/lib/agent_runner.rb
@@ -35,6 +35,8 @@ class AgentRunner
end
def run
+ Event.reset_agents_if_empty
+
@running = true
run_workers
It looks like MySQL effectively sets the auto-increment counter to SELECT MAX(ai_col) FROM table_name on restart. So, imagine that the events table has a bunch of Events in it, then the newest set of Events get expired and removed, but some old Events still remain (because their creating Agent isn't set to expire). Now, if the server is restarted, the events table won't be empty, but the auto-increment value would decrease to the current max(id) in the table. This would break last_checked_event_id values in the same way as when the auto-increment goes to 0 on an empty table, right?
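This partial-deletion scenario can be sketched in plain Ruby (illustrative values, not Huginn code): after a restart, the ids between the surviving MAX(id) and the old counter get handed out a second time.

```ruby
# Surviving old events have low ids; the expired (now deleted) ones had
# high ids, and the agent had already checked past them.
surviving_ids = [5, 9, 12]     # events whose creating Agent never expires them
last_checked_event_id = 200    # the agent had processed events up to id 200

# After a MySQL restart, InnoDB re-derives the counter from MAX(id),
# so the next insert gets id 13 instead of 201.
next_id = surviving_ids.max + 1

# The freshly created event looks "already processed" to the agent.
reused = next_id <= last_checked_event_id  # => true
```

So emptying the table is not the only trigger; any restart after deleting the newest events reproduces the bug.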
So, I think the line of reasoning from your original solution may be a better one.
@knu I like your first suggestion.
Isn't your reset_agents_if_empty function really reset_agents_if_not_empty? I think that suffers the same race condition @cantino described here.
I think the correct thing would be the reset_agents_if_empty approach, but setting the last_checked_event_id to maximum(:id). Right?
@jplitza We can't do that because it can lead to Events not being processed by the receiving Agent. Imagine an Event being emitted, then we clean up the expired Events (and set last_checked_event_id); now the receiving Agent will never process the new Event.
minimum(maximum(:id), :last_checked_event_id) then.
This... is hard to think about. 😄 @jplitza, do you see any issues with @knu's current solution in #1974?
@jplitza Sadly it's not that easy we discussed it here: https://github.com/cantino/huginn/issues/1940#issuecomment-287651996 https://github.com/cantino/huginn/issues/1940#issuecomment-287662409 https://github.com/cantino/huginn/issues/1940#issuecomment-288034168
No, my proposal doesn't cause agents to skip events. I'm not confident with Rails, so let me give you plain SQL (probably not with the right column and table names though):
UPDATE agents SET last_checked_event_id = MIN(last_checked_event_id, (SELECT MAX(id) FROM events));
This will
- never increase last_checked_event_id, thus never skip any already present, yet unprocessed events.
- lower last_checked_event_id, if necessary, to the highest event id present in the database, thus not ignoring any newly added events.

@jplitza Imagine the Event cleanup runs after an Event was created but before the receiving Agent was able to process it. The SQL query will update the Agent's last_checked_event_id and it will never process the Event that was created before the cleanup.
No, it won't. Example:
- An Event with id=170 is emitted; the receiving agent has last_checked_event_id=123.
- The cleanup runs and sets last_checked_event_id=min(123,max(id))=min(123,170)=123 for our example agent, so the still unprocessed Event is not skipped.

On the other hand:
- The receiving agent has last_checked_event_id=123, but the events table was emptied and the auto-increment counter reset, so the newest Event has a much lower id.
- The cleanup runs and sets last_checked_event_id=min(123,max(id))<123.
- The agent therefore notices newly created Events instead of ignoring them because of its stale last_checked_event_id.

@jplitza I overlooked the min,max logic. I think you are right and your solution should cover all situations properly.
There is the potential for a race condition when the event cleanup runs at the exact time as Agent#receive! (the relevant code is somewhere around here), but I don't think it would cause problems. The Events processed in receive! will always be recent enough to not be deleted by the event cleanup.
@knu @cantino Could you double check? If it indeed fixes the issue, I feel it would be a cleaner solution than #1974.
The problem is where to hook and execute the UPDATE statement. Should we do it every time a database connection is established, and is that enough?
A rather minor implementation issue I found out is that min(a, b) is not a valid SQL expression. 😆
There's least(a, b), which is available in both MySQL and PostgreSQL, but they differ in how they deal with null values, so you should use the case when construct here instead, which is part of the SQL standard.
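For illustration, here is a Ruby mirror of the null handling such a CASE WHEN expression would need. The SQL in the comment is an assumption sketched from the thread's proposal, not code that has been run against Huginn's schema.

```ruby
# Ruby mirror of a portable CASE WHEN expression such as:
#   CASE WHEN last_checked_event_id IS NOT NULL
#             AND last_checked_event_id < (SELECT MAX(id) FROM events)
#        THEN last_checked_event_id
#        ELSE (SELECT MAX(id) FROM events)
#   END
# Here nil stands in for SQL NULL (MAX(id) is NULL when the table is empty).
def clamped_last_checked(last_checked, max_id)
  if !last_checked.nil? && !max_id.nil? && last_checked < max_id
    last_checked
  else
    max_id
  end
end

clamped_last_checked(123, 170)  # => 123 (unprocessed events 124..170 not skipped)
clamped_last_checked(123, 5)    # => 5   (counter reset: stop ignoring new events)
clamped_last_checked(123, nil)  # => nil (events table empty: full reset)
```

Unlike least(a, b), whose NULL behavior differs between MySQL and PostgreSQL, this spells out each case explicitly.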
@knu Should this issue be closed now that #2014 is merged?
Problem
On a very low-volume instance, at some point, events simply stop being processed. They are generated, but not propagated along links, even if the links are re-created.
Cause
Links and Agents store the ID of the event they last processed. On a low-volume instance where events are deleted after some time, the events table may become empty. If the MySQL server is then restarted and the table uses InnoDB (the default?), the auto_increment counter starts at 1 again. Thus, many subsequently created events are ignored by links and agents because they think they have already processed them. Only once enough events have been generated (before the table is empty and the server is restarted again) will they be noticed again.
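The empty-table case can be reproduced in miniature without Rails or MySQL; the method name below is illustrative, not part of Huginn:

```ruby
# An agent only considers events with an id above the one it last processed.
def unseen_events(event_ids, last_checked_event_id)
  event_ids.select { |id| id > last_checked_event_id }
end

last_checked_event_id = 120  # the agent processed events up to id 120
new_event_ids = [1, 2, 3]    # table emptied + server restarted: ids begin at 1

unseen_events(new_event_ids, last_checked_event_id)  # => [] (all ignored)
```

Events keep being "invisible" until the counter climbs back above 120, which matches the observed behavior of events silently not propagating.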
Solutions