Closed lgarron closed 2 years ago
I believe this is the relevant code:
I would assume that clearing the class variable whenever a team member change is made should be sufficient; however we have two worker running in separate processes in production so we would only have one update.
Another approach would be to take into account some updated_at
field for the team when building the @@teams_by_friendly_id
variable, and update that field when a team member is added/removed/changed.
Completing this issue: this actually raises issues when syncing mailing lists, and may not change the members appropriately until the cache is cleared (ie: the unicorn restart we do when we deploy).
Checking the updated_at
field kind of ruins the point of caching as we'd have to emit a select.
I don't have a great suggestion for this; another alternative would be to send a restart signal to unicorn before going into the daily jobs.
Turns out, the process running the Delayed::Job
is different from the unicorn one, so we'd probably need to do both restart_app
and restart_dj
.
Another alternative: pass some force_reload
optional argument when fetching a team, that we can pass when calling Team.board
in the rake task.
This error occured again recently and is critical to some operational WCA procedures (Delegates falsely detected as being on probation, confidential mails still being forwarded to people who have been removed from the team). Unfortunately, I don't have sufficient insight into our Rails caching to be able to figure this out.
Summoning our resident sage @jonatanklosko for your Rails and infrastructure expertise 🙏
As the teams page on the WCA page is public, I think it's fine to share the following bit of diagnosis:
2015MATT05
is no longer a WQAC member. SSHing into the production machine and querying WQAC team members through the bin/rails c
console prompt indeed shows that Enzo is not listed as member. But the background jobs keep adding him to the Google Workspace mailing list, even when we manually remove him through the web admin interface. So clearly, he (i.e. the Team.wqac
team with all its members) is cached somewhere. But I don't know how to proceed from here 😕
To summarize the problem, we query teams once and store them in the @@teams_by_friendly_id
class variable (that's the cache in question). The variable is never invalidated, until we restart the given app process.
Because there are multiple separate processes I don't think there's a straightforward way of invalidating the cache in all of them on demand (when a member is added/removed).
One option I see is invalidating this cache periodically, so in
we could do something like this
@@teams_by_friendly_id ||= nil
if @@teams_by_friendly_id.nil? || @@teams_by_friendly_id_timestamp < 15.minutes.ago
@@teams_by_friendly_id = all.with_hidden.index_by(&:friendly_id)
@@teams_by_friendly_id_timestamp = DateTime.now
end
@@teams_by_friendly_id
This way we never use cached teams older than 15 minutes. Given we sync mailing list every hour, with this approach the worst-case propagation time would be 1 hour and 15 minutes.
@viroulep let me know if you have any thoughts on this :)
Just for the record: I don't have more to add to Jonatan's analysis; I did remember this issue happened once in the past, but I guess we let it go, maybe because at the time we had frequent enough deploys (or less team changes) so that issue wasn't very relevant.
I definitely support Jonatan's workaround! Unfortunately there is no way to properly invalidate the cache on team change because we have multiple instances running in different processus :s
While debugging tests for #4534 we found that
Team.wrc
was cached across tests. In theory, this could have repercussions in production (although the impact is limited in practice, because team members don't change often).