Closed bqbn closed 5 years ago
We did some more investigation, and tried to run the theme_update_counts_from_file.py
in the management shell.
It turns out it is https://github.com/mozilla/addons-server/blob/master/src/olympia/stats/management/commands/theme_update_counts_from_file.py#L109 that caused the OOM, apparently it tried to load too many stuff from the database into the memory.
And it also appears this job has been failing for a while (possibly since the LWT migration), yet no one seems to have noticed or complained about it.
I'm not exactly sure what the purpose of this command serves, and whether or not it is still needed today. If not, maybe we can consider stop running it, or if it does have a purpose, the issue should probably be fixed.
Thanks.
And it also appears this job has been failing for a while (possibly since the LWT migration), yet no one seems to have noticed or complained about it.
Since when has it been failing? Since the mentioned 2019-07-04 or even before that?
This command specifically is being used for our statistics dashboards, specifically for theme updates. So, as long as we do have that dashboard we'll need that data.
The mentioned line got added in https://github.com/mozilla/addons-server/commit/46e38faf3c970f50f4d8d55bc2d1f8a0f720240e#diff-fbf940e09a29be25c651f1a38db092deR103 and it feels to me it does load too much data from the database indeed.
What this is doing is, selecting all UpdateCount
instances for all add-on ids in migrated_personas
which I suppose is a lot. The problem here is though, UpdateCount.addon_id
is not unique on the model and there will be lots of UpdateCount
instances per add-on. Unfortunately the structure is stored in a dictionary and all but the last UpdateCount
instance will be overwritten / removed.
@eviljeff is this migration specific piece still needed? If so, we're going to have to optimize it a bit.
@EnTeQuAk we probably don't need it forever but for a period of time we should have it to record the usage from user of old versions of Firefox who didn't upgrade. If Product agreed suppose we could regard now as being a good enough period of time.
From what's said above it might be true that it's never worked properly which is disappointing because it was a challenge to develop and apparently I missed the day
on the filter π so challenge failed.
If Product still want to count old LWT users of the themes then it should be fixed (by adding day
at least - not sure if that's enough optimization); Otherwise, it can be removed.
Since when has it been failing? Since the mentioned 2019-07-04 or even before that?
My guess is it's been failing possibly since the LWT migration.
@jvillalobos to π and decide if it's worth fixing.
The impact is that LWT users aren't being counted in theme usage stats? If so, I think it's fine to drop.
The impact is that LWT users aren't being counted in theme usage stats?
yes.
Ok so if I understand correctly the action here is to remove this cron and the script.
Ok so if I understand correctly the action here is to remove this cron and the script.
yep. We're going to remove it rather than try to fix it.
@eviljeff does this require QA?
Describe the problem and steps to reproduce it:
There is a cron job that runs
./manage.py theme_update_counts_from_file
daily (i.e. https://github.com/mozilla/addons-server/blob/master/scripts/crontab/crontab.tpl#L32). This job wasn't able to finish for 2019-07-04 date and it got OOM killed in the middle of running.The kernel message showed the process used more than 6 GB memory before it's killed.
What happened?
We were alerted for missing stats for 2019-07-04, so we tried to manually re-run all the stats related cron jobs for the date, and during the process, we found that the
theme_update_counts_from_file
job wasn't able to finish and got OOM killed.What did you expect to happen?
The job to finish without being OOM killed.
Anything else we should know?
N/A