openfoodfoundation / openfoodnetwork

Connect suppliers, distributors and consumers to trade local produce.
https://www.openfoodnetwork.org
GNU Affero General Public License v3.0

Enable background reports on UK #10757

Closed: dacook closed 1 year ago

dacook commented 1 year ago

The UK server has had downtime, which seems to be a result of memory being fully allocated. We were working around this by restarting Puma daily, although for the last week this hasn't seemed to help.

So let's hurry up and make use of the new background reports feature. It's still in progress, but will be an improvement. https://openfoodnetwork.slack.com/archives/C01T75H6G0Z/p1681840327630489

  • It should significantly reduce memory issues
  • The UX is slightly better than before (a message with a link instead of a 500 snail)

Prep

  1. Run a few reports to test baseline, and document results
  2. Check some stats for baseline

Deployment plan

  1. Toggle feature fully on (https://openfoodnetwork.org.uk/admin/feature-toggle/features/background_reports)
  2. Restart Puma to release memory (Puma will no longer need it for reports, and Sidekiq is going to need it instead)
  3. Validation: check the same reports again
  4. Check stats again

Rollback plan

  1. Toggle feature fully off (https://openfoodnetwork.org.uk/admin/feature-toggle/features/background_reports)
  2. Restart Sidekiq to release memory
  3. Validation: check the same reports again
dacook commented 1 year ago

Before

Report test

Order Cycle Supplier Totals

Memory usage

Before running reports:

ofn-admin@production18:~$ ps -eo size,pid,user,command | egrep 'puma|sidekiq'
76700 11211 openfoo+ puma 6.2.2 (unix:///home/openfoodnetwork/apps/openfoodnetwork/shared/sock/puma.openfoodnetwork.sock) [2023-04-25-131137]
1914572 22139 openfoo+ sidekiq 7.0.9 2023-04-25-131137 [0 of 5 busy]
1498212 25734 openfoo+ puma: cluster worker 0: 11211 [2023-04-25-131137]
1703028 25793 openfoo+ puma: cluster worker 1: 11211 [2023-04-25-131137]
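To make the listing above easier to compare, the sizes can be totalled per process type. This is only a sketch and was not part of the original investigation: it assumes the `size` column of `ps` is reported in KiB, and the function name `memory_by_process` is hypothetical.

```python
from collections import defaultdict

KIB_PER_GIB = 1024 * 1024

# Copied from the "before" ps listing in this comment.
PS_OUTPUT = """\
76700 11211 openfoo+ puma 6.2.2 (unix:///home/openfoodnetwork/apps/openfoodnetwork/shared/sock/puma.openfoodnetwork.sock) [2023-04-25-131137]
1914572 22139 openfoo+ sidekiq 7.0.9 2023-04-25-131137 [0 of 5 busy]
1498212 25734 openfoo+ puma: cluster worker 0: 11211 [2023-04-25-131137]
1703028 25793 openfoo+ puma: cluster worker 1: 11211 [2023-04-25-131137]
"""


def memory_by_process(ps_output):
    """Sum the size column (assumed KiB) for 'puma' and 'sidekiq' processes."""
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        # Columns match `ps -eo size,pid,user,command`.
        fields = line.split(None, 3)
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip headers and unrelated lines
        size_kib, command = int(fields[0]), fields[3]
        # The command starts with "puma", "puma:" or "sidekiq".
        kind = command.split()[0].rstrip(":")
        if kind in ("puma", "sidekiq"):
            totals[kind] += size_kib
    return dict(totals)


if __name__ == "__main__":
    for kind, kib in sorted(memory_by_process(PS_OUTPUT).items()):
        print(f"{kind}: {kib / KIB_PER_GIB:.2f} GiB")
```

The Puma total includes the master process and both cluster workers, so it reflects everything a Puma restart would release.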

https://app.datadoghq.com/dashboard/bdw-2na-83i/openfoodnetworkorguk-cloned

[Screenshot: Screen Shot 2023-04-27 at 12 41 47 pm]

After report terminated:

[Screenshot: Screen Shot 2023-04-27 at 1 55 32 pm]

dacook commented 1 year ago

After

Report test

Order Cycle Supplier Totals

Memory usage

After running the report for a 1-year period (it took extra memory but didn't reach the system limit). We can see that Sidekiq has allocated the most memory, because it is now running the report.

ofn-admin@production18:~$ ps -eo size,pid,user,command | egrep 'puma|sidekiq'
76700 11211 openfoo+ puma 6.2.2 (unix:///home/openfoodnetwork/apps/openfoodnetwork/shared/sock/puma.openfoodnetwork.sock) [2023-04-25-131137]
1635444 13831 openfoo+ puma: cluster worker 0: 11211 [2023-04-25-131137]
1913708 13881 openfoo+ puma: cluster worker 1: 11211 [2023-04-25-131137]
4164016 16145 openfoo+ sidekiq 7.0.9 2023-04-25-131137 [0 of 5 busy]
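The change between the two `ps` listings in this thread can be quantified with a short script. This is a sketch under assumptions: the `size` values are treated as KiB, processes are matched by role rather than PID (the Puma worker PIDs differ between the listings), and the helper name `growth` is hypothetical.

```python
KIB_PER_GIB = 1024 * 1024

# Size values (assumed KiB) copied from the two ps listings in this thread.
BEFORE = {
    "puma master": 76700,
    "puma worker 0": 1498212,
    "puma worker 1": 1703028,
    "sidekiq": 1914572,
}
AFTER = {
    "puma master": 76700,
    "puma worker 0": 1635444,
    "puma worker 1": 1913708,
    "sidekiq": 4164016,
}


def growth(before, after):
    """Return {name: (delta_kib, ratio)} for processes present in both snapshots."""
    return {
        name: (after[name] - before[name], after[name] / before[name])
        for name in before
        if name in after
    }


if __name__ == "__main__":
    for name, (delta_kib, ratio) in growth(BEFORE, AFTER).items():
        print(f"{name:14} {delta_kib / KIB_PER_GIB:+.2f} GiB (x{ratio:.2f})")
```

On these figures Sidekiq accounts for most of the growth, which matches the observation that it is now the process running the report.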

Conclusion

Not good. There's an error in displaying all onscreen results, so we need to abort.

Also:

dacook commented 1 year ago

Rollback: I've disabled the feature toggle and restarted both Sidekiq and Puma. Memory usage is back to a normal amount. Tested to confirm:

I can see that the last two nightly Puma restarts successfully reset the memory, so I think no further action is required in the short term.