openfoodfoundation / openfoodnetwork

Connect suppliers, distributors and consumers to trade local produce.
https://www.openfoodnetwork.org
GNU Affero General Public License v3.0
1.11k stars 718 forks source link

Reports delayed on French instance #12604

Closed nicolasVr closed 3 months ago

nicolasVr commented 3 months ago

Description

At least 2 hub managers complained that last Friday (June 21st) they were unable to print reports.

User ID : 24680

and 2657

They have finally received emails (many ot them of course) with the reports at 9 PM.

One of the reports delayed was Total By Producer.

Reports generation is ok at the moment, but we need to make sure that the issue won't happen again, and we owe some explanations to our customers. It impacted they organization heavily as they needed their reports to prepare their Saturday morning delivery.

Related note?

One of these users is also affected by this bug is back. Might be related?

Your Environment

French instance coopcircuits.fr

RachL commented 3 months ago

@openfoodfoundation/core-devs do we have any logs on what may have happened last Friday?

abdellani commented 3 months ago

Hi I'm not able to collect enough details from NewRelic. I'll un-assign myself.

rioug commented 3 months ago

I had a look at the logs, and somebody started this beautiful report, that took about 12 hours to complete :boom: !

2024-06-21T11:49:29.618Z pid=834633 tid=iwct class=ReportJob jid=1d09ef9b739e91b2155ac704 INFO: Performing ReportJob (Job ID: 69bbd0aa-ded5-4533-949b-d5144876cffd) from Sidekiq(default) enqueued at 2024-06-21T11:49:29Z with arguments: {:report_class=>Reporting::Reports::OrdersAndDistributors::Base, :user=>#<GlobalID:0x00007f7d540af8d8 @uri=#<URI::GID gid://openfoodnetwork/Spree::User/52690>>, :params=>{"authenticity_token"=>"2IksNYR_UAkNqqkQEfnPl0iIiEFMcIRRXlwuDg-ulC5WuPip3UBbQqDTc-GHdk58XddwDDYwZBjbUX0tCvX-VQ", "uuid"=>"1ba51148-9b6d-459c-9717-05219577b864", "q"=>{"completed_at_gt"=>"", "completed_at_lt"=>"", "distributor_id_in"=>[""]}, "report_format"=>"", "fields_to_show"=>["order_date", "order_id", "customer_name", "customer_email", "customer_phone", "customer_city", "sku", "item_name", "variant", "quantity", "max_quantity", "cost", "shipping_cost", "payment_method"], "controller"=>"admin/reports", "action"=>"show", "report_type"=>"orders_and_distributors"}, :format=>"html", :filename=>"orders_and_distributors_20240621.html", :channel=>"ScopedChannel:1ba51148-9b6d-459c-9717-05219577b864"}
2024-06-22T00:00:02+0000: Performed ReportJob (Job ID: 69bbd0aa-ded5-4533-949b-d5144876cffd) from Sidekiq(default) in 43833243.15ms

This was most likely slowing down the server, meaning reports where a bit slower to be generated, this was also compounded by the fact user kept trying to generate the same report adding load to the server and further slowing down everything.

It's fair to assume they were not the only ones which exacerbated the problem.

NewRelic monitoring confirm my observation, there was an increase in CPU usage and server load from about 11:30 on the 21st of june. I can also see Sidekiq process maxing out CPU all afternoon.

Recommendation:

rioug commented 3 months ago

@RachL Let me know if you want to proceed with any of the recommendation.

FYI This issue https://github.com/openfoodfoundation/openfoodnetwork/pull/12592 may help with this kind of problem.

rioug commented 3 months ago

We discussed this with Maikel and David, and we agreed that the report button should be disable when the report is loading, which I believe is not the case currently. It should at least limit spamming of report. I'll have a look at this tomorrow.

RachL commented 3 months ago

I'm closing this one for now. let's see if this still occurs once https://github.com/openfoodfoundation/openfoodnetwork/pull/12619 is released, we can reopen :+1: