Closed BethanOFN closed 2 months ago
Hi @BethanOFN . I want to resolve this error,Can you provide more info about this?Like the response?
Hi @EdwardLi-coder - Thanks for your interest in this issue. I don't think the network response would be of much help here as it's a server side error. It would be great if we could reproduce this in local environment and then analyze the error on the server side. Please let me know if you have any question. Thanks.
Hi @EdwardLi-coder - Thanks for your interest in this issue. I don't think the network response would be of much help here as it's a server side error. It would be great if we could reproduce this in local environment and then analyze the error on the server side. Please let me know if you have any question. Thanks.
Due to the 500 error response, I believe this might contain information about it. I attempted to reproduce this error on my local computer but was unable to do so.
We track all application errors with Bugsnag, so I checked there and found that it only happened during one time:
So I think it's safe to say
So I think we can close this issue
Still seeing infinite loading and failure to return products, so I'm reopening - no longer a 500 error, now it's no connection to server:
Still seeing infinite loading and failure to return products, so I'm reopening - no longer a 500 error, now it's no connection to server:
Hi @BethanOFN - Just to be sure it's not browser related, can you please validate the same case in Chrome as well? Thanks.
@chahmedejaz sure 👍 on chrome:
Thanks for the confirmation, Bethan. Looking at the errors, seems like they are on the server level and not on the code level. Any of @openfoodfoundation/core-devs may look into it in this case. Thanks.
By the way, UK staging is also giving this error and it's not even opening the site.
I'll look into this and see what I can find.
By the way, UK staging is also giving this error and it's not even opening the site.
Yes, I am working on that and broke it.. 😞
I found a correlation between using this subscriptions search, and an increased load on the server, which led to the 504
timeout errors. Here are some examples:
ofn-admin@production18:/var/log/nginx$ grep -P -o '13/Aug/2024:11:.*search.json.*?" ' access.log
13/Aug/2024:11:17:28 +0000] "GET /admin/variants/search.json?q=kefir&distributor_id=160&eligible_for_subscriptions=true&include_out_of_stock=&_=1723547826470 HTTP/2.0"
...
13/Aug/2024:11:22:27 +0000] "GET /admin/variants/search.json?q=kefir&distributor_id=160&eligible_for_subscriptions=true&include_out_of_stock=&_=1723548065530 HTTP/2.0"
...
13/Aug/2024:11:29:02 +0000] "GET /admin/variants/search.json?q=goats+milk+kefir&distributor_id=160&eligible_for_subscriptions=true&include_out_of_stock=&_=1723548065535 HTTP/2.0"
... (total 20 over a period of 20 minutes)
ofn-admin@production18:/var/log/nginx$ grep '2024/08/13 11:' error.log
2024/08/13 11:22:27 [error] 1636#1636: *501596 upstream prematurely closed connection while reading response header from upstream, client: 92.14.44.0, server: openfoodnetwork.org.uk, request: "GET /admin/variants/search.json?q=kefir&distributor_id=160&eligible_for_subscriptions=true&include_out_of_stock=&_=1723548065530 HTTP/2.0"
...
2024/08/13 11:29:02 [error] 1634#1634: *501844 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 92.14.44.0, server: openfoodnetwork.org.uk, request: "GET /admin/variants/search.json?q=goats+milk+kefir&distributor_id=160&eligible_for_subscriptions=true&include_out_of_stock=&_=1723548065535 HTTP/2.0"
(ref)(time is reported in different timezone, just look at the minutes)
Given the sustained high CPU usage, I wondered if it was due to a report. But I could find no reports were generated at that time, and no long-running jobs in sidekiq.
I also noticed calls to /api/v0/order_cycles/*/products.json
at this time, but it seems normal. There were 313 requests made over that hour.
So I'm guessing that search.json
is a very intensive process. It's possible that it takes so long that people retry, which can exacerbate the problem.
I had a look at the code, it is surprisingly convoluted. I'm unclear on exactly how it works, but it's possible the query is not scoped to the distributor, and therefore it could be processing large numbers of variant records unnecessarily.
Ok, yep it's the search process. I can see it timing out after the allotted 2 minutes, each time it is called. For example: (thanks @rioug for adding the request ID to the logs, this was really helpful!)
ofn-admin@production18:/home/openfoodnetwork/apps/openfoodnetwork/current/log$ egrep "(Completed 504 Gateway Timeout)|(search.json)" production.log
I, [2024-08-13 11:17:20 #7294] INFO -- : [148e245e-bfaf-441f-8959-a1b088166ace] Started GET "/admin/variants/search.json?q=kefir&distributor_id=160&eligible_for_subscriptions=true&include_out_of_stock=&_=1723547826470" for 92.14.44.0 at 2024-08-13 11:17:20 +0000
...
I, [2024-08-13 11:19:20 #7294] INFO -- : [148e245e-bfaf-441f-8959-a1b088166ace] Completed 504 Gateway Timeout in 119995ms (Views: 0.6ms | ActiveRecord: 116259.6ms | Allocations: 10756109)
I think the first thing to do is to try optimising it.
FYI @BethanOFN I believe this is performing so poorly due to the immense number of Enterprise Permissions granted to the hub (Tamar Food Valley). I don't know if it's possible, but if there's any old ones that you can remove, it might help.
Unfortunately I'm having trouble making any meaningful optimisations at this stage.
I've tried this search.json
command on my computer, using a copy of the same database, and although it's not fast, it's performing relatively well. (compared to loading any page as a super admin which is excruciatingly slow... I think due to the large number of hubs and order cycles on the system)
So I'm actually not sure now why this would be causing the issue. 🤔
I think I will look again at what changes in Product Refactor could have caused it. Hmm or if it's worth increasing server resources.
FYI @BethanOFN I believe this is performing so poorly due to the immense number of Enterprise Permissions granted to the hub (Tamar Food Valley). I don't know if it's possible, but if there's any old ones that you can remove, it might help.
Thanks @dacook, yes they're the biggest hub on UK Prod! I'll pull up a query for any outdated permissions and see what I can do
includes
)I tried to replicate the issue on uk_prod (as superuser), but wasn't able to.
Increasing server CPU or memory would probably help a bit, but as I can't replicate on prod, I can't really say for sure.
@BethanOFN sorry for the delay on this one. I'm just wondering if this is still an issue for you?
@dacook sorry been on leave! I ran a query on enterprise permissions and sent it to the user with suggested deletions for inactive enterprises - deleting a lot of them resolved the issue for the user, so this can be closed I'd say given no one else has reported it and they are the biggest user on UK instance
Hope you had a good break! No worries, glad it's no longer an issue.
Description
User flagged that whilst trying to create a new subscription, the required products were not appearing in the dropdown when searched and the screen stalled on an infinite loader. As such, they could not create the subscription. When testing this I replicated it on my end both with a new subscription and trying to add new items to an existing subscription.
Expected Behavior
When a product name is searched for, all products containing that name should appear in the dropdown to allow selection for the subscription
Actual Behaviour
When creating a new subscription or editing an existing subscription, no products are returned when a search term is entered. This means that subscriptions cannot be used as the dropdown only displays 'searched for' products, not all products.
Steps to Reproduce
Animated Gif/Screenshot
See 500 error in GIF below when searching product:
Workaround
No workaround - I have tried as superadmin and also from my very small test shop (as identifying user is a large hub) and nothing has worked
Severity
bug-s2
Your Environment
Possible Fix