Open nicolasVr opened 2 months ago
@openfoodfoundation/developers without fixing it first, is there a way to easily find the process that takes so much time? If it's relationships FR support could consider to divide their shops between several email accounts.
I made various tests to check what makes the shop slow :
I have also added the same type of order cycle on the new shop with no effect (normal speed). Last tests were made on staging
This issue was first reported on Slack on 2024-09-11.
There was a series of optimsations for this page before this performance issue was reported. One optimsation just the day before.
Rendering of the shops partial seems to take a long time but sometimes it's really fast:
[2024-08-25]: Rendered shops/index.html.haml (Duration: 83606.7ms | Allocations: 2679115)
[2024-08-25]: Rendered shops/index.html.haml (Duration: 119988.4ms | Allocations: 7202747)
[2024-08-25]: Rendered shops/index.html.haml (Duration: 31.2ms | Allocations: 5526)
But trawling through the logs like this it's not easy to see actual trends. So let's ask New Relic. We only have data for au-prod there but maybe we can see trends:
The biggest change is actually on July 9, when a big product refactor was deployed:
This performance problem may be different to the described issue here though. There seems to be a particular bad circumstance with French data.
I copied the French database to my local server and tested the shops page. It loads in about 10 seconds. Reverting #12827 doesn't help, then it takes 12 seconds.
Going back to the code before all those optimisations, it takes 11 seconds to load. So no significant difference here.
FYI the load time is specific to Micromarche shop. I made many tests and I think it is linked to the orders background. But not only the number of orders because some shop have more orders and are much faster. Thanks for caring anyway
Oh, okay, I looked at the wrong thing then. I looked at shops
page because that's what you first reported on Slack. Now I see that this is for one specific shop.
So the individual shop page loads in 13 seconds for me but the shop is closed at the moment.
Sorry for the misleading slack posts. I coud see no loading difference between order cycle open or closed. During "french openiing hours" on prod loading time is more 30-40 seconds. On staging I have created a hub "micromarche test" that is a copy of micromrché - Agronaute without of course the background orders. Loading time is about 5 seconds.
On that page, we have a similar trend though:
Since the product refactor, the execution time multiplied. The thing that increased is:
Postgres Spree::Property find
A bit more specific:
app/serializers/api/cached_enterprise_serializer.rb:84:in `distributed_properties'
SELECT DISTINCT spree_properties.*
FROM "spree_properties" INNER JOIN "producer_properties" ON "producer_properties"."property_id" = "spree_properties"."id" INNER JOIN "enterprises" ON "enterprises"."id" = "producer_properties"."producer_id" INNER JOIN "spree_variants" ON "spree_variants"."deleted_at" IS ? AND "spree_variants"."supplier_id" = "enterprises"."id" INNER JOIN "spree_products" ON "spree_products"."id" = "spree_variants"."product_id" INNER JOIN "spree_variants" "variants_spree_products" ON "variants_spree_products"."deleted_at" IS ? AND "variants_spree_products"."product_id" = "spree_products"."id" INNER JOIN "exchange_variants" ON "exchange_variants"."variant_id" = "variants_spree_products"."id" INNER JOIN "exchanges" ON "exchanges"."id" = "exchange_variants"."exchange_id" INNER JOIN "order_cycles" ON "order_cycles"."id" = "exchanges"."order_cycle_id"
WHERE "exchanges"."incoming" = $? AND "exchanges"."receiver_id" = $? AND (order_cycles.orders_open_at <= ? AND order_cycles.orders_close_at >= ?)
The code line referenced by New Relic is ambiguous. Looking at the SQL, it seems to be the distributed_producer_properties
method. But that whole file hasn't been changed in the product refactor.
Loading that particular shop locally with French data reveals the same thing. The majority of time is spent in the distributed_properties
.
I think I have to take a step back here. I was hoping to find a trivial fix like reverting a pull request. But this database query needs some really good focus. Maybe caching will be an answer.
The sad thing is that there are only 18 properties. So it takes all this time just to combine a lot of data in the database to know which of these 18 properties to display so that we can filter in the shopfront and don't show filters for properties that don't have products here.
Does it suggest a workaround we could use in the meantime? We are about to help the hub to build new shops without order history, which should solve the problem for some time but not without harm and inconvenience.
Good question. The query doesn't involve orders. But it does involve order cycles. They are then filtered by active order cycles but you could try to clean out old order cycles.
Just to explain a bit, to find the product properties to show it looks at all producers with properties, then all the products they have and all the order cycles these products are in. Then it filters the order cycles by the distributor and opening times.
Dev note: order_cycles
doesn't have any indexes. At least the distributor id would be helpful to index here.
Dev note:
order_cycles
doesn't have any indexes.
😱 That doesn't sound good! I just did a search and found 18 tables don't have indexes. Some would have tiny datasets and should be ok, but it seems like we should be indexing order_cycles.
~At least the distributor id would be helpful to index here.~
I'm curious why you crossed this out, I'll assume that thought isn't complete and won't try to pick it up..
It looks like we have regular issues with server load on FR prod, with downtimes being reported regularly. Last night, it was down for 8 minutes then 11 minutes. https://openfoodnetwork.slack.com/archives/GLJQ2LASF/p1727031576269109 Similar last weekend.
At least the distributor id would be helpful to index here.
I'm curious why you crossed this out, I'll assume that thought isn't complete and won't try to pick it up..
Well, order_cycles
doesn't have a distributor id. It has the coordinator id but the distributors are in outgoing exchanges and they are indexed. The only fields on order_cycles
worth indexing are the opening and closing time. But datetime doesn't always perform well with indexes, we need to proof the benefit. If the number of order cycles is the problem then maybe we need to archive old order cycles with a flag that's easily indexed.
@openfoodfoundation/developers There is a greenlight from budget circle on this first step. I've created #12870 Index Order cycles
in Global clockify to track the time.
@nicolasVr I've missed a few meetings, but I don't see in the notes that a budget was dedicated to that, so I'm downgrading it to s3 // the global team won't be able to take this one before global core budget increases :(
Please use #12870 Index Order cycles in Clockify when working on this issue
Description
Micromarché manages several Hubs (for instance https://coopcircuits.fr/micromarche-la-resto-du-grand-t/shop#/shop_panel ). Those hubs are linked to many producers and have a big order background. Opening the shop page takes 30 to 40 seconds and, logically, buyers complain that is long and some give up.
Expected Behavior
Shop page should open in a few seconds.
Actual Behaviour
Opening a page takes 30 to 40 seconds
Steps to Reproduce
Workaround
No kwown workaround
Severity
s2 ?