medusajs / medusa

The world's most flexible commerce platform.
https://medusajs.com
MIT License

TypeORM produces bad query for /products route #4200

Closed levshkatov closed 4 months ago

levshkatov commented 1 year ago

Bug report

Describe the bug

The GET /products route stops working properly when dealing with a large number of products (~100,000-200,000). The ProductService.listAndCount method has a temporary fix for TypeORM's query strategy mode, but it still generates a peculiar SQL query that completely hangs the database due to a problematic subquery.

System information

Medusa version (including plugins): 1.11.0
Node.js version: 18.16.0
Database: PostgreSQL
Operating system: Ubuntu 22.04
Browser (if relevant): Google Chrome

Steps to reproduce the behavior

  1. Add plenty of products (~100,000-200,000)
  2. GET /admin/products?offset=0&limit=1

Expected behavior

Modify productRepo.findWithRelationsAndCount in ProductService.listAndCount to prevent it from generating incorrect queries.

Code snippets

Example of query:

select
    distinct "distinctAlias"."product_id" as "ids_product_id",
    "distinctAlias"."product_created_at"
from
    (
    select
        "product"."id" as "product_id",
        "product"."created_at" as "product_created_at"
    from
        "product" "product"
    left join "product_category_product" "product_categories" on
        "product_categories"."product_id" = "product"."id"
    left join "product_category" "categories" on
        "categories"."id" = "product_categories"."product_category_id"
    where
        "product"."deleted_at" is null
        ) "distinctAlias"
order by
    "distinctAlias"."product_created_at" desc,
    "product_id" asc
limit 1;
adrien2p commented 1 year ago

Unfortunately, this query is generated by TypeORM itself in order to perform pagination and ordering when joins are involved; it comes from TypeORM's getManyAndCount. We know that TypeORM has many issues that we are battling with 😅. If I am not mistaken, there is an open discussion about gradually moving away from TypeORM, but it will take time.

On the other hand, this type of query is generally what is needed for pagination and ordering: it is used under the hood to grab the ids first and then fetch the full rows within those ids in separate queries, and it should work on tables with millions of tuples as well 🤔
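The two-step strategy described above can be sketched in plain SQL (table and column names taken from the generated query in the bug report; this is an illustration of the approach, not the exact code TypeORM emits):

```sql
-- Step 1: select only the ids for the requested page. The DISTINCT is
-- needed because the category joins can multiply product rows.
SELECT DISTINCT "product"."id", "product"."created_at"
FROM "product"
LEFT JOIN "product_category_product" pcp
       ON pcp."product_id" = "product"."id"
WHERE "product"."deleted_at" IS NULL
ORDER BY "product"."created_at" DESC, "product"."id" ASC
LIMIT 1 OFFSET 0;

-- Step 2: fetch the full rows and relations only for those ids.
SELECT "product".*
FROM "product"
WHERE "product"."id" IN (/* ids from step 1 */);
```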

adrien2p commented 1 year ago

What is the available memory for your Postgres instance, and if you run the query manually, what do you obtain? Also, are there any logs or errors you could share? Maybe from Postgres itself.

Also, the db characteristics need to be adapted to your case and the quantity of data you might have.

levshkatov commented 1 year ago

Actual free RAM is ~3GB. EXPLAIN ANALYZE produces the following results (I just removed the categories join from the query):

  1. Number of products: 100

    Limit  (cost=65.00..65.00 rows=1 width=40) (actual time=0.252..0.253 rows=1 loops=1)
    ->  Sort  (cost=65.00..65.25 rows=100 width=40) (actual time=0.252..0.252 rows=1 loops=1)
        Sort Key: product.created_at DESC, product.id
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=63.50..64.50 rows=100 width=40) (actual time=0.135..0.156 rows=100 loops=1)
              Group Key: product.created_at, product.id
              Batches: 1  Memory Usage: 32kB
              ->  Seq Scan on product  (cost=0.00..63.00 rows=100 width=40) (actual time=0.008..0.074 rows=100 loops=1)
                    Filter: (deleted_at IS NULL)
    Planning Time: 0.104 ms
    Execution Time: 0.298 ms
  2. Number of products: 267.000

    Limit  (cost=88181.39..88181.51 rows=1 width=40) (actual time=30079.107..30082.729 rows=1 loops=1)
    ->  Unique  (cost=88181.39..120886.11 rows=269249 width=40) (actual time=30079.105..30082.725 rows=1 loops=1)
        ->  Gather Merge  (cost=88181.39..119539.87 rows=269249 width=40) (actual time=30079.103..30082.723 rows=1 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Sort  (cost=87181.36..87461.83 rows=112187 width=40) (actual time=30065.704..30065.907 rows=473 loops=3)
                    Sort Key: product.created_at DESC, product.id
                    Sort Method: external merge  Disk: 4328kB
                    Worker 0:  Sort Method: external merge  Disk: 4680kB
                    Worker 1:  Sort Method: external merge  Disk: 4096kB
                    ->  Parallel Seq Scan on product  (cost=0.00..74701.87 rows=112187 width=40) (actual time=31.034..29932.381 rows=89030 loops=3)
                          Filter: (deleted_at IS NULL)
    Planning Time: 1.445 ms
    Execution Time: 30083.805 ms

I also added some custom fields to the product and product_category tables, but I don't think the problem is there. I think the problem is in the subquery that tries to select the whole table.

levshkatov commented 1 year ago

After several attempts, I managed to make it work. Even the entire route /admin/products is functioning. It is quite slow, but it works. Therefore, if we cannot influence TypeORM's querying directly, perhaps adding indexes could help improve the performance.
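For reference, an index matching the WHERE and ORDER BY of the problematic DISTINCT subquery would let Postgres walk the index instead of sorting the whole table. This is a hypothetical index for illustration, not one that Medusa ships:

```sql
-- Partial composite index matching the subquery's filter and sort order.
CREATE INDEX IF NOT EXISTS "idx_product_created_at_id"
ON "product" ("created_at" DESC, "id" ASC)
WHERE "deleted_at" IS NULL;
```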

adrien2p commented 1 year ago

Exactly, which indexes did you need to add? We try to add the logical indexes, but sometimes we might forget some of them, so please do not hesitate to share. Also, the route includes the price calculation using the price strategy and, depending on your setup, the inventory management. It has been improved, but depending on your needs and where you are fetching the products from, you can select only the relations you need and defer prices and inventory to a later point. But we are entering the realm of case-by-case customisation ☺️

adrien2p commented 1 year ago

Also, how many products do you fetch per page? And how many variants, prices, and options do you have, since they are joined as part of the query as well? The more products per page, the more dynamic price calculations and inventory there is to manage. Just for information.

levshkatov commented 1 year ago

I don't know about indexes yet; it was just a suggestion. This route runs many queries, and they all work perfectly except the first DISTINCT one. However, even that one works after multiple attempts, so I suppose Postgres optimizes something. The main problem is that I don't know why it started working or how I can help the database in the future, hah.

levshkatov commented 1 year ago

Firstly, I tried the simple route from the admin dashboard, with limit 15 and the default fields and expand parameters. It broke after a 1-minute timeout, and I realized that the problem lies in the first query of this transaction, regardless of whether I query 15 products or just one. Table sizes: product_variant: ~2 mln, product_category: ~1000, product_option: ~11 mln, product_option_value: ~11 mln.

adrien2p commented 1 year ago

When you add a lot of data to your db, the planner statistics might not have been updated yet, and you might need to ask Postgres to refresh them to take your newly added data into account, especially when you add a lot of it (ANALYZE, or maybe REINDEX). That might be why it didn't work at first and then ended up working after some tests: as the data gets read, the statistics catch up.
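Concretely, after a bulk load it is usually the planner statistics (refreshed by ANALYZE or autovacuum), rather than the index contents, that are stale. A sketch of the relevant commands, assuming default Postgres settings:

```sql
-- Refresh planner statistics so Postgres can choose good plans
-- for the newly loaded data.
ANALYZE "product";
ANALYZE "product_variant";

-- Rebuilding an index is only needed if it is bloated or corrupted.
REINDEX TABLE "product";
```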

adrien2p commented 1 year ago

Also, the price calculation will get improved as part of 1.12, which should improve performance as well.

levshkatov commented 1 year ago

I just logged the database queries produced by an admin request for the products list (/admin/products?offset=0&limit=15...etc). I realized that it generates approximately 120 queries to the database, around 90 of them against money_amount. Is it ok? :D https://pastebin.com/idy2qyqE

I updated to v1.12, but unfortunately it didn't help.

adrien2p commented 1 year ago

That seems a bit too much; you should have something like this. Maybe a few more if you have more currencies, but I wouldn't expect it to be that high. Most of them are parallelised when possible, and prices are cached if already computed.

Also, I'll push something to remove the unnecessary count.

Screenshot 2023-05-30 at 16 04 13
adrien2p commented 1 year ago

Also, it looks like you have one query per variant for the money amount, whereas you should have one query per product covering all its variants. I haven't been able to reduce it to a single query, due to the way it is architectured and the trade-off between how users consume our methods and performance. In any case, if you have 15 products you should have 15 queries for the money amount, not 15 * variant count queries.
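The per-product shape described here would look roughly like the following. This is an illustrative sketch assuming the standard Medusa v1 schema, with a hypothetical product id:

```sql
-- One query per product: all money amounts for all of its variants
-- at once, instead of one query per variant.
SELECT ma.*
FROM "money_amount" ma
JOIN "product_variant" pv ON pv."id" = ma."variant_id"
WHERE pv."product_id" = 'prod_123';  -- hypothetical id
```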

levshkatov commented 1 year ago

Thanks for the reply! It's also strange to me, because I have only one currency and one money_amount per variant, so I expect a simple price calculation, not such a huge number of queries. Perhaps I will try the same on a fresh Medusa install later and reply here.

levshkatov commented 1 year ago

Could you tell me how many products and variants you have in this screenshot? With 1,000 products and 10,000 variants I also get excellent performance.

adrien2p commented 1 year ago

> Thanks for reply! It's also strange for me even because I have only one currency and one money_amount per variant. So, I expect simple price calculation but not such a huge amount of queries. Perhaps, I will try the same on fresh medusa install later and reply here.

If you are fetching 15 products, I would expect 15 queries for the money amount in the context of my pagination: not one per variant, but one per product. I am talking more about the number of queries you have than about the perf :)

In the following screenshot I tested with 300+ products and 15,000+ variants (50 variants per product, to stress it). You can see that I have 12 products and 12 queries for the money amount. The count queries are 12 as well, but with my last PR they will disappear.

Screenshot 2023-05-30 at 17 52 13
levshkatov commented 1 year ago

Which route are you fetching: admin or store? And what query parameters, apart from offset and limit, are you using? I will try the same tomorrow on the database with mocked data and on the real one. Also, could you please tell me what this cozy instrument in your screenshots is? :smiley:

adrien2p commented 1 year ago

I am testing GET /admin/products?limit=12, as it seems to be what you were doing; the result is a list of 12 products including 600 variants with their prices etc. For the tool, I am using https://www.npmjs.com/package/medusa-plugin-sentry, which I developed some time ago to help me monitor and investigate 😊

levshkatov commented 1 year ago

I optimized the structure of my data. Now I have 270,000 products, 1 mln variants, 1 mln options, and 1 mln money_amounts. I tried fetching products from the admin dashboard after every ~40,000 products added and noticed that:

  1. up to 50,000 products: everything is ok
  2. 50,000-200,000: queries become much slower
  3. 200,000+: queries fail with a timeout

We will add custom endpoints with custom logic for the store. But for the dashboard, it isn't easy. Yeah, we can expose the admin code, change it, rebuild, etc., but that's too difficult. Anyway, thanks a lot for helping!
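For custom store endpoints like those mentioned above, keyset (cursor) pagination is a common way to keep the page query fast regardless of table size, since it avoids both OFFSET and a sort over all rows. A sketch, not Medusa's built-in behavior, assuming an index on (created_at DESC, id DESC) and ordering both keys in the same direction:

```sql
-- First page: the 15 newest products.
SELECT "id", "created_at"
FROM "product"
WHERE "deleted_at" IS NULL
ORDER BY "created_at" DESC, "id" DESC
LIMIT 15;

-- Next page: pass the last row's (created_at, id) as a cursor.
-- The row comparison matches the index, so no OFFSET scan is needed.
SELECT "id", "created_at"
FROM "product"
WHERE "deleted_at" IS NULL
  AND ("created_at", "id") < (/* last created_at */, /* last id */)
ORDER BY "created_at" DESC, "id" DESC
LIMIT 15;
```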

adrien2p commented 1 year ago

But I don't understand why you have that many queries running. Do you have a chart where we can see the queries fired, like the one I showed you from Sentry? Unless you are fetching 200 products per page :)

alessioacella commented 9 months ago

Same problem here!

We have ongoing price lists, and Sentry is logging "N+1 Query" on money amounts for the store products list (24 products per page).

Medusa 1.18

Screenshots: sentry-1, sentry-2
sradevski commented 4 months ago

Hey, thanks for the report! Since v2 brought a lot of architectural and API changes on the backend, we will be closing this ticket since it no longer applies to our new setup, or the issue has already been fixed. We have already moved away from TypeORM, so hopefully this issue should be resolved in v2. If you are still facing issues with v1, please open a new ticket and we will address it as soon as possible. Thanks! πŸ™