payloadcms / payload

Payload is the open-source, fullstack Next.js framework, giving you instant backend superpowers. Get a full TypeScript backend and admin panel instantly. Use Payload as a headless CMS or for building powerful applications.
https://payloadcms.com
MIT License
23.37k stars 1.49k forks source link

Slow-query of payload 1.6 takes me about 10s to load an article list of nearly 30k documents #2227

Closed Stupidism closed 1 year ago

Stupidism commented 1 year ago
    It takes me about 10s to load the article list 

image

I inspected the slow API. If I remove the draft option. I will be very fast.

https://mysite.com/api/articles?locale=en&depth=0&draft=true&limit=10

vs

https://mysite.com/api/articles?locale=en&depth=0&limit=10

And it cannot reproduce in my local machine.

And I ran the migration script as the document said. https://github.com/payloadcms/payload/blob/master/CHANGELOG.md#-versions-may-need-to-be-migrated

Originally posted by @Stupidism in https://github.com/payloadcms/payload/discussions/2217

jmikrut commented 1 year ago

Hey there – that is certainly not acceptable. Do you have indexes created on the ‘updatedAt’ and ‘parent’ fields on your versions collection?

We will look into this tomorrow 👍

Stupidism commented 1 year ago

In my local env. Yes

image

Let me find someone to help me figure out with stage and prod env. @gedaaaa

jmikrut commented 1 year ago

Hey @Stupidism — I just ran a test with 20K documents and I am able to load the query with ?draft=true in 130ms.

I do indeed have indexes created as you do locally. I wonder if this is indeed an indexing issue and that you may have no indexes present in your production _versions collection.

As another followup, how many documents are in. your _articles_versions collection in production?

Stupidism commented 1 year ago

@jmikrut Here're the indexes in our stage environment.

"{""_id"": 1}",_id_,copy-from-prod-2023-02-07._articles_versions,4
"{""parent"": 1}",parent_1,copy-from-prod-2023-02-07._articles_versions,4
"{""version.slug"": 1}",version.slug_1,copy-from-prod-2023-02-07._articles_versions,4
"{""createdAt"": 1}",createdAt_1,copy-from-prod-2023-02-07._articles_versions,4
"{""updatedAt"": 1}",updatedAt_1,copy-from-prod-2023-02-07._articles_versions,4
"{""version._status"": 1}",version._status_1,copy-from-prod-2023-02-07._articles_versions,4
"{""autosave"": 1}",autosave_1,copy-from-prod-2023-02-07._articles_versions,4
Stupidism commented 1 year ago

The totalDocs of article versions is 6k+. Is it possible that this is due to lack of versions for each article?

image
Stupidism commented 1 year ago

What's interesting is it only affects the REST api. The graphql API is working as expected.

image
Stupidism commented 1 year ago

We ran some profile and got this result. It's a COLLSCAN image

Here's the excuted query:

{
  "aggregate": "_articles_versions",
  "pipeline": [
    { "$sort": { "updatedAt": -1 } },
    {
      "$group": {
        "_id": "$parent",
        "version": { "$first": "$version" },
        "updatedAt": { "$first": "$updatedAt" },
        "createdAt": { "$first": "$createdAt" }
      }
    },
    { "$match": {} },
    { "$sort": { "createdAt": -1 } },
    { "$skip": 60 },
    { "$limit": 10 }
  ],
  "cursor": {},
  "lsid": { "id": { "$binary": "jWHgUWPbRiyh4yaZVJNYDw==", "$type": "4" } },
  "$db": "copy-from-prod-2023-02-07"
}

We tried to create a compound index for parent, updatedAt and createdAt. Still got no luck,

jmikrut commented 1 year ago

Hmm, this is very strange.

I have absolutely no idea why your GraphQL is functioning as expected but REST is not. Are you able to use a tool like MongoDB Compass to connect to your database? If so, you should try and recreate the aggregate that you are running within MongoDB Compass directly.

This also might be to do with AWS' DocumentDB. I believe that Mongo will fall back on writing to disk if it does not have an index available, or if the resulting data set is too large for it to be executed in memory. That could be why you are seeing COLLSCAN. This is definitely related to COLLSCAN—the question is - why?

I just seeded 40K documents and it still loads blazingly fast for me. Here is a screenshot of my working database's indexes within MongoDB Compass:

Screen Shot 2023-03-06 at 11 34 08 AM

Can you drop that version collection's indexes and recreate them?

Stupidism commented 1 year ago

Yes, we're using AWS' DocumentDB. Here're the indices shown with mongodb compass

image

BTW, the versions page flash to blank screen once I enter it.

image
jmikrut commented 1 year ago

Oops - did not mean to close. I didn't even know I did!

Question - is there a way that you could clone your exact database, run on Atlas (or a local MongoDB Community), and then re-assess the speed here?

JarrodMFlesch commented 1 year ago

@Stupidism The versions screen crashing was a reported issue and has since been fixed, just letting you know!

Stupidism commented 1 year ago

Oops - did not mean to close. I didn't even know I did!

Question - is there a way that you could clone your exact database, run on Atlas (or a local MongoDB Community), and then re-assess the speed here?

OK, I will try today.

Stupidism commented 1 year ago

What's interesting is it only affects the REST api. The graphql API is working as expected. image

Wrong info before. This was a screenshot from local env.

Stupidism commented 1 year ago

@JarrodMFlesch @jmikrut We found the root cause of the slowness here.

What's making the query super slow is the second sort aggregation. It looks like documentdb cannot utilize index for the second sort.

It might be related to this question: https://stackoverflow.com/questions/13466079/aggregate-framework-cant-use-indexes/56427875#56427875

image image

https://github.com/payloadcms/payload/commit/34582da561945cacc53d0c0664b54e63ebc8da25

Stupidism commented 1 year ago

Is it possible to use some other filters to get the latest version of an article instead of using group?

For example, if we set a flag in versions table latest: true?

Stupidism commented 1 year ago

Oops - did not mean to close. I didn't even know I did!

Question - is there a way that you could clone your exact database, run on Atlas (or a local MongoDB Community), and then re-assess the speed here?

We tried in local mongodb. We have about 30k articles and 60k versions. And the api is still slow in local, but much faster than stage env.

image image

Stupidism commented 1 year ago

Another thing we want to ask is how to log the db queies for each api call (graphql query, api params, response data, response time and etc). We found it really hard to locate the query when we meet such a slow-query situation

Stupidism commented 1 year ago

@jmikrut Any thoughts about optimizing the query itself?

jmikrut commented 1 year ago

Hey @Stupidism — I think this is definitely the move.

For example, if we set a flag in versions table latest: true?

The team and I have been discussing a way to do exactly this, and I think this will circumvent your issue.

Give us a bit here and we will get a plan in place to make this or something similar to this happen.

Stupidism commented 1 year ago

Great! Waiting for your good news.

jmikrut commented 1 year ago

Hey @Stupidism — what version of local Mongo are you using? Do you have it installed locally on your computer, or are you using Docker or similar?

It's very strange that we can't reproduce this at all. We're looking into fixes still, but I really need to be able to reproduce your exact scenario before being confident in a fix. Still digging, but the more info you can provide about your local MongoDB, the better.

We've also recreated your exact aggregate pipeline directly in Mongo Compass, and it's still quite fast:

image

Notice that within Compass, we can click on the Explain button which shows that our updatedAt index is being used. Is yours being used?

Can you provide us with an export of your local _articles_versions collection - the one that takes 5 seconds to run - so that we can test locally?

Here's what we've done today:

All of this still runs within ~800ms. Removing limit, skip, and match will reduce the query down to ~200ms but we're still keeping the second $sort and seeing a fast query.

We will get to the bottom of this but at this point I'd greatly appreciate a copy of your collection that is running slowly on your local computer. If you are willing to share an export of your collection, you can DM me directly on Discord - otherwise you can reach me via submitting this form on our site.

Thanks for hanging in there!

Stupidism commented 1 year ago

@jmikrut I think the biggest issue we're using AWS DocumentDB for production. We're trying to switch to normal mongodb on our end.

Yes the index is used for local:

image

But the explain command throws error in stage.

jmikrut commented 1 year ago

Hmm, ok - so just so I am clear, on local, you are actually seeing pretty good execution time - 785ms, right? Did you change anything? Before I believe you mentioned 5s+ even trying this locally. But this looks about similar to what I am seeing on my end when I test locally. Good that it's using the updatedAt index.

Let me know and we can go from there. Thanks!!

Stupidism commented 1 year ago

@jmikrut Bad news is that's still about 5s. API response time is much larger than the db execution time.

image image

I cannot analyze even on my local env easily unless dig into payload source code inside node_modules. This leads back to my question above.

Another thing we want to ask is how to log the db queies for each api call (graphql query, api params, response data, response time and etc). We found it really hard to locate the query when we meet such a slow-query situation

Stupidism commented 1 year ago

Here's what I can do to analyze it. The result is still about 5s. I cannot see the final query plan.

image
queryDrafts: 4.643s
aggregate:  {
  "_pipeline": [
    {
      "$sort": {
        "updatedAt": -1
      }
    },
    {
      "$group": {
        "_id": "$parent",
        "version": {
          "$first": "$version"
        },
        "updatedAt": {
          "$first": "$updatedAt"
        },
        "createdAt": {
          "$first": "$createdAt"
        }
      }
    },
    {
      "$match": {}
    }
  ],
  "options": {
    "allowDiskUse": true
  }
}
aggregatePaginateOptions:  {
  page: 1,
  sort: { createdAt: -1 },
  limit: 10,
  lean: true,
  leanWithId: true,
  useEstimatedCount: false,
  pagination: true,
  useCustomCountFn: undefined,
  options: { limit: 10 },
  useFacet: true
}
Stupidism commented 1 year ago

@jmikrut Is it possible to turn off the draft option in list page? I think we can tolerate the inconvenience of cannot finding correct articles with some search words instead of such a slow API response.

Stupidism commented 1 year ago

I successfully work this around with a patch like this:

diff --git a/node_modules/payload/dist/admin/components/views/collections/List/index.js b/node_modules/payload/dist/admin/components/views/collections/List/index.js
index 6f99e3e..66e2319 100644
--- a/node_modules/payload/dist/admin/components/views/collections/List/index.js
+++ b/node_modules/payload/dist/admin/components/views/collections/List/index.js
@@ -69,7 +69,7 @@ const ListView = (props) => {
     (0, react_1.useEffect)(() => {
         const params = {
             depth: 0,
-            draft: 'true',
+            draft: slug === 'articles' ? undefined : 'true',
             page: undefined,
             sort: undefined,
             where: undefined,
jmikrut commented 1 year ago

Hey @Stupidism — few thoughts for you.

@DanRibbens just brought up to me the useFacet property. Are you passing mongoOptions.useFacet: false when you initialize Payload? If so, this may be how your replication is slow and ours is fast. Can you set useFacet: true?

But another thought - if you want to disable the draft=true argument, without overwriting the Payload core, you could write a beforeOperation hook that does the following:

If the above is true, you could set args.draft = false.

In the interim though, I can't figure out how to recreate this in order to fix it and unfortunately I think that means we should close this issue until we can accurately and reliably replicate the performance issue.

Let me know what you think about the above and we will continue to support accordingly!

Stupidism commented 1 year ago

I had a guess about useFacet as well, but isn't this not working for document db?

image
Stupidism commented 1 year ago

Hi @jmikrut, This is the url you can compare the speed:

https://seo-payload.stage.getjerry.com/api/articles?locale=en&depth=0&draft=true&page=1

https://seo-payload.stage.getjerry.com/api/articles?locale=en&depth=0&page=1

curl -o /dev/null -s -w 'Establish Connection: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n' https://seo-payload.stage.getjerry.com/api/articles\?locale\=en\&depth\=0\&page\=1
Establish Connection: 0.000000s
TTFB: 0.000000s
Total: 5.009390s

curl -o /dev/null -s -w 'Establish Connection: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n'  https://seo-payload.stage.getjerry.com/api/articles\?locale\=en\&depth\=0\&draft\=true\&page\=1
Establish Connection: 0.007475s
TTFB: 21.427764s
Total: 21.427931s
davidko604 commented 1 year ago

Hi @jmikrut, This is the url you can compare the speed:

https://seo-payload.stage.getjerry.com/api/articles?locale=en&depth=0&draft=true&page=1

https://seo-payload.stage.getjerry.com/api/articles?locale=en&depth=0&page=1

curl -o /dev/null -s -w 'Establish Connection: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n' https://seo-payload.stage.getjerry.com/api/articles\?locale\=en\&depth\=0\&page\=1
Establish Connection: 0.000000s
TTFB: 0.000000s
Total: 5.009390s

curl -o /dev/null -s -w 'Establish Connection: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n'  https://seo-payload.stage.getjerry.com/api/articles\?locale\=en\&depth\=0\&draft\=true\&page\=1
Establish Connection: 0.007475s
TTFB: 21.427764s
Total: 21.427931s

Hi @jmikrut ,

My team is also having similar issues with one of our collections. With drafts disabled, we get very quick collection list load time but with drafts enabled we are having long load times.

look at args.req.origin to see if the request is coming from the admin UI
look at args.id to see if there is an ID (AKA this is the find and not findByID operation). You should only disable draft=true if it's on the find operation itself

If the above is true, you could set args.draft = false.

This works, but the solution looks like it can break with a payload update in the future and our experience can be broken at some point. If you use a similar data structure to above are you able to mock or reproduce this issue?

klapec commented 1 year ago

Has anyone tested whether https://github.com/payloadcms/payload/releases/tag/v1.15.3 improves query speeds for draft-enabled collections on DocumentDB?

JarrodMFlesch commented 1 year ago

Has anyone tested whether https://github.com/payloadcms/payload/releases/tag/v1.15.3 improves query speeds for draft-enabled collections on DocumentDB?

It should have the same affect on DocumentDB, since it removes the need for the aggregation step when querying drafts!

klapec commented 1 year ago

Sounds promising! Will check it tomorrow, I'll let you know.

jmikrut commented 1 year ago

Yes, this issue has been a thorn in our side for quite some time now but I am pretty confident that this new approach will 100% solve this issue for all Mongo-like databases from here on out.

It will be defaulted in 2.0, and we wanted to get it out now as a feature flag so you guys don't have to wait 👍 thanks to Jarrod for the quick moves here!

github-actions[bot] commented 1 week ago

This issue has been automatically locked. Please open a new issue if this issue persists with any additional detail.