wp-graphql / wp-graphql

:rocket: GraphQL API for WordPress
https://www.wpgraphql.com
GNU General Public License v3.0
3.63k stars 440 forks source link

Slow query in User Loader when querying for the public users #3178

Closed sathyapulse closed 1 month ago

sathyapulse commented 1 month ago

Description

We have encountered a customer issue where the GraphQL endpoints were returning slow responses. When debugging the problem, we found the User Loader generates a slow query to determine if the user is public.

The slow query is generated in the line /src/Data/Loader/UserLoader.php#L82.

SELECT DISTINCT
    `post_author`
FROM
    wp_2_posts
WHERE
    (
        (
            post_type = 'post' AND(post_status = 'publish')
        ) OR(
            post_type = 'page' AND(post_status = 'publish')
        ) OR(
            post_type = 'l-page' AND(post_status = 'publish')
        ) OR(
            post_type = 'place' AND(post_status = 'publish')
        )
    ) AND `post_author` IN(1, 2, 3, 4, 5)
LIMIT 5;
EXPLAIN SELECT DISTINCT post_author FROM wp_2_posts WHERE ( ( post_type = 'post' AND(post_status = 'publish') ) OR( post_type = 'page' AND(post_status = 'publish') ) OR( post_type = 'l-page' AND(post_status = 'publish') ) OR( post_type = 'place' AND(post_status = 'publish') ) ) AND post_author IN(1, 2, 3, 4, 5) LIMIT 5;

+----+-------------+------------+------------+-------+----------------------------------------------------------------------------------------------+-------------+---------+------+--------+----------+-----------------------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys                                                                                | key         | key_len | ref  | rows   | filtered | Extra                                               |
+----+-------------+------------+------------+-------+----------------------------------------------------------------------------------------------+-------------+---------+------+--------+----------+-----------------------------------------------------+
|  1 | SIMPLE      | wp_2_posts | NULL       | range | post_author,yoast2_index,posts_mime_type_status_index,status_type_sort_date,type_status_date | post_author | 8       | NULL | 258277 |     8.17 | Using index condition; Using where; Using temporary |
+----+-------------+------------+------------+-------+----------------------------------------------------------------------------------------------+-------------+---------+------+--------+----------+-----------------------------------------------------+

In this case, the query scans 258K rows and takes 8-10 seconds to execute.

The slow query could be optimised with the query below, which returns results in 2 milliseconds.

SELECT DISTINCT
    `wp_users`.ID
FROM
    wp_2_posts
INNER JOIN wp_users ON wp_2_posts.post_author = wp_users.ID
WHERE
    (
        (
            post_type = 'post' AND(post_status = 'publish')
        ) OR(
            post_type = 'page' AND(post_status = 'publish')
        ) OR(
            post_type = 'l-page' AND(post_status = 'publish')
        ) OR(
            post_type = 'place' AND(post_status = 'publish')
        )
    ) AND `post_author` IN(1, 2, 3, 4, 5);
EXPLAIN SELECT DISTINCT wp_users.ID FROM wp_2_posts INNER JOIN wp_users ON wp_2_posts.post_author = wp_users.ID WHERE ( ( post_type = 'post' AND(post_status = 'publish') ) OR( post_type = 'page' AND(post_status = 'publish') ) OR( post_type = 'l-page' AND(post_status = 'publish') ) OR( post_type = 'place' AND(post_status = 'publish') ) ) AND post_author IN(1, 2, 3, 4, 5);
+----+-------------+------------+------------+-------+----------------------------------------------------------------------------------------------+-------------+---------+---------------------------------+-------+----------+-------------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys                                                                                | key         | key_len | ref                             | rows  | filtered | Extra                                     |
+----+-------------+------------+------------+-------+----------------------------------------------------------------------------------------------+-------------+---------+---------------------------------+-------+----------+-------------------------------------------+
|  1 | SIMPLE      | wp_users   | NULL       | range | PRIMARY,user_login_key,user_nicename,user_email                                              | PRIMARY     | 8       | NULL                            |     5 |   100.00 | Using where; Using index; Using temporary |
|  1 | SIMPLE      | wp_2_posts | NULL       | ref   | post_author,yoast2_index,posts_mime_type_status_index,status_type_sort_date,type_status_date | post_author | 8       | db190.wp_users.ID               | 14054 |     8.17 | Using where; Distinct                     |
+----+-------------+------------+------------+-------+----------------------------------------------------------------------------------------------+-------------+---------+---------------------------------+-------+----------+-------------------------------------------+

If you look at the EXPLAIN, it scans only 14K rows to fetch the results.

The current query returns the user ID even when the user doesn't exist in the database. The new query returns the user ID only when the user exists in the database. I also removed the unnecessary LIMIT from the query.

Could you please patch the plugin with the optimised query? I'm happy to help with testing or answer any questions.

Steps to reproduce

The issue can be reproduced from the GraphiQL IDE with the query below.

query PostCards {
  posts(first: 5) {
    nodes {
      title
    }
  }
}

Additional context

No response

WPGraphQL Version

1.22.1

WordPress Version

6.4.5

PHP Version

8.1

Additional environment details

I tested this in the older WordPress and plugin versions, but I'm sure it can also be reproducible in the latest versions.

Please confirm that you have searched existing issues in the repo.

Please confirm that you have disabled ALL plugins except for WPGraphQL.

justlevine commented 1 month ago

The slow query is generated in the line /src/Data/Loader/UserLoader.php#L82.

Wondering if we even need to do a direct query here over something built-in (and possibly pre-cached), and if we do need our own why we're not caching it ourselves 🤔

jasonbahl commented 1 month ago

@sathyapulse thanks for reporting! I know we had explored this issue in the past and had made some progress, but perhaps not enough progress.

Going to link some things for reference:

jasonbahl commented 1 month ago

@sathyapulse I've opened this PR: https://github.com/wp-graphql/wp-graphql/pull/3183

Would love to hear from you after you have a chance to test it to see how it impacts your use cases.