johnbillion / query-monitor

The developer tools panel for WordPress
https://querymonitor.com
GNU General Public License v2.0
1.59k stars 208 forks source link

Query Monitor causes my Network Admin dashboard to 404 #798

Closed MadtownLems closed 11 months ago

MadtownLems commented 1 year ago

Hi there!

When Query Monitor is network-activated, our Network Admin dashboard 404's. This isn't a huge deal because a) I can still access it if I manually add index.php to the URL b) Every other part of the Network Admin works fine, and there's not really much do do on the dashboard anyway

I have confirmed that the dashboard works totally fine without QM network activated.

It also only seems to happen in SOME of our networks, even ones using identical codebases with identical sets of plugins network activated. (Of course, there may be small differences in server configuration, htaccess, or database settings. )

When I visit https://oursite.com/wp-admin/network/ on a network that doesn't have the problem, here's some debugging info from QM's Request panel:

Request: /wp-admin/network/ Query String: none

When I visit the same dashboard page on a network that DOES have the problem, the Request looks like:

Request: wp-admin/network (note the lack of leading and trailing slashes). Matched Query: pagename=wp-admin%2Fnetwork &page= Query String: pagename=wp-admin%2Fnetwork All Matching Rewrite Rules: (.?.+?)(?:/([0-9]+))?/?$ | pagename=$matches[1]&page=$matches[2]

Everything else on these networks works as expected (as far as I know).

I'm curious if you've ever seen or heard of this before, or if you have any ideas why Query Monitor being network activated could cause this issues on just the network admin dashboard.

More than happy to provide additional information or take additional debugging steps. Thank you!!

johnbillion commented 1 year ago

Normally I'd be tempted to say this is a red herring and the problem lies elsewhere (because QM doesn't do anything to change your site, only log data from it), but I can't be sure, so I won't say that.

I'm not sure at all to be honest. I'll have a think when I have some free time.

MadtownLems commented 1 year ago

Normally I'd be tempted to say this is a red herring and the problem lies elsewhere

Totally get it. As a plugin maintainer myself, I'm all too familiar with that kind of situation. 😅

But at the end of the day, it works fine without QM, and QM being active makes it stop working, so I figured I'd at least check in with you and see if you had thoughts.

If you do think of anything, let me know - and I'll do a bit more debugging and let you know if I come up with anything on my end.

Cheers and thanks for a great plugin. Just today I used it to quickly identify and solve a different issue that would've taken infinitely longer without QM to help. ☺

MadtownLems commented 1 year ago

I'm adding to this comment because I imagine it's heavily related the 404 issue above, but happy to move to a new issue if that's helpful.

This morning we had a report that someone could not preview a draft page. Attempting to preview the draft page would simply show the site's homepage. Disabling Query Monitor for the site resolved the issue.

Perhaps this additional data point / anecdote helps give an idea as to where we might be able to look further to figure out why QM being active is causing weird issues. There's nothing in the PHP error logs, but again, very happy to try additional debugging steps to try to track things down.

I can't say for certain when these issues started, but we've been using Query Monitor for years and I don't recall these issues existing "back in the day". We're using 3.13.1 now.

Cheers

MadtownLems commented 11 months ago

A little more info here from when I once again tried to debug this further:

This morning we had a report that someone could not preview a draft page. Attempting to preview the draft page would simply show the site's homepage.

Query monitor breaking Previews only happens in our networks on sites that have custom domains. ie bar.site.edu Sites that use the default subfolder structure, such as foo.site.edu/bar/ ARE able to view their drafts as normal.

But when a site at a custom domain tries to view a preview, it only shows the homepage instead. Some relevant information from Query Monitor:

In the Request Panel, it says the Query String is "none". A functioning site's preview will say the Query String is p=234 and &preview=true.

MadtownLems commented 11 months ago

Continuing to update with new discoveries, partially for logging them and partially in case anything ever gives anyone any ideas.

This one is blowing my mind:

In trying to figure out when/how $_SERVER['QUERY_STRING'] is being seen as null (and why post previews don't work) only when Query Monitor is active, I added the following as an mu-plugin ( to run as early as possible ).

echo "This is from ext-qm-debug.php. The value of _SERVER[query_string] is: " . $_SERVER['QUERY_STRING'] . "
";

WITHOUT QUERY MONITOR When I attempt to preview a post, at the following URL: mysite.site.edu/?p=1&preview=true The output is as follows: This is from ext-qm-debug.php. The value of _SERVER[query_string] is: p=1&preview=true

WITH QUERY MONITOR Same URL, the output is as follows: This is from ext-qm-debug.php. The value of _SERVER[query_string] is:

Somehow, Query Monitor being active is blowing up the QUERY_STRING part of the $_SERVER variable.

Next I edited the main plugin file (query-monitor.php) to have, as the very first bit: echo "This is the very start of query-monitor.php. The _SERVER['query_string'] value is: " . $_SERVER['QUERY_STRING'] . "
";

And the output is: This is the very start of query-monitor.php. The _SERVER['query_string'] value is: (it SHOULD be p=1&preview=true)

So, even at the very start of Query Monitor, the $_SERVER['QUERY_STRING'] is missing. But I'm checking it there as the very first thing in the Query Monitor, theoretically before Query Monitor has had a chance to do anything yet. That's blowing my mind. Unless Query Monitor, when being activated, makes some changes behind the scenes that are impacting things? I tried digging into the Activation class, but didn't see anything that stood out to me. I also tried commenting out QM_Activation::init( FILE ); as I reactivated it, but it still broke things upon being activated.

So, I remain at a loss ¯_(ツ)_/¯

MadtownLems commented 11 months ago

Breakthrough?

After inspecting the $_SERVER variable more, I realize there's some amount of redirects happening that I'll be digging into more next week. But for now, here's a key bit:

If I put this at the start of query-monitor.php echo "This is the very start of query-monitor.php. The _SERVER['query_string'] value is: " . $_SERVER['QUERY_STRING'] . "
"; I get: This is the very start of query-monitor.php. The _SERVER['query_string'] value is:

However, if I add a "die;" after that output, I get: This is the very start of query-monitor.php. The _SERVER['query_string'] value is: p=1&preview=true

This must mean that the first page load, the query_string is set and available as expected. However, when QM is active, some number of other redirects take place that end up back at the site's home page with NO query_string! More to investigate next week. This seems like the exact same root cause as the broken network admin (that 404s), and the broken dashboards (if you don't include index.php).

johnbillion commented 11 months ago

I wonder, have you tried disabling each of the collectors in Query Monitor one by one to see if you can identify one which is causing the issue?

At the bottom of each file in https://github.com/johnbillion/query-monitor/tree/develop/collectors is where each collector is registered. You could try commenting them out one by one and see where that gets you.

johnbillion commented 11 months ago

Also, do you have a Composer autoloader in place on your site?

MadtownLems commented 11 months ago

Also, do you have a Composer autoloader in place on your site?

I do not.

I wonder, have you tried disabling each of the collectors in Query Monitor one by one to see if you can identify one which is causing the issue?

I spent a lot of time trying to this a while ago (I've been struggling with these problems for several months now), and I believe things have changed over the course of that, as our WordPress and QM versions have changed. My notes suggest there was a period when there wasn't a SINGLE collector that, when removed, made the page load - but there were combinations. For example, at one point, removing BOTH of assets_scripts and db_queries would make the page load. Later, I believe only removing assets_scripts made the page load, but now... that's not the case.

However, I think I'm really making breakthroughs now (or at least I pray I am 🙏).

The most important thing: Disabling PHP-FPM for the domain makes ALL 3 of my known Query Monitor issues go away! (Network Admin 404-ing, broken previews, and dashboards not working when 'index.php' is missing from the url.)

That has to be huge, right? I'm not sure yet what to do with that information, but it's gotta be huge? 😅

I also think analyzing these $_SERVER variables gives huge clues into the situation. As a reminder, we have two servers, and the problem only exists on one of them ('PROD'), but not TEST (despite using php-fpm on both.)

I've been analyzing the $_SERVER variables on the following 4 page loads, all doing a sample page preview: TEST without QM (always works) TEST with QM (always works) PROD without QM (always works) PROD with QM (works when php-fpm is off).

The non-working PROD (with QM active) has a ton more stuff in the $_SERVER variable. My understanding is that, when QM is active, there's some kind of error generated that is causing at least 2 internal redirects (I believe it might be infinite in the case of /wp-admin/ without index.php), but it looks like 2 in the case of a Page Preview.

Here are some interesting $_SERVER variables that are present ONLY no the non-working version with QM activated (with a few comments):

[REDIRECT_REDIRECT_STATUS] => 500
[REDIRECT_REDIRECT_SCRIPT_NAME] => /index.php
[REDIRECT_REDIRECT_REQUEST_URI] => /?p=1&preview=true
[REDIRECT_REDIRECT_QUERY_STRING] => p=1&preview=true
[REDIRECT_REDIRECT_REQUEST_METHOD] => GET
[REDIRECT_REDIRECT_SERVER_PROTOCOL] => HTTP/2.0
[REDIRECT_REDIRECT_GATEWAY_INTERFACE] => CGI/1.1
[REDIRECT_REDIRECT_SCRIPT_FILENAME] => proxy:fcgi://mysite.edu/home/mysite/public_html/index.php
[REDIRECT_REDIRECT_PATH] => /usr/local/jdk/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/usr/X11R6/bin:/root/bin:/opt/bin

[REDIRECT_STATUS] => 500

[REDIRECT_URL] => /500.shtml

Note: My server has no /usr/local/jdk directory, so the REDIRECT_REDIRECT_PATH line is especially confusing.

Does any of that give you any more ideas, and/or things to try? (Note: I dug through other issues for anything related to FPM, and it seemed like restarting FPM often fixes things, but that is not the case for me.)

MadtownLems commented 11 months ago

After grepping around for 500.shtml, I realized that 500.shtml is being set as my 500 error handler, but it didn't exist. I added a basic one ("Internal Server Error"), and now, trying to preview a page just outputs the contents of my 500.shtml instead. That was expected.

I was hoping it would actually log some error somewhere to get more information, but I don't see anything anywhere.

And I certainly wasn't expecting my homepage to also give me the same error message (while logged in) (which never showed a problem before 😆)

MadtownLems commented 11 months ago

Updage: I THINK this is solved! I'm going to do some more robust testing tomorrow and report back with as full a report as I can, but I didn't want you to waste any time on it this evening.

MadtownLems commented 11 months ago

Alright, so I'm not 100% sure on the full details here, but it's been running great with no issues so far - and it all at least makes some amount of sense, so I'm happy to call it resolved and hope that this might help someone else in the future:

This feels like some perfect storm that led to a very hard-to-debug issue. Ultimately, it was ModSecurity that was breaking these requests.

ModSecurity: Output filter: Response body too large (over limit of 524288, total not specified).

This was painful to learn because I explicitly HAD checked the ModSecurity logs, but I'm guessing it wasn't logged because of WHM's default setting of "Only log noteworthy transactions".

Now, most Internal Server Error 500s are logged in the appropriate cPanel accounts, but these were not. I'm not sure if that's because of them being ModSecurity errors, or if Query Monitor being active gets involved in the error handling process. I think the fact that cPanel was configured with the default 500.shtml error handler for 500 errors (which didn't exist) also complicated things.

I think it DOES explain how removing a single collector wouldn't fix things, but removing various combinations of them would: removing some number of collectors would get the response body to within allowable limits!

I still haven't deduced why disabling PHP-FPM on the domain fixed it, but I was able to turn it back on after adjusting the ModSecurity config and they're playing nicely together.

So, at least for now 🤞, our network dashboard is working again, post previews are working again, and the dashboards can be accessed without /index.php again.

Thank you so much for your patience and thoughts on these tickets as I worked through this all. 🙏

johnbillion commented 11 months ago

483 will fix this by switching to client-side React rendering instead of outputting the huge amount of server-side rendered HTML, but it's still a long way off yet. One day.