putyourlightson / craft-blitz

Intelligent static page caching for creating lightning-fast sites with Craft CMS.
https://putyourlightson.com/plugins/blitz
Other
149 stars 36 forks source link

4.x ?token query param is cached even if "Query String Caching" is OFF #467

Closed Romanavr closed 1 year ago

Romanavr commented 1 year ago

After updating a number of sites to craft 4.x, I noticed a problem that while job on warming up the cache, the job itself constantly fails and starts again I canceled the job and tried to warmup the cache through the CLI In the logs(log file from the blitz) after the warming is done, I noticed that pages with a token parameter (?token=XXXXX) appeared in the log file, while the cache warming up for query params was turned off [2023-01-30 23:33:43] Inactivity timeout exceeded, more than 10000 ms elapsed from last data received [https://site.com/myurl?token=Eni7rmSnSoLVranG3QVgx3ftG9tCyy-A] Enabling query string caching and adding a ?token to the exception solves the problem, but I would like to understand why this happens at all

I checked the database, there is no URL with this parameter in the blitz cache and site elements tables. As far as I understand, these pages come from craft(I mean token param itself), I don’t fully understand what this token does. Maybe it's somehow related to https://github.com/craftcms/cms/issues/12458 / https://github.com/craftcms/cms/issues/5698 ?

PHP 8.1.14 Craft Pro 4.3.5 Blitz 4.2.3

bencroker commented 1 year ago

Cache warmers have been replaced by generators in Blitz 4, which do use tokens for telling Craft that a URL is a generation request, so there's nothing to be concerned about there. You should not have to add an exception for token parameters, as Blitz already does that.

Can you help me understand what the current issue is? Be sure to check the queue.log and blitz.log log files for signs of errors.

Romanavr commented 1 year ago

Cache warmers have been replaced by generators in Blitz 4, which do use tokens for telling Craft that a URL is a generation request, so there's nothing to be concerned about there. You should not have to add an exception for token parameters, as Blitz already does that.

Can you help me understand what the current issue is? Be sure to check the queue.log and blitz.log log files for signs of errors.

Thanks for the reply.

This is from queue.log

2023-01-30 18:42:21 [queue.ERROR] [blitz] Inactivity timeout exceeded, more than 10000 ms elapsed from last data received [https://mysite.com/flip-your-standard?token=K_VxU-H72Vxqzq3NjHJIMhLsKM9PGh4c] {"memory":62770936}
2023-01-30 18:42:21 [queue.ERROR] [blitz] Allowed transfer timeout exceeded, took longer than 10000 ms [https://mysite.com/suburl?token=K_VxU-H72Vxqzq3NjHJIMhLsKM9PGh4c] {"memory":62854768}
2023-01-30 18:42:21 [queue.ERROR] [blitz] Allowed transfer timeout exceeded, took longer than 10000 ms [https://mysite.com/suburl/a-diet-we-can-all-agree-on?token=K_VxU-H72Vxqzq3NjHJIMhLsKM9PGh4c] {"memory":62844248}
2023-01-30 18:42:21 [queue.ERROR] [blitz] Allowed transfer timeout exceeded, took longer than 10000 ms [https://mysite.com/suburl/my-site-breakfast-challenge-7-inspirerende-zomerse-recepten-voor-een-week-vegan-ontbijten?token=K_VxU-H72Vxqzq3NjHJIMhLsKM9PGh4c] {"memory":62831184}
2023-01-30 18:42:21 [queue.ERROR] [blitz] Inactivity timeout exceeded, more than 10000 ms elapsed from last data received [https://mysite.com/suburl/test?token=K_VxU-H72Vxqzq3NjHJIMhLsKM9PGh4c] {"memory":62813440}

https://github.com/putyourlightson/craft-blitz/blob/4.2.3/src/services/CacheRequestService.php#L80 Here blitz checks for token param? If yes, then probably It's necessary to add || $request->getToken() ?

bencroker commented 1 year ago

These timeouts are happening during the cache generation process only. Can you check whether those pages actually load within 10 seconds, which is the max time allowed per generation request.

Romanavr commented 1 year ago

These timeouts are happening during the cache generation process only. Can you check whether those pages actually load within 10 seconds, which is the max time allowed per generation request.

Yes, I can load these pages with a token through the browser within 10 seconds (literally takes 1 second), but the page content with token param is empty (as I understand it specically made for cache generators?).

bencroker commented 1 year ago

Hmm, which cache generation driver are you using and with what settings? And is the timeout happening only for the 5 pages shown above, or for more pages?

Romanavr commented 1 year ago

Hmm, which cache generation driver are you using and with what settings? And is the timeout happening only for the 5 pages shown above, or for more pages?

Blitz File Storage. HTTP generator(default). Reverse Proxy Purger - None. Do not cache URLs with query strings. No remote deployer. The timeout occurs only for pages with a token, from log I can see ~20-25 such pages.

bencroker commented 1 year ago

The timeout occurs only for pages with a token, from log I can see ~20-25 pages.

Remember, a token is always used in cache generation, so it is failing for those 20-25 pages due to a timeout.

What concurrency value is selected under Cache Generation?

Romanavr commented 1 year ago

What concurrency value is selected under Cache Generation?

3 concurrent requests. No custom URLs.

bencroker commented 1 year ago

In that case I'm not sure what the issue is. Do you have a CDN or firewall that might be blocking repeated HTTP requests from the server? Perhaps trying the Local Generator might make sense too.

Romanavr commented 1 year ago

In that case I'm not sure what the issue is. Do you have a CDN or firewall that might be blocking repeated HTTP requests from the server? Perhaps trying the Local Generator might make sense too.

Thank you very much for the fast replies. I'll try something and let you know.

bencroker commented 1 year ago

Yes, it looks copied directly from the docs, which are correct.

Romanavr commented 1 year ago

After some research, I figured out a potential issue that causes my site to go down on cache generation.

Maybe I'm wrong, but here's what I found out: As you mentioned and as I see in the source files, generator/warmer work differently in craft 4 rather than in craft 3. From the logs, I can see that ampphp drops the connection after some timeout (10s?), however, under the hood, PHP continues to execute the script(ignore_user_abort=off in my case), and at the same moment the generator continues to warm up the URLs, triggering more PHP execution which eventually overload the server and it crashes (in the nginx access.log I can see that after 499 code, the server responds with 503, the DB crashes)

Did Gizzly work differently? Gizzly waited for a 200 server response or timeout, but never dropped the connection himself after certain time like it works with the amphp?

bencroker commented 1 year ago

You're right that the new HTTP Generator, which uses AMPHP, uses a timeout of 10 seconds, whereas the old Guzzle Warmer did not enforce a timeout. I could make the value configurable, but it seems wise to set a timeout. Your server should enforce its own timeout (via the max_execution_time setting), but if requests are really take that long to fulfill then I can see how the requests from the generator could build up fast and possibly overwhelm the server.

A few things you could try are:

  1. Reduce the HTTP Generator concurrency.
  2. Use the Local Generator instead.
  3. Figure out why requests are taking so long to be served (recommended!).
Romanavr commented 1 year ago

You're right that the new HTTP Generator, which uses AMPHP, uses a timeout of 10 seconds, whereas the old Guzzle Warmer did not enforce a timeout. I could make the value configurable, but it seems wise to set a timeout. Your server should enforce its own timeout (via the max_execution_time setting), but if requests are really take that long to fulfill then I can see how the requests from the generator could build up fast and possibly overwhelm the server.

A few things you could try are:

1. Reduce the HTTP Generator concurrency.

2. Use the Local Generator instead.

3. Figure out why requests are taking so long to be served (recommended!).

Thank you for the detailed answer! If possible, please add such option so It'll work like under the craft 3.x. Some pages take too long to load due to the scale of the server (this is we're actually using blitz). Could you explain what is the main disadvantage of a local generator? Or where can I read about it? I just want to understand the reason why it is worth using/not using it.

bencroker commented 1 year ago

If some pages are taking over 10 seconds to respond then it sounds like there might be opportunities for optimising using eager-loading. Be sure to check out the Blitz Hints utility if you haven't yet.

The Local Generator uses PHP processes to mock requests rather than making external HTTP requests. I imagine it might solve your issues.

Romanavr commented 1 year ago

@bencroker Thanks for all your advices & tips. If you get a chance, please make an option to disable automatic self-timeout for the HTTP generator (ampphp). In my pont of view, even It make sense: If I have heavy unoptimized pages, Craft requires (not strictly as I know from experience with fortrabbit) max_execution_time to be at least 120 seconds. https://i.imgur.com/dwYAYbc.png So even I have a few unoptimized pages(response time >10s) and a server that doesn't scale well this may cause a RIP for the server.

In any case, I really appreciate your help, thanks again.

bencroker commented 1 year ago

I added a timeout config setting to the HTTP Generator in https://github.com/putyourlightson/craft-blitz/commit/8d383e813b3887cba99d8104266faca6a5d6822a, for the next release.

return [
    '*' => [
        // The generator settings.
        'cacheGeneratorSettings' => [
            'concurrency' => 3,
            // Override the default value of 10000 milliseconds
            'timeout' => 30000,
        ],
    ],
];
bencroker commented 1 year ago

FYI I set the new default timeout to 120 seconds, which is the minimum PHP max execution time that Craft requires.

Romanavr commented 1 year ago

FYI I set the new default timeout to 120 seconds, which is the minimum PHP max execution time that Craft requires.

Make sense, thanks!

bencroker commented 1 year ago

Just released in version 4.3.0.

holmey commented 10 months ago

I added a timeout config setting to the HTTP Generator in 8d383e8, for the next release.

return [
    '*' => [
        // The generator settings.
        'cacheGeneratorSettings' => [
            'concurrency' => 3,
            // Override the default value of 10000 milliseconds
            'timeout' => 30000,
        ],
    ],
];

Should this feature be documented in the default configuration file? https://github.com/putyourlightson/craft-blitz/blob/e464f25cb01943cbaff13feed3c1fbe36ceb8f12/src/config.php#L75

bencroker commented 10 months ago

@holmey not necessarily, as the value here is meant as an example rather than documentation, and it is unlikely that the majority of people will ever have to change it.