putyourlightson / craft-blitz

Intelligent static page caching for creating lightning-fast sites with Craft CMS.
https://putyourlightson.com/plugins/blitz
Other
147 stars 35 forks source link

Generating Blitz cache jobs gets stuck at 100% #665

Closed naboo closed 1 week ago

naboo commented 1 month ago

Bug Report

My "Generating Blitz cache" jobs gets stuck at 100%. At first it was always the #1 job in the queue but now even random jobs in the middle of the generation gets stuck. Here's a screenshot when 2 jobs have gotten stuck (concurrency is set to "1").

bild

It usually happens during deployment. I've tried changing the deployment steps from:

# Clear caches
php craft invalidate-tags/all
php craft blitz/cache/refresh --queue=1

.. to:

# Clear caches
php craft invalidate-tags/all
php craft blitz/cache/clear
php craft blitz/cache/flush
php craft blitz/cache/generate --queue=1

.. but it doesn't do any difference. It usually works when I manually click the steps from Craft CP. I've experimented with restarting and not restarting the deamon which runs the jobs on the server but same result both ways. Deamon runs with:

/usr/bin/nice -n 10 /usr/bin/php /home/craft/www.domain.com/craft queue/listen --verbose

I don't really see anything in the logs, either Blitz and/or Craft. No server errors. This is from the database:

bild

Diagnostics Report

Host www.ploi.io

Craft CMS "craftcms/cms": "4.8.9" "putyourlightson/craft-blitz": "4.16.2",

bencroker commented 1 month ago

It looks like multiple jobs are being executed concurrently, which is not ideal. Is the daemon configured to run at most one process at a time?

Read more about my recommendations for queue runners at https://putyourlightson.com/articles/queue-runners-and-custom-queues-in-craft-cms#queue-runners

naboo commented 1 month ago

The runner is set to only run one (1) process at at time:

bild

So the task executes one job at a time (starts with job 1 of 18) then job 1 gets stuck at 100%. After a couple of seconds job 2 starts. It's only when the jobs gets stuck at 100% another job is started and never more than 1 job is being run at the same time.

naboo commented 1 month ago

Some more context: The jobs that gets stuck re-starts after a while (I don't know how long...) and ultimately finishes. But then they are being run all over again so the entire batch is being processed again. 1 of 18, 2 of 18, 3 of 18....

Super wierd :)

bencroker commented 1 month ago

Are you trying to cancel jobs via the UI, by any chance? Currently, cancelling a batched job has little or no effect in Craft, see https://github.com/craftcms/cms/discussions/14986 for more details.

naboo commented 1 month ago

I'm not touching the queue at all. I just let them be. So basically when a job gets stuck at 100% it'll spawn the next "X of Y" and the first stuck job just sits there. When all jobs have been processed something happens (if the job just restarts or expires and restarts as "attempt = 2" I don't know) and the job starts to process again from the beginning.

bencroker commented 1 month ago

Spawned batch jobs getting a higher priority than the initial job, to help ensure batches are completed one at a time. So that helps explain why batch 1 is only restarted when all other batches are complete. However, why batch 1 is not being removed from the queue when it reaches 100% makes no sense to me. Neither does the fact that you have multiple reserved queue jobs at the same time.

Can you try increasing the TTL of generate cache jobs using the queueJobTtr setting, to reduce the probability of jobs timing out?

naboo commented 1 month ago

I'll try that @bencroker and report back. I'll set it to something super high. I have it at 2400 today so it's pretty high but let me try even higher.

naboo commented 1 month ago

Here's an update Ben. So I set 'queueJobTtr' => 3600, and 'driverJobBatchSize' => 50,. Still getting the first job stuck at 100% not clearing. Can it be a Craft CMS queue related issue and not Blitz related? Nothing else gets struck though.

bild

Thanks

bencroker commented 1 month ago

I’m not sure, it could just as easily be a Craft issue. Can you update to the latest versions of Craft and Blitz to help rule that out?

naboo commented 1 month ago

I'm on the latest 4.x branch right now for Craft. This is a pretty big site so the upgrade to 5.x will require some testing to get out but it's on the roadmap for sure.

It's also very strange that it goes through "the second time" when it restarts. Not just the first.

Everything also works as it should when Blitz cleares entries from inside the CP on element re-save etc.

Do we see the upgrade as the next logical step or anything else we might check before? If so we'll have to put this on pause for a little bit until we upgrade.

bencroker commented 1 month ago

If you’re on the latest v4 (4.18.1), which is still being maintained, then it should be all good. At this stage, I’m tempted to say that the next logical step might be to contact Craft’s official support team, as I can’t see why this is happening and nobody else using Blitz has encountered this issue, as far as I’m aware.

russback commented 1 week ago

Just adding that I'm seeing this on Craft 5.2.3, Blitz 5.3.2 - getting stuck on 100% on the first page of the batch.

This is when generating cache, and there is only one Blitz job in the queue.

Is there anything else I can provide here?

Application Info

Installed Plugins

Loaded Modules

Blitz Plugin Settings

{
    "debug": false,
    "hintsEnabled": false,
    "cachingEnabled": true,
    "refreshCacheEnabled": true,
    "refreshMode": 1,
    "includedUriPatterns": [
        {
            "siteId": "",
            "uriPattern": ".*"
        }
    ],
    "excludedUriPatterns": [],
    "cacheStorageType": "putyourlightson\\blitz\\drivers\\storage\\FileStorage",
    "cacheStorageSettings": {
        "folderPath": "@webroot\/cache\/blitz",
        "compressCachedValues": true,
        "countCachedFiles": true
    },
    "cacheStorageTypes": [],
    "cacheGeneratorType": "putyourlightson\\blitz\\drivers\\generators\\HttpGenerator",
    "cacheGeneratorSettings": {
        "concurrency": 3
    },
    "cacheGeneratorTypes": [],
    "customSiteUris": [],
    "cachePurgerType": "putyourlightson\\blitz\\drivers\\purgers\\DummyPurger",
    "cachePurgerSettings": [],
    "cachePurgerTypes": [],
    "deployerType": "putyourlightson\\blitz\\drivers\\deployers\\DummyDeployer",
    "deployerSettings": [],
    "deployerTypes": [],
    "ssiEnabled": false,
    "ssiTagFormat": "<!--#include virtual=\"{uri}\" -->",
    "detectSsiEnabled": true,
    "esiEnabled": false,
    "queryStringCaching": 1,
    "includedQueryStringParams": [],
    "excludedQueryStringParams": [],
    "apiKey": "",
    "generatePagesWithQueryStringParams": true,
    "purgeAssetImagesWhenChanged": true,
    "refreshCacheAutomaticallyForGlobals": true,
    "refreshCacheWhenElementMovedInStructure": true,
    "refreshCacheWhenElementSavedUnchanged": false,
    "refreshCacheWhenElementSavedNotLive": false,
    "cacheNonHtmlResponses": false,
    "trackElements": true,
    "trackElementQueries": true,
    "excludedTrackedElementQueryParams": [],
    "cacheDuration": null,
    "nonCacheableElementTypes": [],
    "sourceIdAttributes": [],
    "liveStatuses": [],
    "integrations": [
        "putyourlightson\\blitz\\drivers\\integrations\\CommerceIntegration",
        "putyourlightson\\blitz\\drivers\\integrations\\FeedMeIntegration",
        "putyourlightson\\blitz\\drivers\\integrations\\SeomaticIntegration"
    ],
    "defaultCacheControlHeader": "no-store",
    "cacheControlHeader": "public, max-age=31536000",
    "cacheControlHeaderExpired": "public, max-age=5",
    "sendPoweredByHeader": true,
    "outputComments": true,
    "refreshCacheJobPriority": 1,
    "driverJobBatchSize": 100,
    "driverJobPriority": 0,
    "queueJobTtr": 900,
    "maxRetryAttempts": 10,
    "maxUriLength": 255,
    "mutexTimeout": 1,
    "commands": [],
    "injectScriptEvent": "DOMContentLoaded",
    "injectScriptPosition": 3
}

Recommendations

Site Tracking

laneigemud1990 commented 1 week ago

Hi all, I had a conversation with Ben about this on the Craft discord and it turned out that in the blitz config file, there is a description which says the following for the driverJobPriority setting:

The priority to give driver jobs (the lower the number, the higher the priority). Set to null to inherit the default priority.

However, this is wrong. It needs to be an integer. Ben is updating this on the next release.

Stepping through the code, I saw that with a priority level of 0, there is some logic with then takes this down to -1 and on trying to write it into the db, it throws an error.

bencroker commented 1 week ago

I ended up fixing the issue that a null priority was incorrectly interpreted as 0 and have released in 4.18.3 and 5.3.3. Thanks for the reports and let me know if any issues remain!