putyourlightson / craft-blitz

Intelligent static page caching for creating lightning-fast sites with Craft CMS.
https://putyourlightson.com/plugins/blitz
Other
149 stars 36 forks source link

Deploying only adds, never removes #407

Closed JeanLucEsser closed 2 years ago

JeanLucEsser commented 2 years ago

Filing this as a question as I may have misunderstood something or misconfigured something.

Why is git deploy only adding new cache and never removes any files?

Let's say I create entry A: Cache gets generated, and deployed via a static folder to git and Netlify. I followed your blog post to do it and it works great.

Now let's say I disable or delete entry A: Cache gets updated, but files not belonging to the cache anymore don't get removed from the static folder thus never removing the entry from git. Also, clearing the cache does not clear the static folder when deploying.

Is that normal behavior, what did I miss?

[Craft 4.0.2 / Blitz 4.1.1]

bencroker commented 2 years ago

Just tested locally and this is working as expected when both disabling and deleting entries, with the Refresh Mode set to Clear the cache and regenerate in a queue job.

JeanLucEsser commented 2 years ago

Ok, let's walk through it:

Craft 4.0.2 / Blitz 4.1.2 / Local environment / Starting with a clear cache Local Generator Clear the cache and regenerate in a queue job

@root/web/cache is empty @root/static has _redirects and is on sync with my GitHub

Creating blog entry test-01. @root/web/cache has blog/test-01/index.html @root/static has _redirects and blog/test-01/index.html

Creating blog entry test-02. @root/web/cache has blog/test-01/index.html and blog/test-02/index.html @root/static has _redirects and blog/test-01/index.html and blog/test-02/index.html

Removing blog entry test-01. @root/web/cache has blog/test-01/index.html and blog/test-02/index.html @root/static has _redirects and blog/test-01/index.html and blog/test-02/index.html Entry has not been deleted neither from cache nor from static.

Manually clear cache then generate cache (via utility). @root/web/cache has blog/test-02/index.html @root/static has _redirects and blog/test-01/index.html and blog/test-02/index.html

Manually deploy (via utility). @root/web/cache has blog/test-02/index.html @root/static has _redirects and blog/test-01/index.html and blog/test-02/index.html Entry test-01 is still in the deployment process.

So I would say I have an issue when removing an entry where the cache is not updated accordingly. And I have a second issue where manually deploying does not reflect the actual cache.

bencroker commented 2 years ago

Removing blog entry test-01.

This step should result in @root/web/cache containing only blog/test-01/index.html. Can you please define what you mean by "removing" here? And are you seeing the Refreshing Blitz Cache queue job executing?

Entry test-01 is still in the deployment process.

How are you syncing files from @root/web/cache to @root/static?

JeanLucEsser commented 2 years ago

By removing I mean deleting, but disabling an entry has the same result. Yes I'm seeing the queue job executing.

How are you syncing files from @root/web/cache to @root/static?

I'm not syncing files from @root/web/cache to @root/static, the plugin does it when new cache is generated, so I expect it to do the same when files are removed from cache. @root/static is my git local repository for git deployment.

Seems like you can't replicate my issue. This is a very basic setup with an empty craft 4 install.

bencroker commented 2 years ago

Can you perhaps share a screen recording, including the section setup?

Gnative commented 2 years ago

I think what you are describing is something I came across yesterday.

If you look at the Blitz cache file you'll see that the cached file is removed when you delete an entry.

With Remote Deployment Blitz copies over the cache files into your local repo folder and then commits these changes. I don't think Blitz actually deletes any files from this local repo folder as its only copying them over.

I found if I add a script to delete all files in the local repo folder, via Commands Before in Remote Deployment tab, then blitz copies over the cache files, if a file is missing it then gets marked as deleted in the commit.

Hope that makes sense.

bencroker commented 2 years ago

Blitz should delete files that reference URIs whose cache has been cleared. https://github.com/putyourlightson/craft-blitz/blob/a3571ae82254fd48fce7f6ecca9b24ff0adbc6f2/src/drivers/deployers/GitDeployer.php#L362-L381

I wonder if this is related to https://github.com/putyourlightson/craft-blitz/issues/384. Are you by any chance using the Async Queue plugin or multiple queue runners?

JeanLucEsser commented 2 years ago

As I said I have two issues at the moment, cache files not being removed when entry is deleted and git repo not reflecting actual cached files. I need more time to investigate and eventually make a screen recording as Ben suggested. Will try to address it this WE.

Are you by any chance using the Async Queue plugin or multiple queue runners

Nope.

bencroker commented 2 years ago

Any update on this @JeanLucEsser?

JeanLucEsser commented 2 years ago

I should be able to test this more thoroughly this WE. It has to be on my side as I seem to be the only one experiencing this. Will let you know as soon as I can.

JeanLucEsser commented 2 years ago

Ok so first things first.

I disabled deployment. I have only two sections, my home page (single) and a blog. I have no entries in the blog. Cleared all caches.

Running blitz/cache/refresh Home is cached as web/cache/home/index.html. Table blitz_caches has 1 line referencing home. Table blitz_elementcaches has 2 lines related to the cache, one entry for home and one svg asset used in home.

Adding a blog entry. Entry is cached as web/cache/blog/my-entry/index.html. Table blitz_caches has 1 new line referencing the blog entry. Table blitz_elementcaches has 2 new lines, one referencing the blog entry and one for the same svg asset than in my home although this asset is not used anywhere in the blog entry but as it's pulled from globals, I guess that's why.

So far so good.

Deleting the blog entry. Entry is still cached as web/cache/blog/my-entry/index.html, it hasn't been deleted. Table blitz_caches has updated the line referencing the blog entry with expiryDate set to the exact time of deletion. Table blitz_elementcaches has not been touched.

Trying to run blitz/cache/refresh-expired. Nothing has changed except expiryDate which was updated to the time of running the command.

Trying to run blitz/cache/refresh. Returns an error:

Generated 1 of 2 total possible pages. To see why some pages were not cached, enable the `debug` config setting and then open the `storage/logs/blitz.log` file.

And the corresponding log:

[2022-06-05 17:18:52] 404 error: Not Found [https://covivance.frb.io/blog/my-blog-entry?token=DtdUTtuKZOK53GtQnS18-uggYUAaQAXh]

So it is still trying to generate the deleted blog entry!

To be sure that this was not related in any way to nitro and my local docker setup, I ran all those tests in a fortrabbit production ready environment, with Blitz default settings, HttpGenerator and concurrency set to 1. I'm not using async-queue either.

bencroker commented 2 years ago

Entry is still cached as web/cache/blog/my-entry/index.html, it hasn't been deleted. Table blitz_caches has updated the line referencing the blog entry with expiryDate set to the exact time of deletion.

It sounds like your Refresh Mode setting is set to Expire the cache..., rather than Clear the cache... as previously stated, can you please double-check the value of this setting?

JeanLucEsser commented 2 years ago

You know those times when you wish you could just disappear or rewind time... I'm so sorry, I just want to burry my head deep in the sand somewhere.

I don't know why or how, but you're right, the Refresh Mode setting was wrongly set. At least for my latest tests. I'm gonna run everything again with the correct setting as the initial issue was the deployment, not the cache and I'm pretty sure when I started this issue the setting was correct.

What I can tell you for now is that the cached files are correctly added / deleted.

On to the git deployment part now. Will keep you posted asap. And I promise to double check every step.

--

I do have a couple questions though if I may. My setting was Expire the cache and regenerate in a queue job. What exactly is happening in the queue job then and when? How is it supposed to handle deleted entries?

Bonus question: If we set it to Clear or Expire the cache, regenerate manually or organically. Which commands are we supposed to run manually and in which order?

I'm kind of new to all the advanced caching thing and I probably should read some more about the stale-while-revalidate pattern, I think I'm missing something to really understand the relation between expiring the cache and deleting the actual files.

bencroker commented 2 years ago

My setting was Expire the cache and regenerate in a queue job. What exactly is happening in the queue job then and when? How is it supposed to handle deleted entries?

The cache is regenerated in the background and overwritten.

If we set it to Clear or Expire the cache, regenerate manually or organically. Which commands are we supposed to run manually and in which order?

Running the commands is optional. Use blitz/cache/refresh-expired to refresh expired cache only, or use blitz/cache/generate to regenerate the entire cache.

JeanLucEsser commented 2 years ago

The cache is regenerated in the background and overwritten.

Yes I figured that much. My question is related to deleted entries (and disabled ones). I can’t see when and how those files are deleted from the cache when using Expire… what am I missing?

Plus just to make sure, what exactly does refresh expired cache mean? The way I understand it is it is supposed to delete all expired cache, right?

bencroker commented 2 years ago

My question is related to deleted entries (and disabled ones). I can’t see when and how those files are deleted from the cache when using Expire… what am I missing?

When blitz/cache/refresh-expired is run.

Plus just to make sure, what exactly does refresh expired cache mean? The way I understand it is it is supposed to delete all expired cache, right?

No, it marks the cache as expired but does not delete it until the command above is run.

JeanLucEsser commented 2 years ago

When blitz/cache/refresh-expired is run.

Ok so I'm lost. Take my latest tests where I mistakenly used the wrong refresh setting: https://github.com/putyourlightson/craft-blitz/issues/407#issuecomment-1146834144.

When I ran blitz/cache/refresh-expired, the cached file from the deleted blog entry should have been deleted (according to the expiryDate) instead of updating the caches table expiryDate again, am I wrong?

Plus I was using the wrong setting but I was still using the queue (Expire the cache and regenerate in a queue job), so the queue job should have taken care of deleting the cached file after expiring the cached entry, right?

No, it marks the cache as expired but does not delete it until the command above is run.

Yes I meant the command.

bencroker commented 2 years ago

That's all correct.

JeanLucEsser commented 2 years ago

Ok so do you think something's wrong on my side or could it be an issue. And if it is, may be better to wait until it is resolved before running new tests on the git deployment issue I encountered initially.

bencroker commented 2 years ago

Please test, as already discussed, using a Clear the cache ... refresh mode.

JeanLucEsser commented 2 years ago

With Clear the cache... the cached files are added and deleted correctly. You mean testing the git deploy in this configuration. Ok, will do it now.

JeanLucEsser commented 2 years ago

So with Clear the cache... deployment works as it should in local (with LocalGenerator) and on my server (with HttpGenerator). I can't believe I used the wrong setting from the start, sorry about that.

Still, it should work with Expire the cache and generate in a queue job all the same which it's not. I'm not using this setting for now so it's not an issue for me. But if you need me to run some more tests on this configuration I'll be happy to.

2 last remarks:

In my nitro (docker) local environment (LocalGenerator), /index.html is never generated (only /home/index.html is) whereas on my server (HttpGenerator), /index.html is generated along with /home/index.html.

When deleting an entry Clear the cache..., the cached file is correctly deleted but the DB is not updated to reflect that (unless we do a full refresh). I would have expected to see the corresponding line in the blitz_caches table to be deleted as well to reflect the true status of the cache. I guess it is expected behavior but I was wondering why. Maybe to optimize performance by not making unnecessary queries?

bencroker commented 2 years ago

Glad to hear it's working as expected!

When deleting an entry Clear the cache..., the cached file is correctly deleted but the DB is not updated to reflect that (unless we do a full refresh). I would have expected to see the corresponding line in the blitz_caches table to be deleted as well to reflect the true status of the cache.

The database is not a one-to-one reflection of cached files, it's a bit more sophisticated than that. Fortunately, it's not something you need to care too much about as a user.