josdejong opened this issue 7 months ago.
Hey @josdejong, what does your URL path structure look like? Are your blog or application pages under a separate path, e.g. `/blog/some-page` or `/app/some-page`?

This library is a pretty thin wrapper over workbox, so we can look there for some inspiration too.
Thanks for your reply. The blog is indeed under a separate path. It boils down to:

- `/` (main application)
- `/blog/article1`
- `/blog/article2`
- `/blog/article3`

The blog articles have images, but after building with AstroJS all images end up flattened into a single folder, `./dist/_astro/`.
I figured out that I can set `registration.autoRegister = false` and manually add the registration code only to the web application at `/`. That's great: it solves my challenge (2).
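A minimal sketch of that manual registration, assuming the generated worker is served at `/sw.js` (the file name is an assumption; check the build output) and that this script is only included by the application page. `isAppPath`, `registerAppWorker` and `ServiceWorkerRegistrar` are hypothetical names, and the registrar is passed in so the logic can be exercised outside a browser. Note that this only controls *where* the worker gets installed: with the default scope of `/sw.js` (which is `/`), an installed worker still controls `/blog/*` navigations.

```typescript
// Hypothetical guard: only the application (everything outside /blog) registers the worker.
function isAppPath(pathname: string): boolean {
  return !(pathname === '/blog' || pathname.startsWith('/blog/'));
}

// The part of navigator.serviceWorker used here, declared separately so the
// helper can be tested without a browser.
interface ServiceWorkerRegistrar {
  register(scriptURL: string): Promise<unknown>;
}

// Starts registration of the worker (assumed to be served at /sw.js) on app pages only.
// Returns true if a registration was started.
function registerAppWorker(
  pathname: string,
  registrar: ServiceWorkerRegistrar | undefined,
  scriptURL = '/sw.js',
): boolean {
  if (!registrar || !isAppPath(pathname)) return false;
  registrar.register(scriptURL).catch((err: unknown) => {
    console.error('Service worker registration failed:', err);
  });
  return true;
}
```

In the application's page script this would be called as `registerAppWorker(location.pathname, 'serviceWorker' in navigator ? navigator.serviceWorker : undefined)`, which also covers browsers without service worker support.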
For challenge (1) I would need to specify `workbox.globIgnores` after Astro has built the website, but before `astrojs-service-worker` creates the service worker. For the time being I've copied and adjusted the code of `astrojs-service-worker` so that it gets the list of all resources in `_astro` and then checks `index.html` to see which resources are used, with a simple `String.includes(resourceName)`, recursing into the JS/CSS files used by `index.html`.
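A minimal sketch of that scan, assuming the build output has already been read into memory as a map from output path to file contents (reading `./dist` from disk is left out). `findUsedResources` and `BuildOutput` are hypothetical names. Like the approach described, it relies on plain substring matching, which works because Astro's output file names carry a content hash and are therefore unlikely to collide.

```typescript
// Output files keyed by path relative to the build directory, e.g. "_astro/logo.L4.svg".
type BuildOutput = Map<string, string>;

// Text files that may reference other assets and are therefore scanned recursively.
const SCANNABLE = /\.(html|js|mjs|css)$/;

// Returns every file under _astro/ that `entry` references, directly or transitively.
function findUsedResources(output: BuildOutput, entry: string): Set<string> {
  const assets = Array.from(output.keys()).filter((p) => p.startsWith('_astro/'));
  const used = new Set<string>();
  const visited = new Set<string>();
  const queue: string[] = [entry];
  while (queue.length > 0) {
    const file = queue.pop()!;
    if (visited.has(file)) continue;
    visited.add(file);
    const content = output.get(file) ?? '';
    for (const asset of assets) {
      // Match on the bare file name, since references may be relative or absolute.
      const name = asset.slice('_astro/'.length);
      if (!used.has(asset) && content.includes(name)) {
        used.add(asset);
        // JS and CSS files can pull in further assets, so scan them as well.
        if (SCANNABLE.test(asset)) queue.push(asset);
      }
    }
  }
  return used;
}
```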
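Would you be interested in either:

```ts
import type { GenerateSWOptions, InjectManifestOptions } from 'workbox-build'

export interface ServiceWorkerConfig {
  // ...
  // either static workbox options, or a callback that receives the build output directory
  workbox?: InjectManifestOptions | GenerateSWOptions | ((dir: URL) => InjectManifestOptions | GenerateSWOptions);
}
```

`globIgnores` that works with AstroJS? (more work)

I can work out a PR for either if you want.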
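If `workbox` accepted a callback like this, the integration would have to resolve it once the output directory is known, for example in Astro's `astro:build:done` integration hook, which receives the output `dir` as a `URL`. A minimal sketch with hypothetical names (`WorkboxOptions`, `WorkboxConfig`, `resolveWorkboxOptions`), using a simplified options type as a stand-in for the workbox-build types:

```typescript
// Simplified stand-in for workbox-build's GenerateSWOptions / InjectManifestOptions.
interface WorkboxOptions {
  globDirectory?: string;
  globPatterns?: string[];
  globIgnores?: string[];
}

// The proposed option shape: static options, or a callback that receives the build output directory.
type WorkboxConfig = WorkboxOptions | ((dir: URL) => WorkboxOptions);

// Hypothetical helper the integration could call once the build output exists.
function resolveWorkboxOptions(config: WorkboxConfig | undefined, dir: URL): WorkboxOptions {
  // Static options are used as-is; a callback is invoked with the output directory.
  const options = typeof config === 'function' ? config(dir) : config ?? {};
  // Default globDirectory to the output directory unless the user set it. For file: URLs,
  // fileURLToPath from node:url is the more robust conversion (e.g. on Windows).
  return { globDirectory: decodeURIComponent(dir.pathname), ...options };
}
```

A callback would let user code (such as the scan above) inspect the built files under `dir` and return a computed `globIgnores` list.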
> How to configure a service-worker such that it only runs for the web application, and not on blog pages?

The Service Worker API supports the opposite use case, restricting the service worker to a particular path with `scope`, but workbox has a configuration option, `navigateFallbackDenylist`, that I believe solves this for you without needing to take over manual registration. LMK if that works.
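A sketch of how that could look. `navigateFallback` and `navigateFallbackDenylist` are real workbox-build `generateSW` options, but the values below are illustrative, and GenerateSW mode is assumed. `usesNavigateFallback` is a hypothetical helper that roughly models Workbox's `NavigationRoute` check (pathname plus search is tested against the denylist first, then against the allowlist, which defaults to matching everything), so the patterns can be checked without a browser.

```typescript
// Illustrative options for the integration's `workbox` setting (GenerateSW mode assumed).
const workboxOptions = {
  navigateFallback: '/index.html',
  // Navigations to /blog or anything below it never receive the app-shell fallback.
  navigateFallbackDenylist: [/^\/blog(\/|$)/],
};

// Rough model of NavigationRoute matching: a denylist hit wins; otherwise the
// allowlist (default: everything) decides whether the fallback is used.
function usesNavigateFallback(
  pathAndSearch: string,
  denylist: RegExp[],
  allowlist: RegExp[] = [/./],
): boolean {
  if (denylist.some((re) => re.test(pathAndSearch))) return false;
  return allowlist.some((re) => re.test(pathAndSearch));
}
```

The `(\/|$)` suffix keeps unrelated paths such as `/blogroll` out of the denylist. The denylist only affects the navigation fallback; whether blog assets are precached is still decided by `globIgnores`.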
> The blog articles have images, but all images end up flattened in a folder `./dist/_astro/` after building with AstroJS.

Other than images, does `globIgnores` work for your use case? You could locate all the source code for the blog under a `blog` directory and ignore that directory.
It's unfortunate we can't simply create a `blog` directory in images, because Astro's build process loses the directory nesting. I'm inclined to see if Astro would be open to changing that. Another approach would be "tagging" images for exclusion, e.g. `${image_name}.blog.png`, and then using the `blog` "tag" to target those assets for exclusion with `globIgnores`.
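A sketch of the tagging idea, assuming Astro keeps the original base name and only appends a hash, so that `photo.blog.png` ends up as something like `_astro/photo.blog.Ab12Cd34.webp` (an assumption to verify against the real output). Under that assumption a `globIgnores` entry such as `_astro/*.blog.*` would target the tagged files. Glob matching itself happens inside workbox-build; the hypothetical `hasTag`/`filesToIgnore` helpers below only roughly mirror that pattern to show which files it would catch.

```typescript
// True if the file sits directly in _astro/ and carries `tag` as a dot-separated
// segment between the base name and the extension, roughly like the glob "_astro/*.blog.*".
function hasTag(path: string, tag: string): boolean {
  const prefix = '_astro/';
  if (!path.startsWith(prefix)) return false;
  const name = path.slice(prefix.length);
  if (name.includes('/')) return false; // "*" does not cross directory boundaries
  const segments = name.split('.');
  return segments.slice(1, -1).includes(tag);
}

// Hypothetical helper: the output files a tag-based globIgnores entry would exclude.
function filesToIgnore(paths: string[], tag: string): string[] {
  return paths.filter((p) => hasTag(p, tag));
}
```

Matching whole dot-separated segments means a name like `blogger.Ij90.webp` is not caught by the `blog` tag.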
Hm, I haven't used the `offlineFallback` options before, but as far as I understand they are meant to provide a fallback page for non-cached pages when there is no internet connection, showing something like "Could not connect - Retry". I don't think that applies to this case of caching only part of the website.
Indeed, `globIgnores` itself works like a charm (and is meant for this). The issue is that we cannot use a pattern like `blog/**/*`, since AstroJS flattens resources like images into the `_astro` folder.
What I've done in the meantime is clone `astrojs-service-worker` and adjust it with some code that determines the right `globIgnores`: it lets Astro build the site, then goes through the built pages that I want to cache (i.e. `index.html` but not `/blog/article1`), scans them to see which images are loaded on each page, and builds the `globIgnores` list from that. Being able to provide `globIgnores` dynamically via a callback would allow me to use the original `astrojs-service-worker` and determine `globIgnores` with my custom code that collects the images used on the home page.
Having an AstroJS option to keep the images in their original directory structure would be the most powerful solution, I think: that way I could simply define `globIgnores: ['blog/**/*']` and everything would work without jumping through hoops. So it may indeed be a good idea to open an issue on the AstroJS side to discuss this.
Yeah, preserving the directory structure does seem attractive. Two downsides are that a) there are a few reasons Astro could be opposed to it, and b) even if the change were accepted, it would not help users until that release is adopted.
Another path forward here could be to better integrate workbox into vite, rather than invoking workbox after the astro build process has completed. That could mean adopting something like https://github.com/vite-pwa/vite-plugin-pwa, or implementing a solution directly in this package that could later be extracted.
Alternatively, we could remap any `globIgnores` written against the source files into an explicit ignore list targeting the build output, but I have some concerns about maintaining that remapping correctly.
I think that images are the only special case to consider here for `globIgnores`; everything else preserves its directory structure. Does https://github.com/vite-pwa/astro handle your use case correctly?
> Does https://github.com/vite-pwa/astro handle your use case correctly?

Unfortunately not. I've just tested it with `vite-pwa`, and this plugin has exactly the same issue. Looking at the logs, it also executes after Astro has generated the output files. To test this, I defined `workbox: { globIgnores: ['blog/**/*'] }`, and that has no effect (all images are still precached). Defining `workbox: { globIgnores: ['_astro/*'] }` does have an effect: most files are no longer precached (and these are files that only exist after the build, not in the source code).
> I think that images are the only special case to consider here for `globIgnores`; everything else preserves its directory structure.

Not exactly. I think a better description is that HTML files keep their original directory structure, and everything else (images, JS scripts, CSS, PDFs, etc.) is flattened into the `_astro` output folder.
> Another path forward here could be to better integrate workbox into vite, rather than invoking workbox after the astro build process has completed.

That would be ideal: collect the files to be precached during the build, at the moment when both the original name of a resource and its output name are known. I'm not sure, though, how complex such an integration would be to implement. It may be worth an experiment.
> Another path forward here could be to better integrate workbox into vite, rather than invoking workbox after the astro build process has completed.

If you’re interested in experimenting with this, I’d be happy to collaborate and/or merge an MR for this functionality.
High level we’d want to:
Does that sound right?
I think the only edge case here is potential file deduplication by Astro, since it drops directory paths: a file we ignore may map to the same output file as one that is not excluded. In that case we would drop that file from the excluded list.
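A minimal sketch of that remapping, assuming a source-to-output mapping can actually be obtained from the build, which is the hard part. `SourceToOutput`, `remapIgnores` and the `isIgnoredSource` predicate are hypothetical; the predicate stands in for real glob matching. An output file is only ignored if no kept source also maps to it, which handles the deduplication edge case.

```typescript
// Hypothetical mapping from each source file to the output file(s) Astro produced for it.
type SourceToOutput = Map<string, string[]>;

// Turns a source-level ignore rule into explicit output paths for workbox's globIgnores.
function remapIgnores(
  mapping: SourceToOutput,
  isIgnoredSource: (source: string) => boolean,
): string[] {
  const ignored = new Set<string>();
  const kept = new Set<string>();
  mapping.forEach((outputs, source) => {
    const bucket = isIgnoredSource(source) ? ignored : kept;
    for (const output of outputs) bucket.add(output);
  });
  // Deduplication edge case: if an ignored and a kept source end up in the same
  // output file, the file must stay cached, so it is dropped from the ignore list.
  return Array.from(ignored).filter((output) => !kept.has(output)).sort();
}
```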
Yes, indeed. I'm not sure how to implement this though (or whether it is feasible in the first place). I guess this logic will need to be an Astro integration, since that is where files get their new output names. I've had a look at the Integration API, but I don't see anything that would allow me to collect input/output file names. Maybe `addMiddleware` or something similar? Do you have any experience, pointers, or tutorials in that direction?
I think we’ll need to drop down into vite and look for opportunities to get the source and destination file paths.
I'm looking into writing a vite/rollup plugin that collects this information via hooks. I can collect *some* information via the `outro` and `writeBundle` hooks, but not yet what we need. This is what I've tried so far:
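```js
// file: astro.config.mjs
import { defineConfig } from 'astro/config'

export default defineConfig({
  // ...
  vite: {
    plugins: [
      collectResourceMapping()
    ]
  },
  // ...
})

// Exploratory Vite/Rollup plugin that logs what the build hooks receive
function collectResourceMapping() {
  return {
    name: 'collect-resource-mapping',
    outro() {
      // `arguments` is available because this is a regular (non-arrow) method
      console.log('outro', Array.from(arguments))
    },
    renderChunk() {
      // this logs a *lot* of data
      // console.log('renderChunk', Array.from(arguments))
    },
    writeBundle(_options, bundle) {
      // `bundle` maps each output file name to its chunk or asset info
      console.log('writeBundle', bundle)
    }
  }
}
```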
I'll ask for some pointers in the Astro community.
Update: I've asked on Discord: https://discord.com/channels/830184174198718474/1197638002764152843/1250458897597202514
OK, the feedback from the community Discord is that this information can probably be found via the `writeBundle` callback that I was already testing with. I've done some more testing, but still no luck: I cannot find the relation between the original input file name and path and the final output file name. One thing that makes this more complicated is that there can be a pipeline of multiple processors, like the Astro image service, which optimizes files and (again) outputs new file names. Hooking all of these integrations up reliably will be complex.

In short: I don't see a way to reliably relate input and output images via a Vite plugin.
The only alternative I see is the script that I already developed, which works fine for my own use: collect all file names in the `_astro` output folder. Then (recursively) scan through each of the HTML, JS, and CSS files to see whether any of those file names are used there. This can be done reliably because the output files have a unique hash in their names. This way, we can generate a list of all resources used by every individual HTML file. Applying `globIgnores` to all HTML pages gives us the list of pages to include in or exclude from caching by the service worker. The included pages plus the resources used by each of those pages give us what we're looking for: the list of all files to be cached by the service worker.
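The final step of that script can be sketched as follows, assuming the per-page scan has already produced a map from each HTML page to the `_astro` resources it uses. `buildPrecacheLists`, `PrecacheLists` and `isExcludedPage` are hypothetical names; the predicate stands in for matching the pages against `globIgnores`. The sketch returns both the files to precache and, as the complement, an explicit ignore list for everything else.

```typescript
// Result of the final step: what to precache, and what to hand to workbox as globIgnores.
interface PrecacheLists {
  precache: string[];
  ignore: string[];
}

// resourcesByPage: output of the per-page scan, e.g. "index.html" -> {"_astro/app.X1.js", ...}
// allAssets: every file in the _astro output folder
// isExcludedPage: hypothetical stand-in for matching a page against globIgnores
function buildPrecacheLists(
  resourcesByPage: Map<string, Set<string>>,
  allAssets: string[],
  isExcludedPage: (page: string) => boolean,
): PrecacheLists {
  const precache = new Set<string>();
  resourcesByPage.forEach((resources, page) => {
    if (isExcludedPage(page)) return;
    precache.add(page);
    // A resource shared with an excluded page is still cached, because an included page needs it.
    resources.forEach((resource) => precache.add(resource));
  });

  // Everything not selected becomes an explicit ignore entry. Hashed output names
  // typically contain no glob metacharacters, so each path works as a literal pattern.
  const ignore: string[] = [];
  resourcesByPage.forEach((_resources, page) => {
    if (!precache.has(page)) ignore.push(page);
  });
  for (const asset of allAssets) {
    if (!precache.has(asset)) ignore.push(asset);
  }
  return { precache: Array.from(precache).sort(), ignore: ignore.sort() };
}
```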
So I think we can go in two directions here: (1) build this logic into `astrojs-service-worker`, or (2) make the API of `astrojs-service-worker` flexible enough that custom logic like this can be plugged into it.
What do you think @tatethurston?
Hi Tate, thanks for providing `astrojs-service-worker`, it works like a charm!

Maybe you can help me with my use case: I have a website that consists of a web application and a blog. I would like to enable a service worker for the application but not for the blog.

The challenges that I run into are:

`_astro`, so I can't just use `globIgnores`.

Do you have any pointers for me? I guess I have to create something custom for this case.