Configure a service-worker to ignore the blog section of a website

tatethurston / astrojs-service-worker

An Astro integration to generate a Service Worker. Powered by Workbox.

MIT License

71 stars, 7 forks

Configure a service-worker to ignore the blog section of a website #26

Open josdejong opened 7 months ago

josdejong commented 7 months ago

Hi Tate, thanks for providing astrojs-service-worker, it works like a charm!

Maybe you can help me with my use case: I have a website that consists of a web application and a blog. I would like to enable a service-worker for the application but not for the blog.

The challenges that I run into are:

How to make sure the service-worker only caches the resources needed for the web application, and not those of the blog? AstroJS outputs all resources into a single flat directory, _astro, so I can't simply use globIgnores.
How to configure a service-worker such that it only runs for the web application, and not on blog pages?

Do you have any pointers for me? I guess I have to create something custom for this case.

tatethurston commented 7 months ago

Hey @josdejong, what does your URL path structure look like? Are your blog or application pages under a separate path, e.g. /blog/some-page or /app/some-page?

This library is a pretty thin wrapper over workbox so we can look over there for some inspiration too.

josdejong commented 7 months ago

Thanks for your reply. The blog is indeed under a separate path. It boils down to:

/                       (main application)
/blog/article1
/blog/article2
/blog/article3

The blog articles have images, but all images end up flattened in a folder ./dist/_astro/ after building with AstroJS.

I figured out that I can use registration.autoRegister = false and manually add code only to the web application at /, that's great, so that solves my challenge (2).

For challenge (1), I would need to specify workbox.globIgnores after Astro has built the website but before astrojs-service-worker creates the service worker. For the time being, I've copied and adapted the code of astrojs-service-worker so that it gets the list of all resources in _astro and then checks index.html to see which resources are used, using a simple String.includes(resourceName) and recursing into the js/css files used by index.html.
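A minimal sketch of that scan, working on an in-memory map of the build output (path relative to dist/ mapped to file contents) rather than reading dist/ from disk. The function name `collectUsedAssets` and the `BuildOutput` type are hypothetical; the logic follows the description above: substring-match every `_astro` file name against the entry page and recurse into the js/css files it finds.

```typescript
// Build output as a map: path relative to dist/ -> file contents.
// A real script would fill this by reading the files in dist/.
type BuildOutput = Map<string, string>;

// Hypothetical helper: returns the _astro files reachable from `entry`
// (an html page), following references through js and css files.
function collectUsedAssets(output: BuildOutput, entry: string): Set<string> {
  // Every file Astro emitted into the flat _astro folder is a candidate.
  const assets = Array.from(output.keys()).filter((p) => p.startsWith("_astro/"));
  const used = new Set<string>();
  const queue: string[] = [entry];

  while (queue.length > 0) {
    const current = queue.pop()!;
    const content = output.get(current) ?? "";
    for (const asset of assets) {
      // Output names contain a content hash, so a plain substring match
      // on the file name is unlikely to produce false positives.
      const fileName = asset.slice("_astro/".length);
      if (used.has(asset) || !content.includes(fileName)) continue;
      used.add(asset);
      // JS and CSS can reference further assets, so scan them as well.
      if (/\.(m?js|css)$/.test(asset)) queue.push(asset);
    }
  }
  return used;
}
```

Images are leaves in this walk: they are recorded but never scanned, since they cannot reference other build files.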

Would you be interested in either:

Extending the API for workbox to allow for a callback (little work):

```ts
export interface ServiceWorkerConfig {
  // ...
  // A function type inside a union must be parenthesized.
  workbox?:
    | InjectManifestOptions
    | GenerateSWOptions
    | ((dir: URL) => InjectManifestOptions | GenerateSWOptions);
}
```
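A sketch of how the integration could resolve the proposed callback form before handing the options to Workbox. `WorkboxOptions` is a simplified stand-in for `InjectManifestOptions | GenerateSWOptions`, and `resolveWorkboxOptions` is a hypothetical helper, not part of astrojs-service-worker's current API.

```typescript
// Simplified stand-in for Workbox's option types (assumption: only the
// glob-related fields matter for this use case).
interface WorkboxOptions {
  globDirectory?: string;
  globPatterns?: string[];
  globIgnores?: string[];
}

// Either a static object or a callback that receives the build output dir.
type WorkboxConfig = WorkboxOptions | ((dir: URL) => WorkboxOptions);

function resolveWorkboxOptions(config: WorkboxConfig | undefined, dir: URL): WorkboxOptions {
  if (config === undefined) return {};
  // The callback would run after Astro has written the build output to
  // `dir`, so it can inspect the emitted files before Workbox runs.
  return typeof config === "function" ? config(dir) : config;
}
```

Keeping the static-object form means existing configurations keep working unchanged; only users who need build-time information opt into the callback.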

Building a full-fledged globIgnores solution that works with AstroJS (more work)?

I can work out a PR for either if you want.

tatethurston commented 6 months ago

How to configure a service-worker such that it only runs for the web application, and not on blog pages?

The Service Worker API supports the opposite use case, restricting the service worker to a particular path with scope, but workbox has a configuration option navigateFallbackDenylist that I believe solves this for you without you needing to take over manual registration -- LMK if that works.
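A minimal sketch of that configuration, assuming Workbox's GenerateSW mode. `navigateFallback` and `navigateFallbackDenylist` are Workbox options; the fallback URL and the `/blog/` pattern are assumptions for this site. The denylist only controls which navigations are answered with the precached app shell: the worker still sees requests from blog pages, but denylisted navigations fall through to the network.

```typescript
// Options that could be passed through this library's `workbox` setting
// (values are assumptions for a site with the app at "/").
const workboxOptions = {
  navigateFallback: "/index.html",
  // Navigations under /blog/ are not answered with the app shell.
  navigateFallbackDenylist: [/^\/blog\//],
};

// Local check only: Workbox's NavigationRoute tests the denylist against
// the request URL's pathname + search; this repeats that test.
function servedByAppShell(pathAndSearch: string): boolean {
  return !workboxOptions.navigateFallbackDenylist.some((re) => re.test(pathAndSearch));
}
```

The trailing slash in the pattern matters: `/^\/blog\//` excludes `/blog/article1` but not an unrelated page such as `/blogroll`.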

The blog articles have images, but all images end up flattened in a folder ./dist/_astro/ after building with AstroJS.

Other than images, does globIgnores work for your use case? You could locate all source code for the blog under a blog directory and ignore that directory.

It's unfortunate that we can't simply create a blog directory in images, because Astro's build process loses the directory nesting. I'm inclined to see whether Astro would be open to changing that. Another approach would be "tagging" images for exclusion, e.g. ${image_name}.blog.png, and then using the blog "tag" to target those assets for exclusion with globIgnores.
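A sketch of the tagging idea, assuming Astro keeps the source base name and inserts a hash, so `diagram.blog.png` ends up as something like `_astro/diagram.blog.<hash>.png` (possibly with a different extension after image optimization). The glob and the helper names are hypothetical; the regex is only a local stand-in for the glob so a few output paths can be checked.

```typescript
// Glob for Workbox's globIgnores: any _astro file carrying the ".blog" tag.
const blogTagGlob = "_astro/*.blog.*";
const workboxIgnoreOptions = { globIgnores: [blogTagGlob] };

// Regex stand-in for the glob above; `[^/]*` plays the role of the
// glob's `*`, which does not cross directory separators.
const blogTagPattern = /^_astro\/[^/]*\.blog\.[^/]*$/;

function isBlogTagged(outputPath: string): boolean {
  return blogTagPattern.test(outputPath);
}
```

This only covers assets whose names carry the tag; JS or CSS bundles that happen to be used only by blog pages would still need another rule.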

josdejong commented 6 months ago

Hm, I haven't used offlineFallback options before, but as far as I understand this is meant as a fallback page for non-cached pages in case there is no internet connection, showing something like "Could not connect - Retry". I don't think it applies to this case of caching a part of the website only.

Indeed globIgnores itself works like a charm (and is meant for this). The issue is that we cannot use a pattern like /blog/**/* since AstroJS flattens resources like images into the _astro folder.

What I've done in the meantime is clone astrojs-service-worker and adjust it with some code that determines the right globIgnores: let Astro build the site, go through the built pages that I want to cache (i.e. index.html but not /blog/article1), scan through them to see which images are loaded on each page, and from that build the globIgnores list. Being able to provide globIgnores dynamically via a callback would allow me to use the original astrojs-service-worker and determine globIgnores with my custom code that collects the images used on the home page.
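A sketch of the last step, turning a scan result into a globIgnores list. `buildGlobIgnores` is a hypothetical helper; `usedAssets` would come from a scan like the one described earlier in the thread. It assumes the hashed output names contain no glob metacharacters, so each path can be used as a literal pattern.

```typescript
// outputPaths: all files in dist/, relative paths.
// usedAssets: _astro files referenced (directly or via js/css) by cached pages.
// excludedPagePatterns: globs for pages that should not be cached.
function buildGlobIgnores(
  outputPaths: string[],
  usedAssets: ReadonlySet<string>,
  excludedPagePatterns: string[],
): string[] {
  // HTML keeps its directory structure, so page globs like "blog/**/*" still work.
  const ignores = excludedPagePatterns.slice();
  for (const path of outputPaths) {
    // Flattened assets have no directory to match on, so list unused ones explicitly.
    if (path.startsWith("_astro/") && !usedAssets.has(path)) ignores.push(path);
  }
  return ignores;
}
```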

Having an AstroJS option to keep the images in their original directory structure would be the most powerful solution, I think: that way I could simply define globIgnores: '/blog/**/*' and everything would work without having to jump through hoops. So it may indeed be a good idea to open an issue with AstroJS to discuss this.

tatethurston commented 6 months ago

Yeah, preserving the directory structure does seem attractive. Two downsides are that a) there are a few reasons Astro could be opposed to it, and b) even if the change were accepted, it wouldn't help users until they adopt that release.

Another path forward here could be to better integrate workbox into vite, rather than invoking workbox after the astro build process has completed. That could be adoption of something like https://github.com/vite-pwa/vite-plugin-pwa or implementing a solution directly in this package, that could be later extracted.

Alternatively, we could remap any globIgnores using the source into an explicit ignore list targeting the build output, but I have some concerns about maintaining that remapping correctly.

I think that images are the only special case to consider here for globIgnores -- everything else preserves its directory structure. Does https://github.com/vite-pwa/astro handle your use case correctly?

josdejong commented 5 months ago

Does https://github.com/vite-pwa/astro handle your use case correctly?

Unfortunately not. I've just tested with vite-pwa, and this plugin has exactly the same issue. Looking at the logs, it also executes after Astro has generated the output files. To test this, I defined workbox: { globIgnores: ['blog/**/*'] } and that has no effect (all images are still precached). Defining workbox: { globIgnores: ['_astro/*'] } does have an effect: most files are no longer precached (these files only exist after the build, not in the source code).

I think that images are the only special case to consider here for globIgnores -- everything else preserves its directory structure.

Not exactly. I think a better description is that html files keep their original directory structure, and everything else (images, js scripts, css, pdfs, etc.) is flattened into the _astro output folder.

Another path forward here could be to better integrate workbox into vite, rather than invoking workbox after the astro build process has completed.

That would be ideal: collect the files to be precached during the build, at the moment when both the original name of the resource and the output name are known. I'm not sure how complex such an integration would be to implement, though. It may be worth an experiment.

tatethurston commented 5 months ago

Another path forward here could be to better integrate workbox into vite, rather than invoking workbox after the astro build process has completed.

If you’re interested in experimenting with this I’d be happy to collaborate and/or merge an MR for this functionality.

At a high level, we’d want to:

Hook into the build process to build a map from input files to output files
Apply the provided globIgnores (or a similar new field specific to this library) to the input files to build a list of files to exclude
Pass the excluded file list to workbox

Does that sound right?

I think the only edge case here is potential file deduplication by Astro, since it drops directory paths: an ignored input file may map to the same output file as an input that is not excluded. In that case we would drop that output file from the excluded list.
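A sketch of those three steps including the deduplication rule. `inputToOutputs` is the map from step 1 (how to obtain it from the build is exactly the open question here), and `isIgnoredInput` stands in for matching globIgnores against source paths; both names are hypothetical.

```typescript
// Steps 2 and 3: derive the output files to exclude from precaching.
function computeExcludedOutputs(
  inputToOutputs: Map<string, string[]>,
  isIgnoredInput: (input: string) => boolean,
): string[] {
  const excluded = new Set<string>();
  const keep = new Set<string>();
  inputToOutputs.forEach((outputs, input) => {
    const target = isIgnoredInput(input) ? excluded : keep;
    outputs.forEach((o) => target.add(o));
  });
  // Deduplication edge case: an output shared with a non-ignored input
  // must stay precached, so it is dropped from the excluded list.
  return Array.from(excluded).filter((o) => !keep.has(o)).sort();
}
```

In the usage below, the shared `logo` asset survives because the home page also needs it, while the blog-only photo and page are excluded.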

josdejong commented 5 months ago

Yes, indeed. I'm not sure how to implement this, though (or whether it is feasible in the first place). I guess this logic would need to be an Astro integration, since that is where files get their new output names. I've had a look at the Integration API, but I don't see anything that would allow me to collect input/output file names. Maybe addMiddleware or something similar? Do you have any experience, pointers, or tutorials in that direction?

tatethurston commented 5 months ago

I think we’ll need to drop down into vite and look for opportunities to get the source and destination file paths.

https://v2.vitejs.dev/guide/api-plugin.html

josdejong commented 5 months ago

I'm looking into writing a vite/rollup plugin that collects this information via hooks. I can collect "some" information via the outro and writeBundle hooks but not yet what we need. This is what I've tried so far:

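```js
// file: astro.config.mjs
import { defineConfig } from 'astro/config'

export default defineConfig({
  // ...
  vite: {
    plugins: [
      collectResourceMapping()
    ]
  },
  // ...
})

// Exploratory Vite/Rollup plugin: logs what the output hooks receive.
function collectResourceMapping() {
  return {
    name: 'collect-resource-mapping',

    // Output generation hook; returning nothing adds no code to the chunk.
    outro() {
      console.log('outro', Array.from(arguments))
    },

    renderChunk() {
      // this logs a *lot* of data
      // console.log('renderChunk', Array.from(arguments))
    },

    // Runs after files are written; `bundle` maps output file names to chunk/asset info.
    writeBundle(_options, bundle) {
      console.log('writeBundle', bundle)
    }
  }
}
```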
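A variation of the writeBundle experiment that records, for each output file, whatever source information Rollup attaches to it. This is a sketch under assumptions: it relies on Rollup's `OutputChunk.moduleIds` and `OutputAsset.name` fields (newer Rollup versions may prefer `names`), uses simplified local types instead of importing Rollup's, and images optimized later by Astro's image service may not appear in this bundle at all. For assets, `name` may be only a bare file name without its source directory.

```typescript
// Simplified local shapes of Rollup's OutputChunk / OutputAsset,
// limited to the fields used here.
interface ChunkInfo { type: "chunk"; fileName: string; moduleIds: string[] }
interface AssetInfo { type: "asset"; fileName: string; name?: string }
type Bundle = Record<string, ChunkInfo | AssetInfo>;

// Fills `mapping` with output file name -> known source identifiers.
function collectResourceMapping(mapping: Map<string, string[]>) {
  return {
    name: "collect-resource-mapping",
    writeBundle(_options: unknown, bundle: Bundle) {
      for (const output of Object.values(bundle)) {
        if (output.type === "chunk") {
          // JS chunks list the source modules they were built from.
          mapping.set(output.fileName, output.moduleIds);
        } else {
          // Assets only carry an optional `name`.
          mapping.set(output.fileName, output.name ? [output.name] : []);
        }
      }
    },
  };
}
```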

I'll ask for some pointers in the Astro community.

Update: I've asked at Discord: https://discord.com/channels/830184174198718474/1197638002764152843/1250458897597202514

josdejong commented 5 months ago

Ok, the feedback from the community Discord is that this information can probably be found via the writeBundle callback that I was already testing with. I've done some more testing, but still no luck: I cannot find the relation between the original input file name and path and the final output file name. One thing that makes this more complicated is that there can be a pipeline with multiple processors, like the Astro image service, which optimizes files and (again) outputs new file names. It will be complex to hook up all of these integrations in a reliable way.

In short: I don't see an option to safely relate the input/output images via a Vite plugin.

The only alternative I see is the script that I've already developed, which works fine for my own use: collect all file names in the _astro output folder, then (recursively) scan through each of the html, js, and css files to see whether any of those names are used there. This can be done reliably because the output files have a unique hash in their names. This way, we can generate a list of all resources used by every individual html file. Applying globIgnores to all html pages gives us the list of pages to include in or exclude from caching by the service worker. The included pages plus the resources used by each of those pages give us what we're looking for: the list of all files to be cached by the service worker.
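A sketch of the whole pipeline described above, kept independent of how pages are scanned. `filesToPrecache` is a hypothetical name; `scanPage` stands in for the recursive html, js/css, asset scan, and `isExcludedPage` stands in for applying globIgnores to the html pages. Both are assumptions supplied by the caller.

```typescript
// Returns the sorted list of files the service worker should precache.
function filesToPrecache(
  outputPaths: string[],
  isExcludedPage: (htmlPath: string) => boolean,
  scanPage: (htmlPath: string) => Set<string>,
): string[] {
  // HTML keeps its directory structure, so page-level filtering works.
  const pages = outputPaths.filter((p) => p.endsWith(".html") && !isExcludedPage(p));
  const files = new Set<string>(pages);
  // Union of the resources used by every included page.
  pages.forEach((page) => {
    scanPage(page).forEach((asset) => files.add(asset));
  });
  return Array.from(files).sort();
}
```

Passing the scanner in as a function keeps this step testable and would fit either direction mentioned below: built into the library, or plugged in through a callback.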

So I think we can go in two directions here: (1) build this logic inside the astrojs-service-worker, or (2) make the API of astrojs-service-worker flexible enough so that we can plug custom logic like this into astrojs-service-worker.

What do you think @tatethurston?