withastro / astro

The web framework for content-driven websites. ⭐️ Star to support our work!
https://astro.build

@astrojs/image: Don't include original image in output bundle #4961

Closed jkjustjoshing closed 2 years ago

jkjustjoshing commented 2 years ago

What version of astro are you using?

1.2.5

Are you using an SSR adapter? If so, which one?

No

What package manager are you using?

npm

What operating system are you using?

Mac

Describe the Bug

I'm not sure whether this is a bug or a feature request; my apologies if I should have submitted this to github.com/withastro/rfcs.

When using @astrojs/image with the <Picture> component and the widths={} prop, multiple sizes of the image end up in my ./dist/assets folder, corresponding to the values I pass to widths. However, there is also a copy of the original full-sized image.

This full-sized image is not referenced in the HTML at all, and is never downloaded by the site user, so I have no concerns for user-experience with an oversized image. However, including this full-size image in the bundle makes deploys slower and can increase hosting costs.

Suggested change

Remove full-size image from output directory

Alternative

A setting in the integration config options for removing the full-size image.

Link to Minimal Reproducible Example

Not applicable (but I can provide one if anyone disagrees)

delucis commented 2 years ago

I think this was added at the request of people who did need the full-size image as well (to reference as an Open Graph image, use as a download link, or something similar). So I guess a config option that lets you opt out might make sense (or opt in; I'm not sure which makes the most sense).

jkjustjoshing commented 2 years ago

That's what I figured. However, the original image has a hashed filename - how would someone write code to actually expose the URL of the full-sized image to the user?

If it is possible to get the path to the full-sized image, then I agree that it's good to keep in the output folder (with an option to exclude it). However, if it's not possible then I would say either a) there's no value in having it or b) there should be a way to expose the original file's path to the developer.

panwauu commented 2 years ago

Possible duplicate of #4896

tony-sull commented 2 years ago

The original image actually does need to be included here in case an image is imported and the src is used directly.

Astro won't know which image variants are used until every page is built, and in SSR it can't know at all, since pages are rendered at request time.

It's a bit of an implementation detail, and a little annoying to see extra files included here, but unfortunately there's no way to safely avoid it.

---
import hero from '../hero.jpg'
---

<!-- this src can be used as-is, but it will fail if the original file isn't included in dist -->
<img src={hero.src} />

jkjustjoshing commented 2 years ago

Ah that makes sense. I wasn't thinking about using the image in a raw/native <img> element without using <Image> or <Picture>. Thanks for the context!

supermoos commented 2 years ago

@tony-sull could a workaround be to have some sort of import parameter that would exclude it from the output? something like:

---
import hero from '../hero.jpg!discard'
---

It's a big issue, as outputting the original source as well defeats the purpose of a lot of image generation scenarios.

supermoos commented 2 years ago

Or perhaps some sort of post build way to delete those files again?

jkjustjoshing commented 2 years ago

@supermoos the naming convention of the files is regular, so you probably could write a script post-build that looks for the original file. It would be pretty fragile, but could do something like the following:

Look for files matching the regex /^([^.]+)\.([0-9a-f]+)\.([a-z]+)$/ - that is, "xxxx.abc123.jpg". This is an original (all non-original files have an underscore in the segment before the file extension). $1 is the first part of the path, $2 is the unique content hash, and $3 is the file extension. Before deleting it would be smart to make sure there exist optimized versions of this file - files that follow the pattern $1.$2_[anything alphanumeric].[any extension].
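The heuristic above can be sketched in TypeScript as a pure function over a list of file names, so it can be checked before anything is actually deleted. This is a minimal sketch under the same assumptions as the comment: originals match the hex-hash regex given above, and optimized copies look like name.hash_variant.ext. The names findDeletableOriginals and escapeRegExp are hypothetical, and real Astro hashes are not guaranteed to be hexadecimal.

```typescript
// Assumption: originals are "name.hash.ext" with a lowercase hex hash (the regex from the comment),
// optimized variants are "name.hash_variant.ext". This is an observed naming pattern, not an Astro contract.
const ORIGINAL_RE = /^([^.]+)\.([0-9a-f]+)\.([a-z]+)$/;

// Returns the originals that have at least one optimized sibling, i.e. candidates for deletion.
function findDeletableOriginals(files: string[]): string[] {
  const deletable: string[] = [];
  for (const file of files) {
    const match = ORIGINAL_RE.exec(file);
    if (!match) continue; // not an original (e.g. contains "_" in the hash segment)
    const [, base, hash] = match;
    // Sibling pattern from the comment: $1.$2_[anything alphanumeric].[any extension]
    const variantRe = new RegExp(`^${escapeRegExp(base)}\\.${hash}_[A-Za-z0-9]+\\.[A-Za-z0-9]+$`);
    // Only report an original as deletable if an optimized version of it exists
    if (files.some((other) => variantRe.test(other))) {
      deletable.push(file);
    }
  }
  return deletable;
}

// Escapes characters that have special meaning in a RegExp
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
```

Running it against the real directory listing (for example the result of fs.readdir on dist/_astro) and printing the result first is a cheap way to sanity-check the pattern before adding any unlink calls.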

supermoos commented 2 years ago

Thanks for this. Seems a bit error prone though. What if I actually wanna keep some of the source files for some reason? The import suggestion would solve that issue.

simonwiles commented 1 year ago

FWIW, this causes me a problem on a very image-heavy site: my originals are large, and when they're copied into the output folder, I end up with something that's too big to upload to my hosting environment.

rikur commented 2 months ago

I get a lot of noise in my Ahrefs site audit, because the crawler downloads the original images, which are far too big.

robinwatts96 commented 2 months ago

Hi, this is a really useful thread. I'm having a somewhat unrelated issue, but this thread seems like the best place I could find help.

I want to get the final path to the images in the _astro folder dynamically in my head element. I need to do this before the build process, not on the client. It would be easy enough if I could do this on the client...

I want to set the og:image property in the head, but the value won't work as intended if I set it on the client with JS, as Google scrapes your page before any client-side JS runs.

So if there were a way for me to predict the abc123 part of the file path "xxxx.abc123.jpg", I think I'd be able to work out the rest.

Any help with this would be greatly appreciated!

delucis commented 2 months ago

@robinwatts96 you can get the final output path by importing the image:

---
import cover from '../cover.jpg';
---

<meta property="og:image" content={new URL(cover.src, Astro.site)}>

If you need to do this more dynamically, this docs page may help: https://docs.astro.build/en/recipes/dynamically-importing-images/

Otherwise please do jump into our Discord chat where people are always happy to provide support: https://astro.build/chat
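One caveat with the snippet above: Astro.site is undefined unless site is set in the Astro config, and new URL() with a root-relative path and no base throws a TypeError. A minimal sketch of a guard, using a hypothetical helper name toAbsoluteImageUrl, which could be called in the frontmatter as toAbsoluteImageUrl(cover.src, Astro.site):

```typescript
// Hypothetical helper. Astro.site is typed as URL | undefined; new URL("/_astro/...", undefined)
// throws a generic TypeError, so fail early with a clearer message instead.
function toAbsoluteImageUrl(src: string, site: URL | undefined): string {
  if (!site) {
    throw new Error("Set `site` in the Astro config to build absolute og:image URLs");
  }
  // A root-relative src ("/_astro/...") resolves against the origin of site
  return new URL(src, site).href;
}
```

Throwing at build time is a deliberate choice here: a relative og:image value would silently be ignored or misread by most crawlers, which is exactly the failure the original question was trying to avoid.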

robinwatts96 commented 2 months ago

@delucis This is exactly what I needed - thank you so much, I appreciate your help.

wtchnm commented 1 month ago

For those who don't use imported images directly in plain <img /> tags, here's an Astro integration that removes the original images from the build:

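import fs from 'node:fs/promises'
import { fileURLToPath } from 'node:url'
import type { AstroIntegration } from 'astro'

// In Astro 4, unused images are removed from the build, but some original images may still remain in the _astro folder.
const removeOriginalImages: AstroIntegration = {
    name: 'remove-original-images',
    hooks: {
        'astro:build:done': async ({ dir }) => {
            // fileURLToPath handles Windows drive letters and encoded characters, unlike dir.pathname
            const astroDir = fileURLToPath(new URL('_astro/', dir))
            const files = await fs.readdir(astroDir)
            for (const file of files) {
                const parts = file.split('.')
                const ext = parts.pop()
                if (ext && ['jpg', 'png'].includes(ext)) {
                    // Originals are name.hash.ext; optimized copies are name.hash_variant.ext
                    const hash = parts.pop()
                    if (hash && !hash.includes('_')) {
                        await fs.unlink(astroDir + file)
                    }
                }
            }
        }
    }
}
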
tenpaMk2 commented 1 month ago

Thank you for the sample code. I made an improved version.

import fs from "node:fs/promises";
import path from "node:path";
import { fileURLToPath } from "node:url";

...

    {
      name: "remove-original-images",
      hooks: {
        "astro:build:done": async ({ dir }) => {
          // fileURLToPath is safer than dir.pathname (Windows drive letters, encoded spaces)
          const astroDir = path.join(fileURLToPath(dir), "_astro");
          const files = await fs.readdir(astroDir);

          for (const file of files) {
            // "hero.abc123.jpg" -> name "hero.abc123", ext ".jpg" -> hashStr ".abc123"
            const { name, ext } = path.parse(file);
            const { ext: hashStr } = path.parse(name);

            if (!ext) continue;
            if (!hashStr) continue;
            if (![`.jpg`, `.jpeg`, `.png`, `.webp`].includes(ext.toLowerCase())) continue;
            if (hashStr.includes(`_`)) continue;

            console.log(`Removing original image: ${file}`);
            await fs.unlink(path.join(astroDir, file));
          }
        },
      },
    },

Optimized images seem to be named {name}.{original hash}_{additional hash}.{ext}. Sadly, however, some {original hash} values contain _ themselves, so the scripts above wrongly treat those originals as optimized images and do not remove them.
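One way around the _ ambiguity, sketched here as an assumption rather than anything Astro documents, is to stop inspecting the hash characters and instead treat name.ext as an original only if some sibling file starts with name_ (that is, an optimized variant was generated from it). Originals with no variant at all, which may be referenced directly via <img src={hero.src} />, are then left alone. The function name findOriginalsWithVariants is hypothetical:

```typescript
import * as path from "node:path";

// Formats that may appear as originals (adjust as needed)
const ORIGINAL_EXTS = new Set([".jpg", ".jpeg", ".png", ".webp"]);

// Treat "stem.ext" as an original only when a sibling named "stem_..." exists,
// so underscores inside the hash itself no longer matter.
function findOriginalsWithVariants(files: string[]): string[] {
  return files.filter((file) => {
    const { name, ext } = path.parse(file); // "hero.Ab_c1234.jpg" -> name "hero.Ab_c1234"
    if (!ORIGINAL_EXTS.has(ext.toLowerCase())) return false;
    const prefix = `${name}_`;
    return files.some((other) => other !== file && other.startsWith(prefix));
  });
}
```

This is still a heuristic: an original that is used both through <Image> and directly in a plain <img> tag has variants and would therefore still be removed.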

jurajkapsz commented 4 days ago

Thanks, everyone, for sharing the original-image removal code. You've probably improved it further by now; in any case, here is my current version. I modified the matching algorithm to overcome the _ limitation mentioned above, and added support for reading the Astro assets folder setting (build.assets) from AstroConfig, because I have customized it myself.

I also added some logging so I know when the build process has been altered. I went with the shorter integration name ROI for that reason; I'm not sure about it, but it keeps the logs more compact. It can easily be changed. Otherwise, I tried to keep my code human-readable.

Finally, I narrowed the checked original image formats to my use case: I don't expect to supply web image formats (WebP, AVIF, ...) myself, but to have them generated for me. This can be changed in the ORIGINAL_IMAGE_FORMATS array, like anything else.

// https://docs.astro.build/en/guides/imports/#node-builtins
import { readdir, unlink } from "node:fs/promises";
import path from "node:path";
import { fileURLToPath } from "node:url";
import type { AstroIntegration, AstroConfig } from "astro";

// Inspired by https://github.com/withastro/astro/issues/4961#issuecomment-2322936873
function removeOriginalImages() {
  let astroConfig: AstroConfig;

  // Used to identify original image files
  const ORIGINAL_IMAGE_FORMATS = ["jpg", "jpeg", "png"] as const;
  // Regex pattern: `dot eight hash chars dot`
  const ORIGINAL_IMAGE_HASH_PATTERN = "\\.[a-zA-Z0-9_\\-]{8}\\.";

  const integration: AstroIntegration = {
    name: "ROI",
    hooks: {
      "astro:config:done": ({ config }) => {
        astroConfig = config;
      },
      "astro:build:done": async ({ dir, logger }) => {
        // fileURLToPath (rather than dir.pathname) also works on Windows and with encoded characters
        const astroAssetsDir = path.join(fileURLToPath(dir), astroConfig.build.assets);
        const files = await readdir(astroAssetsDir);

        let foundFilesCount = 0;
        let removedFilesCount = 0;

        for (const file of files) {
          const { ext } = path.parse(file);
          // Strip the leading dot from `ext` so it can be used in the regex below
          const fileFormat = ext.slice(1);

          if (!(ORIGINAL_IMAGE_FORMATS as ReadonlyArray<string>).includes(fileFormat)) continue;

          // Match original image files: a single hash segment followed by the extension
          const reOriginalImage = new RegExp(`${ORIGINAL_IMAGE_HASH_PATTERN}${fileFormat}$`);
          if (!reOriginalImage.test(file)) continue;

          foundFilesCount++;
          logger.warn(`Removing ${file}`);

          // fs.promises.unlink resolves with undefined on success and rejects on failure,
          // so failures have to be caught; they cannot be detected from the return value
          try {
            await unlink(path.join(astroAssetsDir, file));
            removedFilesCount++;
          } catch (error) {
            logger.error(`Couldn't remove file ${file}: ${String(error)}`);
          }
        }

        if (foundFilesCount > 0) {
          logger.warn(`Removed ${removedFilesCount}/${foundFilesCount} files.`);
        }
      },
    },
  };

  return integration;
}

export default removeOriginalImages;
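Because removeOriginalImages is a factory function, it has to be called when registered. A minimal sketch of the registration, assuming the code above is saved at a hypothetical path src/integrations/remove-original-images.ts:

```typescript
// astro.config.ts (the import path below is a hypothetical location for the integration)
import { defineConfig } from "astro/config";
import removeOriginalImages from "./src/integrations/remove-original-images";

export default defineConfig({
  // Called as a function: the factory returns the AstroIntegration object
  integrations: [removeOriginalImages()],
});
```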