invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.

https://invoke-ai.github.io/InvokeAI/

Apache License 2.0

22.77k stars, 2.35k forks

[enhancement]: add image format with better compression (webp, avif, jxl, etc) #2704

Open keturn opened 1 year ago

keturn commented 1 year ago

We generate a lot of images. I would like them to be encoded more efficiently. I don't need the encoding to be lossless.

I often convert the images I've generated with InvokeAI to webp before sharing them, and it's common for them to come out at 10% of the size of the PNG, or less. That's not 10% smaller, that's 10% total, e.g. a 0.77 MB file to a 0.05 MB file.

Feature-wise, JPEG XL would be my first choice for an output format, as it has fewest constraints on size and plenty of options for bit depth and extra channels.

However, given that InvokeAI is a web-centric application, JPEG XL doesn't have the browser support to be viable anytime soon. webp or avif would be more practical in this regard.

Additional Context

Easy part: image data

Write webp from PIL or avif from pillow-avif-plugin.
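A minimal sketch of the easy part, assuming a Pillow build with WebP support. The gradient test image, the quality value, and the helper names `encode_webp` and `encode_png` are placeholders for illustration, not anything taken from InvokeAI. AVIF is not shown; it should go through the same `save` call once pillow-avif-plugin is installed and imported.

```python
import io

from PIL import Image


def encode_webp(image: Image.Image, quality: int = 80) -> bytes:
    """Encode a PIL image as lossy WebP and return the file bytes."""
    buffer = io.BytesIO()
    # quality runs from 0 to 100; 80 is an arbitrary placeholder.
    image.save(buffer, format="WEBP", quality=quality)
    return buffer.getvalue()


def encode_png(image: Image.Image) -> bytes:
    """Encode the same image as PNG for a size comparison."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


# Stand-in for a generated image: a simple gradient scaled to 512x512.
sample = Image.linear_gradient("L").resize((512, 512)).convert("RGB")

webp_bytes = encode_webp(sample)
png_bytes = encode_png(sample)
print(f"PNG: {len(png_bytes)} bytes, WebP: {len(webp_bytes)} bytes")
```

The size ratio depends heavily on image content and the quality setting, so the numbers printed here say nothing about real generations.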

Messy part: metadata

InvokeAI currently stores image metadata inside a PNG tEXt chunk. It is not immediately obvious how to apply that to other file formats.
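To make the problem concrete, here is a rough sketch of a PNG text-chunk round trip with Pillow, followed by a plain re-save as WebP. The chunk keyword `example-metadata` and the parameter dict are made up for illustration; the keyword and schema InvokeAI actually uses may differ.

```python
import io
import json

from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Hypothetical keyword and payload, not the real InvokeAI schema.
METADATA_KEY = "example-metadata"
params = {"prompt": "a lighthouse at dusk", "steps": 30, "seed": 1234}

image = Image.new("RGB", (64, 64), "gray")

# Write the parameters into a PNG text chunk.
png_info = PngInfo()
png_info.add_text(METADATA_KEY, json.dumps(params))
png_buffer = io.BytesIO()
image.save(png_buffer, format="PNG", pnginfo=png_info)

# Reading the PNG back exposes its text chunks as a dict.
reopened_png = Image.open(io.BytesIO(png_buffer.getvalue()))
recovered = json.loads(reopened_png.text[METADATA_KEY])

# A plain re-save as WebP does not carry the text chunk along.
webp_buffer = io.BytesIO()
reopened_png.save(webp_buffer, format="WEBP")
reopened_webp = Image.open(io.BytesIO(webp_buffer.getvalue()))
webp_has_metadata = METADATA_KEY in reopened_webp.info

print(recovered, webp_has_metadata)
```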

IMHO we should switch to using XMP instead, as XMP works more or less the same way regardless of image format.
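A sketch of what the XMP route could look like, assuming a Pillow version whose WebP plugin accepts an `xmp` keyword on save and exposes `info["xmp"]` on load. The namespace URI, the `ex:parameters` property, and both helper functions are invented for this example; a real schema would need to be designed and agreed on.

```python
import io
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

from PIL import Image

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace and property name, made up for this sketch.
EXAMPLE_NS = "https://example.invalid/ns/1.0/"


def build_xmp_packet(params: dict) -> bytes:
    """Wrap a JSON blob in a minimal XMP packet as a single attribute."""
    value = quoteattr(json.dumps(params))  # escaped and quoted for XML
    packet = (
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
        f'<rdf:RDF xmlns:rdf="{RDF_NS}">'
        f'<rdf:Description rdf:about="" xmlns:ex="{EXAMPLE_NS}" '
        f"ex:parameters={value}/>"
        "</rdf:RDF>"
        "</x:xmpmeta>"
    )
    return packet.encode("utf-8")


def read_xmp_params(xmp: bytes) -> dict:
    """Parse the packet built above and return the JSON payload."""
    # Some writers pad packets with spaces or NUL bytes; strip them first.
    root = ET.fromstring(xmp.rstrip(b"\x00 "))
    description = root.find(f".//{{{RDF_NS}}}Description")
    return json.loads(description.get(f"{{{EXAMPLE_NS}}}parameters"))


params = {"prompt": 'a "quoted" prompt', "steps": 30, "seed": 1234}
image = Image.new("RGB", (64, 64), "gray")

buffer = io.BytesIO()
image.save(buffer, format="WEBP", xmp=build_xmp_packet(params))

reopened = Image.open(io.BytesIO(buffer.getvalue()))
read_back = read_xmp_params(reopened.info["xmp"])
print(read_back)
```

The same packet bytes could in principle be embedded in other containers, which is the appeal of XMP here; only the WebP case is exercised in this sketch.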

psychedelicious commented 1 year ago

good first issue :D

With nodes, our metadata handling will need to be refactored substantially. Insufficiently forward-thinking decisions on metadata handling could force us to retain multiple layers of backwards compatibility.

There is some good discussion back in #266 which adds context to the current metadata handling.

We already have two different metadata formats right now - the web UI necessarily diverged from the spec in that issue, while the CLI stuck to it. This topic really needs some brainstorming and careful consideration.

Maybe we need two metadata formats - one for the "standard" SD parameters and one for our application-specific stuff. This is even trickier because every two weeks we have some new hotness that needs to be recorded somehow. The nodes server lets us utilize an sql database, so we could more easily retain all image data while only exporting compliant metadata on the images themselves.

Another consideration is compatibility with other tools used in the space.

psychedelicious commented 1 year ago

Canvas requires lossless images to function.

github-actions[bot] commented 1 year ago

There has been no activity in this issue for 14 days. If this issue is still being experienced, please reply with an updated confirmation that the issue is still being experienced with the latest release.

keturn commented 1 year ago

wow stalebot two weeks is a pretty short horizon

yes this is still a relevant enhancement request, though there might be some stuff relating to how nodes send images to storage to do first.

psychedelicious commented 1 year ago

@keturn the canvas relies on lossless images. Do you have any ideas for working around that and also reducing image size?

keturn commented 1 year ago

Canvas could use some lossless=True flag, which could either stick with PNG or use the lossless mode of one of the more modern formats (JXL, AVIF, WebP).
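Roughly what such a flag could look like, using WebP's lossless mode through Pillow. The `encode_image` and `decode_image` helpers and the test layer are hypothetical, not an existing InvokeAI API, and the sketch assumes a Pillow build with WebP support.

```python
import io

from PIL import Image


def encode_image(image: Image.Image, lossless: bool = False) -> bytes:
    """Hypothetical helper: one encoder entry point with a lossless switch."""
    buffer = io.BytesIO()
    if lossless:
        # exact=True asks the encoder to keep RGB values under
        # fully transparent pixels as well.
        image.save(buffer, format="WEBP", lossless=True, exact=True)
    else:
        image.save(buffer, format="WEBP", quality=80)
    return buffer.getvalue()


def decode_image(data: bytes, mode: str) -> Image.Image:
    """Decode image bytes and convert to the requested mode."""
    return Image.open(io.BytesIO(data)).convert(mode)


# Small RGBA layer with a transparent border, as a canvas layer might have.
layer = Image.new("RGBA", (64, 64), (0, 0, 0, 0))
layer.paste((200, 80, 40, 255), (16, 16, 48, 48))

roundtrip = decode_image(encode_image(layer, lossless=True), "RGBA")
lossless_is_exact = roundtrip.tobytes() == layer.tobytes()
print("lossless round trip is pixel-exact:", lossless_is_exact)
```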

I assume this requirement comes from the fact that the canvas workflow involves a lot of encode-decode-encode iterations, and lossy encodings tend to get progressively more lossy with each iteration of that kind of thing?
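One way to check that assumption is to push an image through several encode-decode generations and compare each result against the original. This is only a sketch: the noise image, the quality value, and the generation count are arbitrary, and it shows whether error exists at each generation, not how fast it grows on real images.

```python
import io
import random

from PIL import Image, ImageChops, ImageStat


def webp_roundtrip(image, **save_options):
    """Encode to WebP in memory and decode again."""
    buffer = io.BytesIO()
    image.save(buffer, format="WEBP", **save_options)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")


def mean_abs_error(a, b):
    """Mean absolute per-channel difference between two RGB images."""
    diff = ImageChops.difference(a, b)
    return sum(ImageStat.Stat(diff).mean) / 3


# Deterministic pseudo-random noise as a worst-case stand-in image.
rng = random.Random(0)
pixels = bytes(rng.randrange(256) for _ in range(64 * 64 * 3))
original = Image.frombytes("RGB", (64, 64), pixels)

lossy_errors = []
lossless_errors = []
lossy = original
lossless = original
for _ in range(5):
    lossy = webp_roundtrip(lossy, quality=50)
    lossless = webp_roundtrip(lossless, lossless=True)
    lossy_errors.append(mean_abs_error(original, lossy))
    lossless_errors.append(mean_abs_error(original, lossless))

print("lossy error per generation:", lossy_errors)
print("lossless error per generation:", lossless_errors)
```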

psychedelicious commented 1 year ago

Yes, but also - the canvas consists of images layered on top of one another. Compression can lead to artifacts on the edges of images and visible seams.

I'm skeptical of lossy formats in general for professional/enthusiast workflows. Maybe if you just want to make some fun images you'd prefer to save bandwidth, disk, memory etc. But most "serious" workflows will require lossless.

Making lossless a toggle is also risky. Imagine you make it a fair way into a processing chain using lossy compression and then decide you want full quality for the final output. If the lossless isn't retained somewhere, you'd need to redo the whole chain.

(chain = an arbitrary number of arbitrary processing methods, many of which do not exist at this time)

Void2258 commented 1 year ago

It would be nice to be able to start with WebP images too. A lot of things you can find on the internet to use as inputs for img2img are WebP, so this would save lots of conversion. It doesn't need to stay in WebP (i.e. a background conversion to PNG is fine); it just needs to accept WebP for img2img or canvas without requiring conversion in another program.
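The background conversion described here could be as small as the following sketch, assuming a Pillow build with WebP support. `normalize_upload_to_png` is a hypothetical intake helper, not an existing InvokeAI function.

```python
import io

from PIL import Image


def normalize_upload_to_png(data: bytes) -> bytes:
    """Hypothetical intake step: decode any format Pillow recognizes,
    then re-encode as PNG."""
    with Image.open(io.BytesIO(data)) as source:
        # Keep the alpha band if the image has one; otherwise use RGB.
        mode = "RGBA" if "A" in source.getbands() else "RGB"
        converted = source.convert(mode)
    output = io.BytesIO()
    converted.save(output, format="PNG")
    return output.getvalue()


# Simulate a WebP file found online.
webp_buffer = io.BytesIO()
Image.new("RGB", (128, 96), "teal").save(webp_buffer, format="WEBP")

png_data = normalize_upload_to_png(webp_buffer.getvalue())
as_png = Image.open(io.BytesIO(png_data))
print(as_png.format, as_png.size)
```

Converting on intake keeps the rest of the pipeline on PNG, so the canvas's lossless requirement is untouched.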

JamesClarke7283 commented 1 month ago

WebP images are so much better, with more modern features and way more optimised. Support would be much appreciated.