phoboslab / qoi

The “Quite OK Image Format” for fast, lossless image compression
MIT License

Premultiplied Alpha #181

Closed mystise closed 2 years ago

mystise commented 2 years ago

I realize this might not be in scope, or a bit too late because the final specification exists, but I thought I'd ask anyway, just in case.

Would it be possible to dedicate a bit (perhaps in the colorspace byte in the header) to flag "this data has already been premultiplied by alpha"?

The reasons this is important are shown below in the linked blog posts:

https://www.realtimerendering.com/blog/gpus-prefer-premultiplication/

https://www.realtimerendering.com/blog/png-srgb-cutoutdecal-aa-problematic/

https://www.realtimerendering.com/blog/a-png-puzzle/

But as a summary, premultiplication fixes a lot of issues all at once involving images with alpha channels, but unfortunately actually doing the premultiplication requires a few annoying steps for an sRGB image:

If the image is already in linear space, you can skip the sRGB transitions.

And only after all this can it be uploaded to the graphics stack (GL, Vulkan, whatever else you might use). All of these things can be done during a preprocessing phase, but there's currently no way to flag a file as having already gone through this.
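As a rough illustration of the preprocessing described above, the sketch below premultiplies a single 8-bit sRGB pixel in Python. The function names are made up for this example, and the step order (u8 to float, sRGB to linear, multiply by alpha, linear back to sRGB, float to u8) is an assumption based on the standard sRGB transfer function, not a copy of the original step list.

```python
def srgb_to_linear(c: float) -> float:
    """Decode one sRGB-encoded channel in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode one linear-light channel in [0, 1] back to sRGB."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def premultiply_srgb_pixel(r: int, g: int, b: int, a: int) -> tuple:
    """Premultiply one 8-bit sRGB RGBA pixel; alpha itself is left unchanged."""
    alpha = a / 255.0
    out = []
    for channel in (r, g, b):
        linear = srgb_to_linear(channel / 255.0)  # u8 -> float, sRGB -> linear
        scaled = linear * alpha                   # multiply by alpha in linear space
        encoded = linear_to_srgb(scaled)          # linear -> sRGB
        out.append(round(encoded * 255.0))        # float -> u8
    return (out[0], out[1], out[2], a)
```

For a linear-space image the two transfer-function calls would simply be dropped, leaving only the multiplication by alpha.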

The downsides of adding this are that older loaders that properly handle premultiplication but don't know about the flag would premultiply again, thus making the image much darker than intended. (Note that if they use the standard decode functions in qoi.h, they will actually return NULL because the functions check if the colorspace field is greater than 1. In this case, the downside is that newer images would fail to decode on older loaders rather than producing an invalid image, which is probably better overall.)

The downsides of not adding this are that this processing has to be done unconditionally, every time an image is loaded from disk.

Note that nothing else in the format has to change, reserving a single bit in the colorspace header byte will be enough, premultiplied values can be stored in exactly the same manner as regular values can, this only affects how much processing an image needs to go through before getting to the GPU.
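To make the proposal concrete, here is a small Python sketch of how a loader might read such a flag. The 14-byte header layout (magic `qoif`, big-endian width and height, then the channels and colorspace bytes) follows the QOI specification; the flag value `0x02` and all function names are purely hypothetical and are not part of QOI.

```python
import struct

QOI_MAGIC = b"qoif"
QOI_SRGB = 0    # colorspace value defined by the QOI spec
QOI_LINEAR = 1  # colorspace value defined by the QOI spec
HYPOTHETICAL_PREMULTIPLIED_BIT = 0x02  # NOT part of QOI; illustration only


def read_qoi_header(data: bytes) -> dict:
    """Parse the 14-byte QOI header (integers are big-endian)."""
    if len(data) < 14:
        raise ValueError("data too short for a QOI header")
    magic, width, height, channels, colorspace = struct.unpack(">4sIIBB", data[:14])
    if magic != QOI_MAGIC:
        raise ValueError("not a QOI file")
    return {
        "width": width,
        "height": height,
        "channels": channels,
        "colorspace": colorspace,
    }


def strict_loader_accepts(header: dict) -> bool:
    """Mimic a loader that rejects any colorspace value greater than 1."""
    return header["channels"] in (3, 4) and header["colorspace"] <= 1


def is_flagged_premultiplied(header: dict) -> bool:
    """Check the hypothetical 'already premultiplied' bit."""
    return bool(header["colorspace"] & HYPOTHETICAL_PREMULTIPLIED_BIT)
```

With this layout, a file that sets the hypothetical bit fails the strict check, which matches the "fails to decode on older loaders" behaviour described above.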

Brian151 commented 2 years ago

not sure what the author thinks, but sounds both out of scope and frankly, unnecessary

scope: if the stated goal is to be small & simple, this is an added layer of complexity, and the downside that a single encoder/decoder actually could basically, corrupt the images, is quite a major one. existing formats have enough issues with "feature not supported" or one little mishap corrupts data. in fact, PNG is quite susceptible to data corruption, rare as it may be, image = gone if it does.

unnecessary: if writing a renderer, there's no excuse to not go through the steps to make sure your data passes-through in the exact format expected. if that means pre-multiplied alpha, then that's just part of your game/application's overhead. as simple as this format is, you've likely messed-up somewhere if you can't do the pre-multiplication on-demand. also, isn't this a use case for compute shaders?

conclusion: valid point, although it [again] makes me scratch my head at decisions made with modern graphics hardware. but doesn't seem to make any sense in the file format itself. this type of processing really needs to happen as needed by the given software. i only read the first link, and barely, only slightly understood it, but i've also been up significantly later than i intended to be, so not the best time to read in-depth tech blogs.

but, that's just my thoughts on this...

mystise commented 2 years ago

Scope: Added complexity is very small, doing a premultiplication is only a few steps, and the most annoying part is sRGB <-> Linear conversion, which isn't all that complicated, 3 if statements and a few multiplications.

As for "corrupting images", that's really not what's going on here?

A decoder that is not premultiplication aware is going to keep working as it always has, and in fact will work better under this new system as now it gets proper alpha blending for free.

A decoder that is premultiplication aware can check one bit and decide to skip preprocessing, thus saving a small amount of decode time.

The primary thing to note is that the only time data should not be premultiplied is when editing an image, every time it's displayed it should be using premultiplied alpha. As such, with QOI being a simple image format, having a path that says "hey the decoder has to do less work to be able to display images correctly" is a good thing in my opinion.

Unnecessary: It is precisely because premultiplication is easy that I think it makes a good fit, it's a very small modification to the encoder, and the decoder doesn't have to care at all (because the raw image data is output alongside the flag that tells the application if it needs to do premultiplication. If it ignores that flag being false, then it won't do anything horrible, it'll just display incorrectly.)

But also, the second link from above ( https://www.realtimerendering.com/blog/png-srgb-cutoutdecal-aa-problematic/ ) shows what happens when a decoder doesn't premultiply (and the examples given are "all browsers", which is still true today), and it's wrong but not world-shatteringly so. Having the format able to tell the decoder "hey, this is already premultiplied" is overall a positive.

Conclusion: Unfortunately this isn't "a decision made by modern graphics hardware", it's the nature of the color space that everything has normalized to. sRGB is used by image formats, image editors, practically everything on the planet, and it's perfect for displaying things, but terrible at blending or manipulating. The primary problem is that doing it wrong still gets things close enough that it's difficult to understand why things are wrong if you didn't already know about sRGB.

I really don't think this is a "must have", merely something that png got wrong and that it would be simple and nice to correct.

EDIT: Also I just realized this only makes sense when an image already has an alpha channel. An sRGB image without an alpha channel is already premultiplied, as the alpha is all 1.0. Perhaps it would be better as a state in the channels byte instead? Like channel = 5 is RGBA premultiplied? Hmm. That removes the literal usage of channels though, as the current values are 3 or 4 for exactly that number of channels. Oh well, this all depends on if this is even something QOI wants to handle in the first place.

uis246 commented 2 years ago

> Convert each channel from u8 to f32 (So as to not lose precision during the later steps)

Not sure about this step. It is possible to use only integer numbers, at least in RGB space. But this is not in the scope of any image codec.

> not sure what the author thinks, but sounds both out of scope and frankly, unnecessary

The only unnecessary thing for the codec itself is the description of how to do premultiplication on sRGB.

> this is an added layer of complexity

Premultiplication does not change the way you (de)code an image, it only changes how to interpret the result. Basically it is a modified color space.

mystise commented 2 years ago

> > Convert each channel from u8 to f32 (So as to not lose precision during the later steps)

> Not sure about this step. It is possible to use only integer numbers, at least in RGB space. But this is not in the scope of any image codec.

A u8 sRGB to u8 Linear RGB conversion is very lossy. If you do a u8 sRGB to u16 Linear RGB conversion then that has enough precision to not lose anything, but at that point you're already doing a conversion and converting to f32 makes the later steps easier.
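The precision point can be checked with a short Python sketch. It assumes the standard sRGB transfer function; `round_trip` and `count_lost` are names invented for this example. Each of the 256 sRGB codes is converted to a linear integer with a given maximum value and back, and the codes that do not survive are counted.

```python
def srgb_to_linear(c: float) -> float:
    """Decode one sRGB-encoded channel in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode one linear-light channel in [0, 1] back to sRGB."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def round_trip(value: int, max_level: int) -> int:
    """sRGB u8 -> linear integer in [0, max_level] -> sRGB u8."""
    linear = round(srgb_to_linear(value / 255.0) * max_level)
    return round(linear_to_srgb(linear / max_level) * 255.0)


def count_lost(max_level: int) -> int:
    """Number of sRGB u8 codes that do not survive the round trip."""
    return sum(1 for v in range(256) if round_trip(v, max_level) != v)


lost_u8 = count_lost(255)     # linear stored as u8
lost_u16 = count_lost(65535)  # linear stored as u16
```

Here `lost_u8` comes out nonzero (the darkest codes collapse together), while `lost_u16` is zero, which is consistent with the claim that u16 is enough and u8 is not.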

And yes, the codec doesn't have to care at all about how premultiplication is implemented, this is just a flag for "has it already been done or does it need preprocessing". If this is accepted, the reference encoder might decide to include a premultiplier, and thus become a bit more complicated, but the codec would remain the same complexity.

mystise commented 2 years ago

Closing per https://github.com/phoboslab/qoi/commit/339e11e2fd474b2f359b7b378d08b8a5ed444b6a

amstan commented 1 year ago

For future onlookers, here's a recent Captain Disillusion video about exactly this: https://www.youtube.com/watch?v=XobSAXZaKJ8

At first I thought this bug was being pedantic, but now I understand the power of premultiplication, especially when it comes to emissive stuff: https://youtu.be/XobSAXZaKJ8?t=273

Brian151 commented 1 year ago

@amstan ah, CD haven't watched him in a while! might watch that later! the article honestly only really confused me more.

honestly, different techniques have different uses. tech world is very stuck in place trying to always use the same for literally everything. still, QOI ultimately is about making the format easy/basic, and modern formats/APIs/frameworks largely owe their complexity to feature creep. [and even some over-engineering] my game engine even treads a dangerous path there. first and foremost, i want it to be lightweight and easy to use. but, there are specific features that i insist/wish to have.

as a company, i can't say many nice things about nintendo, i don't like how they treat their fans/customers. but, from an engineering/design standpoint, i can appreciate some of the decisions made. and something i noticed is their graphics formats. they went from just shoving a bunch of data somewhere in ROM to taking the same data and adding the headers required to make it easier to work with and possibly port it to other formats/tools. modern formats just don't do this. again, QOI is trying to be fast and reasonably compact. every new feature adds some overhead, makes the code larger and potentially slower.

and...me and rambling... sigh...

either way, the spec is locked in place, so a derivative format would need to account for this

uis246 commented 1 year ago

Premultiplied alpha is an interpretation of the data rather than a transformation

rdrpenguin04 commented 1 year ago

Another thing that the CD video didn't bring up is that rendering can only really happen into a pre-multiplied format. It's not just that 3D rendering software prefers it, it's that generated images using standard alpha blending are pre-multiplied by default, and converting out of a pre-multiplied format is necessarily lossy (whereas converting from straight alpha to pre-multiplied is completely legal and not lossy in the slightest).

(I mention this because I was working on a rendering engine that pre-rendered semi-transparent graphics, and after about an hour of doing the math, I figured out that my outputs had to be treated pre-multiplied. Having this as an option if not the default on image formats would be the easier and more general case)
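A small Python sketch of why rendered output ends up premultiplied, assuming the usual Porter-Duff "over" operator on float RGBA tuples in linear space (the function names are invented for this example): when layers are composited onto a transparent canvas, the colour channels of the result are already scaled by the accumulated coverage.

```python
def premultiply(r: float, g: float, b: float, a: float) -> tuple:
    """Straight-alpha RGBA (floats in [0, 1]) -> premultiplied RGBA."""
    return (r * a, g * a, b * a, a)


def over_premultiplied(src: tuple, dst: tuple) -> tuple:
    """Porter-Duff 'over' for premultiplied RGBA: out = src + dst * (1 - src_alpha)."""
    inv = 1.0 - src[3]
    return tuple(s + d * inv for s, d in zip(src, dst))


canvas = (0.0, 0.0, 0.0, 0.0)                                          # fully transparent
canvas = over_premultiplied(premultiply(1.0, 0.0, 0.0, 0.5), canvas)   # 50% red
canvas = over_premultiplied(premultiply(0.0, 0.0, 1.0, 0.5), canvas)   # 50% blue on top
```

After these two steps `canvas` holds a red value of 0.25 with an alpha of 0.75; recovering a straight-alpha colour would need an extra division by alpha, which is the conversion described above as lossy.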

n00bmind commented 1 year ago

I totally second this suggestion, and I think it's a bit of a pity the author decided to completely close the spec to any kind of evolution so early on. Accounting for pre-multiplied alpha reduces complexity instead of increasing it imo. The fact that formats don't account for it adds a ton of complexity to pipelines for basically no reason.

Brian151 commented 1 year ago

if you read the blog post, there's a reason the author nailed the spec down so early. the existing formats added layers upon layers of complexity which made them not only difficult for humans to grasp, but increased the work computers had to do when processing said formats. and then there were features not added because they needed to be there, but because certain types of people put in "their two cents" , like copyright metadata in the MPEG formats. programming languages have had the same issue. they evolved too much and in the wrong ways. C++ for example [and just for fun : remember that C++ itself is not 100% nailed-down, there's just a general consensus what specific version/sub-set people should use] there are even features that despite their presence, nobody uses and tools either don't implement or explicitly require the user to enable. color-indexed PNG for example. most PNGs out there, especially used for websites and game/app interfaces/logos absolutely don't need to be 24 or 32 bit. they don't use that many colors, nowhere close...

so, pre-multiplication decreases complexity specifically with regards to rendering, sure. but what about authoring? and if it's lossy to convert back from pre-multiplied, it makes no sense when editing the file to use it. it only makes sense in the final file. and here's where you actually add a whole layer of complication : having multiple versions of the same file, or even some project document. or ofc, you can play the dangerous game of incurring generational losses from re-saving the same data however many times [this is one of the big turn-offs of jpeg and other lossy formats. inevitably, this will manifest itself, and it's ever present in some of our oldest memes which have been re-jpeg'd countless times] also, now that i think about it, how does pre-multiplication handle additional layers of transparency? say your image/texture is already 50% transparent but then the layer or whatever on which it's displayed/exists is ALSO 50% transparent? this also happens in editing or even rendering.

it seems to me the correct solution to this problem is have a specific derived format or sub-format specifically for purposes of rendering. and that said, the pieces are right here to make this. the spec is closed, but that's not stopping people from making video/animation wrappers around it, for example. so why not just encode the pre-multiplied image as QOI and wrap another format over it rather than complain so much "the spec doesn't account for this"? i have similar grievances towards those who complain "OBJ doesn't do this!" rather than sit down, and IMPLEMENT THE FEATURE. and i'm mainly talking about programmers writing tools to decode proprietary-game-format-here, they have NO excuse. and because of their decisions, we get stuck with many models being exported in autodesk formats or sacrificing important features to OBJ.

n00bmind commented 1 year ago

Woa woa I think you're just assuming a lot of stuff here ..

First of all I'm not questioning the author's decision, I totally understand it and respect it, and I think this format is a great idea, that's why I'm implementing an encoder / decoder for it myself.. I just think time can give ideas some maturity that just cannot be achieved in any other way. Just look at how much the spec evolved in the first few weeks due to the fact that it was left open to community suggestions for a while and incorporated a lot of ideas that helped it achieve its initial goals better. I believe leaving room for further changes could easily further this same process even more. And I don't know why on earth I even need to say this, but leaving room for some changes in the future is not even remotely the same as accepting any changes proposed by random people on the internet.

Second, you seem to think what's being proposed here is making the format somehow handle all kinds of scenarios around pre-multiplied images and whatnot. However you will note that I never said this format should be in charge of handling premultiplied alpha in any way.. and for that matter, neither did the OP: "Note that nothing else in the format has to change, reserving a single bit in the colorspace header byte will be enough, premultiplied values can be stored in exactly the same manner as regular values can"

What I was simply suggesting, together with the OP, is that a mere extra bit in the colorspace field be dedicated to informing users whether the image contained has already undergone pre-multiplication or not! In the same way that it currently informs whether an image's colorspace is linear or non-linear, which as stated in the spec doesn't modify in the slightest how the data is encoded.. i.e. a tiny new piece of metadata. When I said accounting for pre-multiplication reduces complexity, what I meant is just knowing what type of data you're dealing with would help solve a lot of problems because you have a way of knowing what pre/post processing to apply when. Afaik, no general purpose formats have a way of storing this kind of information (there seem to be special-purpose formats that do, but I'm no graphics programmer so don't quote me on that..). And yeah, ok, I get it, this is not a "general purpose format", but seriously, the only way this will gather enough adoption is if it solves real problems people have, and the issues around premultiplied alpha are real issues, especially in gamedev, which seems to be something this format is kind of targeted at..

Anyway, this is all I'll say on the subject, have a great day.

PS: The Captain Disillusion video above is really cool and it's actually arguing for the same thing in the last part.. see https://youtu.be/XobSAXZaKJ8?si=KTfQPTa-sTV8Em9B&t=344

Brian151 commented 1 year ago

true, but it does tend to be a slippery slope. color space is actually a field that in general, i don't believe we have exploited to its fullest. i can list off some far less reasonable ideas that all fit niche applications but cause a nightmare when accounted for in a generalized format. but, i'd just roll my own to specifically accommodate that, assuming i ever pushed it beyond a thought experiment. some...might not be so reasonable. also, a single bit in this context might seem modest, but remember that's one less bit, and every time you reserve some bits [or any range] , you actually reduce the available options for later. that's the not so-fun part about future-proofing binary formats. one of the things i like about swf, actually... someone thought ahead when reserving 10 bits to the tag ID. there's also the version number thing. but that has historically gone wrong to creating evolutions that are so different they may as well be a different format.

somewhere along the lines, someone/something has to accommodate this. otherwise you have an image which doesn't look correct when treated as the wrong type of pixel data. if you just flip some bit, you're practically begging for someone to mess this up, even intentionally. fact is, even reserving a whole ID for it, this is bound to happen. i get the idea, i just don't see how it's not going to cause an issue somewhere down the line. especially when you just take for granted "user said it is [not] premultiplied". for this to work exactly as hoped, there are multiple moving pieces, and the most important one is that the pixel data written/read is in fact the same data that "isPreMultiplied" bit says it is.

what you should be doing is pressuring the programmers of the graphics tools. formats can and have done a lot, but simply put, implementors have been messing things up, consistently. [the W3C and browser vendors guilty of some of the most noteworthy examples in recent years]. if image decoders/encoders are going to add "this data is pre-multiplied" to a given file, then there needs to be some actual enforcement to that effect. perhaps a specialty format should be reserved for this purpose, honestly. it's the most bullet-proof example. again, why not add a wrapper or a footer over the existing QOI format? or even, any image format. wrapper, especially. this is something explicitly in your face when attempting to access the file. if you assume it isn't there, you get a crash/error. if you skip it, you KNOW you're doing something wrong.

my reaction is because, despite what already has been done/said, there's just been "well, it should've been done this way" , "it could've been done this way" , etc... constantly. again, people are building animation/video over this format. the format specs are open. build/test/demonstrate accomplishes more than come here to complain about what wasn't done or preach what some guy on youtube said. if you respect the decision of the author, then don't make a very active and public point on his own repo [or worse:social...] of questioning it. i get all your points, i agree more or less with the video. but i don't feel that this was remotely the appropriate way to address the situation. there aren't a dozen patents and an EULA saying you can't take the entirety of this format and bend it to your needs/demands, there isn't a specific part of the code you're being denied access to. so, use what you have access to [everything], and showcase what you've done, instead of ask/request/beg/pressure/etc... to have what already exists changed when, for better or for worse, the decision was made to leave it as-is.

rdrpenguin04 commented 11 months ago

There is literally no reason to set the premultiplied bit wrong intentionally. That argument is flawed and irrelevant.

Inserting a single bit into this format would make two things happen:

  1. Existing conformant implementations reject a file with pre-multiplied alpha. This is great. This works exactly as it should. The file is not recognized, so it is discarded.
  2. Updated implementations can do whatever they need to with the bit. Game engines can skip a step in conversion.

The point of this request is not to say we should have a "wrapper" to specify one bit of difference; that's the very problem QOI is trying to solve. This is a common enough problem that multiple people are going to start making their own solutions, whether that's using a different format that does have support... or modifying this one, each in their own way. Uh oh. If the point was to make a simple-to-implement and widely-consistent format, then not supporting pre-multiplication upsets that goal indirectly.

Pragmatically, it's also interesting because this proposal doesn't break anything. The argument of a slippery-slope of adding new things doesn't apply when the proposal is to use part of the space that has been deliberately left open: the remainder of the colorspace byte.

Technically, this doesn't need to be specified. After all, the QOI specification does say that the fields are purely informative. The only reason to specify it is for consistency, and it wouldn't harm existing implementations.

Brian151 commented 11 months ago

you'd be surprised. legitimate reason? no... reason? yes. i'm a data-miner, i've seen things done that "there was no reason to do"/"this is INSANE, why did they do it this way when they could've done it that way?". then there's infosec where the whole point is do it wrong to locate/target an exploit.

  1. great, a bunch of people then get issues opened or their socials hounded
  2. the file format problem...

consider this: wrapping it or using an external file allows applying the solution to MULTIPLE formats. there are plenty of cases Qoi doesn't even cover.

now i'm just confused... if conformant implementations are expected to reject this, that constitutes a breaking change. if the field is meant to be informative, why are we using it this way? it would have to be specified, especially if we're assuming these fields now have meaning and implementations should reject the files because some unknown value was set.

also, while [finally] working on my own implementation, i re-read the spec. and the spec specifies the colors are in non-premultiplied format.

also, now that i think about it, Qoi is meant to be lossless, that is explicitly stated as one of its goals. pre-multiplication is known to be lossy. so the only valid way to approach this would be

  1. store both formats per-color, always re-calculating the pre-multiplied one during encoding/saving to ensure no losses
  2. attempt lossless pre-multiplication at expense of some extra bits. i think that might be do-able

both approaches however mean making actual breaking changes /:

rdrpenguin04 commented 11 months ago

Straight alpha is lossy. Converting pre-multiplied alpha to straight alpha loses information, including luminance. This was covered by the argument above. On the other hand, in displaying an image, straight alpha is lossy due to lack of precision.
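One way to see the information loss when going from premultiplied to straight alpha is an emissive pixel: it adds light but has zero coverage. The Python sketch below uses plain 8-bit arithmetic and ignores the sRGB transfer function for brevity; both function names are invented for this example.

```python
def premultiply(pixel: tuple) -> tuple:
    """Straight-alpha 8-bit RGBA -> premultiplied 8-bit RGBA."""
    r, g, b, a = pixel
    return tuple(round(c * a / 255) for c in (r, g, b)) + (a,)


def unpremultiply(pixel: tuple) -> tuple:
    """Premultiplied 8-bit RGBA -> straight-alpha 8-bit RGBA (clamped)."""
    r, g, b, a = pixel
    if a == 0:
        # Nothing to divide by: any colour stored here cannot be recovered.
        return (0, 0, 0, 0)
    return tuple(min(255, round(c * 255 / a)) for c in (r, g, b)) + (a,)


glow = (255, 128, 0, 0)       # premultiplied: adds orange light, covers nothing
ordinary = (100, 50, 0, 128)  # premultiplied: colour values stay below alpha

glow_back = premultiply(unpremultiply(glow))
ordinary_back = premultiply(unpremultiply(ordinary))
```

The ordinary pixel survives the round trip, while the glow pixel comes back fully transparent black, so its light contribution is gone.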