mm2 / Little-CMS

A free, open source, CMM engine. It provides fast transforms between ICC profiles.
https://www.littlecms.com
MIT License

Helper function to pick a good LUT size for a given profile? #318

Closed: haasn closed this issue 2 years ago

haasn commented 2 years ago

So, I have a use case where I need to bake the contents of an ICC profile into a 3DLUT for real-time color management on the GPU. To minimize error, I'm constructing a cmsHTRANSFORM that goes from the input profile to its closest well-defined approximation (or vice versa), so that the overall response of the 3DLUT is as close to linear as possible (essentially encoding only the residual).
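To check how much such a transform actually leaves for the 3DLUT to encode, one option is to evaluate it on a regular grid and record the largest deviation from identity. This is a minimal sketch in plain C; `rgb_fn`, `max_residual` and `toy_gain` are hypothetical names, and in real code the callback would presumably wrap `cmsDoTransform` on a float RGB transform (an assumption about how the transform is set up).

```c
#include <math.h>

/* Hypothetical callback type: maps one RGB triple in [0,1] to another.
   In practice it might wrap cmsDoTransform() on a float RGB transform. */
typedef void (*rgb_fn)(const float in[3], float out[3], void *ctx);

/* Largest per-channel |f(x) - x| over an n*n*n grid of evenly spaced
   nodes (n must be at least 2). A small value means the LUT mostly
   encodes a near-identity residual. */
float max_residual(rgb_fn f, void *ctx, unsigned n)
{
    float worst = 0.0f;
    for (unsigned b = 0; b < n; b++)
        for (unsigned g = 0; g < n; g++)
            for (unsigned r = 0; r < n; r++) {
                float in[3] = { (float)r / (n - 1), (float)g / (n - 1),
                                (float)b / (n - 1) };
                float out[3];
                f(in, out, ctx);
                for (int c = 0; c < 3; c++) {
                    float d = fabsf(out[c] - in[c]);
                    if (d > worst)
                        worst = d;
                }
            }
    return worst;
}

/* Toy transform for testing: scales every channel by *(float *)ctx. */
void toy_gain(const float in[3], float out[3], void *ctx)
{
    float k = *(const float *)ctx;
    for (int c = 0; c < 3; c++)
        out[c] = in[c] * k;
}
```

An identity transform gives 0; a transform that scales everything by 0.9 gives about 0.1, reached at the white corner of the grid.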

In this use case, too much precision in the 3DLUT is overkill. Ideally I would like to tune the 3DLUT precision to roughly match the precision of the underlying CLUTs/tables, e.g. 33x33x33 or thereabouts. (I'm not entirely sure which sizes are typical in real profiles.)
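For a sense of what "overkill" costs: the texture size grows with the cube of the grid size. A trivial sketch; the RGBA16F layout (4 channels of 2 bytes each) used in the numbers below is an assumption about the GPU format, not something stated in this thread.

```c
#include <stddef.h>

/* Bytes needed for an n*n*n 3D LUT with `channels` components of
   `bytes_per_component` bytes each (e.g. 4 x 2 bytes for RGBA16F). */
size_t lut3d_bytes(size_t n, size_t channels, size_t bytes_per_component)
{
    return n * n * n * channels * bytes_per_component;
}
```

With RGBA16F, a 33-point LUT takes 287,496 bytes (about 281 KiB) while a 65-point LUT takes 2,197,000 bytes (about 2.1 MiB), roughly 7.6 times as much.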

Do you have any suggestion on how to implement decision logic for how large I should make my 3DLUTs? Or could you provide a helper for introspecting CLUT-based profiles, so I can figure out how large their tables are? As an aside, I was also considering dissecting the ICC profiles and applying the shaper matrices and LUTs individually on the GPU, without building an overall 3DLUT, but LittleCMS doesn't appear to provide any way to access the raw tables contained inside a profile. Is that right?

mm2 commented 2 years ago

Well, there is some support for what you want, but the other way around: you tell lcms which CLUT size you want and it does the necessary resampling. The problem here is that a color transform is the concatenation of two profiles, usually plus some additional steps, so it can easily reach two CLUTs, four sets of curves and a couple of matrix multiplications. Altogether that is too complicated to put on the GPU, and most of the time the curves and the CLUTs have different resolutions, so the library provides some flags for cmsCreateTransform that simplify all of that to a set of curves, plus a CLUT, plus another set of curves. Then you can convert this transform to a device link (cmsTransform2DeviceLink) and use the AToB0 tag of that device link on the GPU. The flags that may be of interest to you are cmsFLAGS_FORCE_CLUT, cmsFLAGS_CLUT_POST_LINEARIZATION, cmsFLAGS_CLUT_PRE_LINEARIZATION, cmsFLAGS_GRIDPOINTS and maybe cmsFLAGS_8BITS_DEVICELINK for the conversion to 8 bits.
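The flag combination described above can be assembled as in the sketch below. To keep it compilable without lcms installed, it re-declares the flag values locally with the values I believe `lcms2.h` uses (an assumption; in real code, include `<lcms2.h>` and use the `cmsFLAGS_*` macros directly). `collapse_flags` and `gridpoints_from_flags` are hypothetical helpers.

```c
#include <stdint.h>

/* Local copies of lcms2.h flag values (assumed; prefer <lcms2.h>). */
#define MY_FLAGS_CLUT_POST_LINEARIZATION 0x0001u
#define MY_FLAGS_FORCE_CLUT              0x0002u
#define MY_FLAGS_8BITS_DEVICELINK        0x0008u
#define MY_FLAGS_CLUT_PRE_LINEARIZATION  0x0010u
/* Like cmsFLAGS_GRIDPOINTS(n): the grid size lives in bits 16..23. */
#define MY_FLAGS_GRIDPOINTS(n)           ((((uint32_t)(n)) & 0xFFu) << 16)

/* Flags asking for a transform collapsed to curves + CLUT + curves,
   with `gridpoints` CLUT nodes per input axis. */
uint32_t collapse_flags(unsigned gridpoints, int eight_bit_devicelink)
{
    uint32_t f = MY_FLAGS_FORCE_CLUT
               | MY_FLAGS_CLUT_PRE_LINEARIZATION
               | MY_FLAGS_CLUT_POST_LINEARIZATION
               | MY_FLAGS_GRIDPOINTS(gridpoints);
    if (eight_bit_devicelink)
        f |= MY_FLAGS_8BITS_DEVICELINK;
    return f;
}

/* Inverse of the grid-size packing; 0 means "not specified". */
unsigned gridpoints_from_flags(uint32_t flags)
{
    return (flags >> 16) & 0xFFu;
}
```

With the real library, the resulting value would be passed as `dwFlags` to `cmsCreateTransform`, the transform turned into a profile with `cmsTransform2DeviceLink(xform, 4.3, flags)`, and the collapsed `cmsPipeline*` read back with `cmsReadTag(link, cmsSigAToB0Tag)`. Note that the 8-bit mask limits the requested grid to at most 255 points.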

To choose a reasonable number of grid points, there is an undocumented function, but it just uses hard-coded numbers. Take a look at the source code:

CMSAPI cmsUInt32Number CMSEXPORT _cmsReasonableGridpointsByColorspace(cmsColorSpaceSignature Colorspace, cmsUInt32Number dwFlags);
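The idea behind that function is a small lookup by channel count. The sketch below is a hypothetical stand-in, not the library function: it takes a channel count instead of a `cmsColorSpaceSignature`, and the numbers (33 for up to three channels, 17 for four, 7 beyond that) are my reading of the library's default branch, so treat them as an assumption and check `cmspcs.c` for the exact values and for the `cmsFLAGS_HIGHRESPRECALC` / `cmsFLAGS_LOWRESPRECALC` variants.

```c
#include <stdint.h>

/* Hypothetical stand-in for _cmsReasonableGridpointsByColorspace().
   An explicit grid size packed into bits 16..23 (as cmsFLAGS_GRIDPOINTS
   does) wins; otherwise fall back to hard-coded values by channel count.
   The fallback numbers are assumed, not copied from the library. */
unsigned reasonable_gridpoints(unsigned channels, uint32_t flags)
{
    unsigned explicit_n = (flags >> 16) & 0xFFu;
    if (explicit_n != 0)
        return explicit_n;
    if (channels > 4)
        return 7;   /* table size is n^channels, so keep n small */
    if (channels == 4)
        return 17;  /* CMYK */
    return 33;      /* RGB and other 1-3 channel spaces */
}
```

The shrinking numbers follow from the table size being n^channels: 33^3 is 35,937 nodes and 17^4 is 83,521, whereas 33^4 would already be 1,185,921.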

haasn commented 2 years ago

I'm afraid I still don't quite understand how your suggestion would work in practice, though: If I read the A2B0 tag, I merely get back a cmsPipeline, with the same problem as I had before. I can iterate over the pipeline stages with cmsStageNext etc, but then what? As far as I can tell that's a dead end because cmsStage is largely a black box in the public API.

I could maybe sample from it with cmsStageSampleCLut* to get a regularly spaced 3DLUT, but I wouldn't necessarily know which values to sample (unless I also hard-code the CLUT size and hope that everything works out).
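Sampling at regularly spaced points means, for a grid of N nodes per axis, evaluating the transform at i/(N-1) for i = 0..N-1. A minimal sketch; `bake_lut3d`, `rgb_fn` and `identity_rgb` are hypothetical, and with lcms the callback would presumably wrap `cmsDoTransform`. It stores red as the fastest-varying axis, which is how GPU 3D textures are usually laid out; lcms's own CLUT tables, as far as I can tell from `cmslut.c`, vary the last input channel fastest, so a table copied straight out of lcms would need its axes swapped.

```c
#include <stdlib.h>

typedef void (*rgb_fn)(const float in[3], float out[3], void *ctx);

/* Evaluate f on an n*n*n grid (n >= 2). Returns a malloc'ed array of
   n*n*n RGB triples, red fastest: index = ((b*n + g)*n + r)*3.
   Returns NULL on allocation failure; the caller frees the result. */
float *bake_lut3d(rgb_fn f, void *ctx, unsigned n)
{
    size_t count = (size_t)n * n * n;
    float *lut = malloc(count * 3 * sizeof *lut);
    if (!lut)
        return NULL;
    for (unsigned b = 0; b < n; b++)
        for (unsigned g = 0; g < n; g++)
            for (unsigned r = 0; r < n; r++) {
                float in[3] = { (float)r / (n - 1), (float)g / (n - 1),
                                (float)b / (n - 1) };
                size_t idx = (((size_t)b * n + g) * n + r) * 3;
                f(in, &lut[idx], ctx);
            }
    return lut;
}

/* Identity callback, handy for checking the memory layout. */
void identity_rgb(const float in[3], float out[3], void *ctx)
{
    (void)ctx;
    out[0] = in[0];
    out[1] = in[1];
    out[2] = in[2];
}
```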

> The problem here is that a color transform is the concatenation of two profiles and usually some additional steps, so it easily can reach two CLUT, 4 sets of curves and a couple of matrix multiplications.

Fortunately, I don't need to deal with the full complexity, because I necessarily need extra color processing steps (HDR tone mapping) in the middle, between applying the source profile and applying the target profile. These intermediate steps cannot currently be modeled by ICC (which does not even conceptually support HDR or absolute-scale brightness), since, among other complications, they depend on dynamic external metadata or measured global frame properties (e.g. a brightness histogram).

So I'm currently planning for the following architecture:

  1. Decode from input profile to a good PCS (if needed)
  2. Apply HDR tone mapping and other necessary gamut postprocessing in PCS
  3. Encode from PCS to output profile (if needed)

Since steps 1 and 3 are separate, I'm conceptually only worrying about one of them at a time. If they have different CLUT resolutions, then so be it; it's hard to do an overall optimization with step 2 in the middle. Also, in the vast majority of cases step 1 will not be necessary because the input file does not even have an ICC profile (in which case I can transform the input colors to the PCS using fixed-function shader routines that don't require sampling from any LUT).
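The three-step layout above can be sketched as a chain of optional stages, where a missing ICC profile simply means a missing stage. This is a hypothetical CPU model of what would be shader stages on the GPU; the per-channel Reinhard curve x / (1 + x) is only a placeholder for the real HDR tone mapping, which this thread does not specify.

```c
#include <stddef.h>

typedef void (*rgb_stage)(float rgb[3], void *ctx);

/* Hypothetical pipeline: optional decode to the PCS (step 1), tone
   mapping in the PCS (step 2), optional encode to the output profile
   (step 3). A NULL decode/encode means "no ICC profile on that side",
   so no LUT is sampled for it. */
struct color_pipeline {
    rgb_stage decode;   /* step 1, may be NULL */
    rgb_stage tonemap;  /* step 2, required */
    rgb_stage encode;   /* step 3, may be NULL */
    void *decode_ctx, *tonemap_ctx, *encode_ctx;
};

void run_pipeline(const struct color_pipeline *p, float rgb[3])
{
    if (p->decode)
        p->decode(rgb, p->decode_ctx);
    p->tonemap(rgb, p->tonemap_ctx);
    if (p->encode)
        p->encode(rgb, p->encode_ctx);
}

/* Placeholder tone mapper: per-channel Reinhard curve x / (1 + x). */
void reinhard(float rgb[3], void *ctx)
{
    (void)ctx;
    for (int c = 0; c < 3; c++)
        rgb[c] = rgb[c] / (1.0f + rgb[c]);
}
```

Keeping steps 1 and 3 as separate optional stages is what lets each one get its own LUT size, or none at all.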

For the choice of PCS, I'm currently looking at linear RGB in the smallest set of primaries that contains the entirety of the source profile (or the tagged primaries in the absence of a source profile). It would be completely wasteful to use a CLUT at all if the source profile does not need one.
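Picking "the smallest set of containing primaries" can be reduced to a point-in-triangle test in CIE 1931 xy: a color whose chromaticity lies inside the primaries' triangle has non-negative linear RGB components in that space. A sketch under stated assumptions: the candidate list (BT.709, Display P3, BT.2020) and the names `smallest_gamut` and `inside_triangle` are illustrative, and how the sample chromaticities are extracted from the source profile is left open.

```c
#include <stddef.h>

typedef struct { double x, y; } xy;

/* Illustrative candidates, smallest first (R, G, B primaries in xy). */
static const struct { const char *name; xy r, g, b; } gamuts[] = {
    { "BT.709",     {0.640, 0.330}, {0.300, 0.600}, {0.150, 0.060} },
    { "Display P3", {0.680, 0.320}, {0.265, 0.690}, {0.150, 0.060} },
    { "BT.2020",    {0.708, 0.292}, {0.170, 0.797}, {0.131, 0.046} },
};

/* Signed area term: which side of edge a->b the point p lies on. */
static double edge(xy a, xy b, xy p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Point-in-triangle test; points on an edge count as inside. */
int inside_triangle(xy r, xy g, xy b, xy p)
{
    double d1 = edge(r, g, p), d2 = edge(g, b, p), d3 = edge(b, r, p);
    int has_neg = d1 < 0 || d2 < 0 || d3 < 0;
    int has_pos = d1 > 0 || d2 > 0 || d3 > 0;
    return !(has_neg && has_pos);
}

/* Index of the first (smallest) candidate containing every point,
   or -1 if none does. */
int smallest_gamut(const xy *pts, size_t n)
{
    size_t ng = sizeof gamuts / sizeof gamuts[0];
    for (size_t i = 0; i < ng; i++) {
        size_t k = 0;
        while (k < n && inside_triangle(gamuts[i].r, gamuts[i].g,
                                        gamuts[i].b, pts[k]))
            k++;
        if (k == n)
            return (int)i;
    }
    return -1;
}
```

For example, the D65 white point (0.3127, 0.3290) selects BT.709, while a deep red at (0.67, 0.32) lies outside BT.709 and selects Display P3.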

mm2 commented 2 years ago

> As far as I can tell that's a dead end because cmsStage is largely a black box in the public API.

cmsStageData gives you a pointer to the data block; then, with the plug-in API, you have _cmsStageCLutData, _cmsStageMatrixData and _cmsStageToneCurvesData, which interpret the data according to the type of stage. If you use the flags to force the transform to curves + CLUT + curves and then convert the transform to a device link, you can easily get at all the components.
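With the plug-in header, the CLUT grid size should then be readable per input axis. I believe it is `((_cmsStageCLutData *)cmsStageData(stage))->Params->nSamples[i]` for i below `cmsStageInputChannels(stage)`, after walking the pipeline with `cmsPipelineGetPtrToFirstStage` / `cmsStageNext` and checking `cmsStageType(stage) == cmsSigCLutElemType`. The decision logic on top of that can be as simple as the hypothetical helper below, which keeps lcms types out so it compiles on its own; the clamp bounds are arbitrary assumptions.

```c
#include <stddef.h>

/* Hypothetical: choose a GPU 3D LUT size from the grid sizes found in a
   profile's CLUT (one entry per input axis, e.g. copied from nSamples).
   Returns 0 when there is no CLUT, i.e. the profile is matrix/shaper
   only and can be evaluated in closed form without a 3D LUT. */
unsigned pick_lut_size(const unsigned *grid, size_t naxes,
                       unsigned min_size, unsigned max_size)
{
    unsigned n = 0;
    for (size_t i = 0; i < naxes; i++)
        if (grid[i] > n)
            n = grid[i];
    if (n == 0)
        return 0;
    if (n < min_size)
        n = min_size;
    if (n > max_size)
        n = max_size;
    return n;
}
```

Matching the size only lines the GPU nodes up with the profile's nodes when no curves precede the CLUT in the pipeline; with input curves in front, the nodes land elsewhere, so some headroom above the source grid size may be worth it.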

Your approach of using an intermediate space is fine if you need frequent adjustments. Another solution would be to create a complex multiprofile transform, collapse it to CLUT + curves, and then create the GPU tables from that. I have used this for other projects and can tell you it works, but you have to be careful where you place the curves.
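The collapsed form is cheap to evaluate: one 1D lookup per channel, one 3D lookup, then another 1D lookup per channel. The CPU sketch below mirrors that evaluation; all names are hypothetical. It uses trilinear interpolation because that is what GPU texture filtering provides, whereas lcms, to my knowledge, defaults to tetrahedral interpolation for three-input CLUTs, so results can differ slightly. The input curves decide where in the original signal the CLUT nodes fall, which is presumably why their placement matters.

```c
#include <stddef.h>

/* Linear interpolation into a 1D table of n >= 2 entries over [0,1]. */
float curve_eval(const float *t, size_t n, float x)
{
    if (x <= 0.0f) return t[0];
    if (x >= 1.0f) return t[n - 1];
    float pos = x * (float)(n - 1);
    size_t i = (size_t)pos;
    if (i >= n - 1) i = n - 2;
    float f = pos - (float)i;
    return t[i] + f * (t[i + 1] - t[i]);
}

/* Trilinear lookup in an n*n*n RGB LUT (n >= 2), red fastest:
   index = ((b*n + g)*n + r)*3. Inputs are clamped to [0,1]. */
void lut3d_eval(const float *lut, size_t n, const float in[3], float out[3])
{
    size_t i0[3];
    float f[3];
    for (int c = 0; c < 3; c++) {
        float x = in[c] < 0.0f ? 0.0f : (in[c] > 1.0f ? 1.0f : in[c]);
        float pos = x * (float)(n - 1);
        size_t i = (size_t)pos;
        if (i >= n - 1) i = n - 2;
        i0[c] = i;
        f[c] = pos - (float)i;
    }
    for (int ch = 0; ch < 3; ch++) {
        float acc = 0.0f;
        for (int corner = 0; corner < 8; corner++) {
            size_t r = i0[0] + (corner & 1);
            size_t g = i0[1] + ((corner >> 1) & 1);
            size_t b = i0[2] + ((corner >> 2) & 1);
            float w = ((corner & 1) ? f[0] : 1.0f - f[0])
                    * ((corner & 2) ? f[1] : 1.0f - f[1])
                    * ((corner & 4) ? f[2] : 1.0f - f[2]);
            acc += w * lut[((b * n + g) * n + r) * 3 + ch];
        }
        out[ch] = acc;
    }
}

/* Input curves + CLUT + output curves, the collapsed transform shape. */
void eval_collapsed(const float *pre[3], size_t npre,
                    const float *lut, size_t n,
                    const float *post[3], size_t npost,
                    const float in[3], float out[3])
{
    float tmp[3];
    for (int c = 0; c < 3; c++)
        tmp[c] = curve_eval(pre[c], npre, in[c]);
    lut3d_eval(lut, n, tmp, out);
    for (int c = 0; c < 3; c++)
        out[c] = curve_eval(post[c], npost, out[c]);
}
```

With identity curves and a 2x2x2 identity LUT, the chain returns its input; replacing the curves with real shaper tables is where the placement question comes in.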

haasn commented 2 years ago

Alright, I think that clears it up. I suppose this issue can be closed, especially since those functions give me the necessary insight to figure out how large the underlying CLUT is (thus answering the original question).