openpreserve / jpylyzer

JP2 (JPEG 2000 Part 1) validator and properties extractor. Jpylyzer was specifically created to check that a JP2 file really conforms to the format's specifications. Additionally, jpylyzer is able to extract technical characteristics.
http://jpylyzer.openpreservation.org/

Contribution: Validating ICC Profile #229

Open awoods opened 6 months ago

awoods commented 6 months ago

We have several thousand images that have invalid curv tags. We understand the underlying problem with the embedded ICC profile data in our JP2 images and would like to update Jpylyzer to check for these specific ICC profile errors. Recognizing that embedded ICC profiles are not strictly part of the JPEG 2000 specification, would you be open to including such a contribution in Jpylyzer? If so, have you already given thought to how you would prefer validation of such embedded profiles to be designed?

Thanks!

bitsgalore commented 6 months ago

Hi Andrew,

Thanks for reaching out about this. Even though ICC profiles aren't part of the JPEG 2000 standard, I think the ability to validate them could still be a useful addition to Jpylyzer (even as an option). However, from your description the extent of the ICC profile validation you're proposing is not entirely clear to me.

I just had a quick look at the latest ICC filespec, where I see:

Full ICC profile validation would cover all of these (or at minimum the required tags). At first glance implementing this from scratch looks like quite a substantial task. I'm also not sure if doing this directly in Jpylyzer would be the best approach. ICC profiles are widely used in other image formats as well, so for optimum reusability it might be better to address this in a dedicated ICC profile validation tool/library (which could then be imported by other software tools, including Jpylyzer).

However, you mention you'd like to "update Jpylyzer to check for these specific ICC profile errors". This suggests your contribution would only cover specific error(s) from your own images (the "curveType" tag type).

If this is the case, the "validation" would only cover one specific aspect of ICC profiles, which might not be very relevant to other Jpylyzer users. But it's not entirely clear to me if this is what you're proposing here.

Could you provide some more details on the scope and extent of your proposed contribution?

bitsgalore commented 6 months ago

Possibly relevant in this context: the ImageCms module, which is part of Pillow.

The code is based on LittleCMS. From the docs it seems to do some validation of ICC profiles, but a cursory look doesn't turn up any details.

awoods commented 6 months ago

Thanks for the response, Johan, and for the pointers to potentially relevant libraries.

Regarding our scope and extent, we have over 50 million JP2 images that need to be processed for delivery. Across all of those images, we have encountered a wide range of JP2 errors. However, the invalid curv tag issue is the most prevalent. Although we would like to verify as much of the embedded ICC profile as possible, we intend to initially focus on this one specific error.

I agree that validating one specific aspect of the ICC profile may be of limited value to the broader community of Jpylyzer users. The question becomes, would it be useful as a starting point for more extensive ICC profile validation?

I also agree with your suggestion:

> it might be better to address this in a dedicated ICC profile validation tool/library (which could then be imported by other software tools, including Jpylyzer)

If it makes sense to you, we may start with an independent Python module for this ICC profile validation... and we can subsequently explore importing that code into Jpylyzer.

bitsgalore commented 6 months ago

> If it makes sense to you, we may start with an independent Python module for this ICC profile validation... and we can subsequently explore importing that code into Jpylyzer.

Yes, this makes perfect sense to me.

> I agree that validating one specific aspect of the ICC profile may be of limited value to the broader community of Jpylyzer users. The question becomes, would it be useful as a starting point for more extensive ICC profile validation?

One idea that just occurred to me is that you could start this independent Python module as a simple proof of concept that initially only does this specific thing. Then I can have a look at how to integrate it into Jpylyzer. This way we can make sure that the integration works from the get-go. Actual integration into the Jpylyzer code base would still happen later, once the module is more fleshed out, but at least this would reduce the risk of surprises later on.

awoods commented 6 months ago

Perfect. I will be in touch as development proceeds.

kimpham54 commented 2 weeks ago

Hi @bitsgalore, over the past few months I worked with Andrew on a Python module that identifies and optionally corrects invalid curv tags. The project is here: https://github.com/harvard-lts/jp2_remediator. If it's useful, I would be happy to discuss it and to work with you on potentially integrating it into jpylyzer. Thanks!

kimpham54 commented 1 day ago

Hi @bitsgalore, I can summarize here what the module does and propose a few options for integration with jpylyzer:

  1. takes an input jp2 file, directory of jp2s, or AWS bucket of jp2s
  2. reads the bytes of the file(s)
  3. checks if the jp2 is valid using jpylyzer
  4. if valid, checks for the 'colr' (Colour Specification) box, which can contain an embedded ICC profile
  5. if present, reads the METH value (METH = 2 means the box holds a restricted ICC profile)
  6. then looks for the TRC tags rTRC/gTRC/bTRC in the profile's tag table
  7. for each of these, gets the tag signature (e.g. 'rTRC'), the tag offset (where the data for rTRC starts, in our case the curv tag data) and the size of the tag data element (trc_tag_size in the module)
  8. reads the tag data (the curv data): curve type signature, reserved value, count value and the actual curve values; together these make up the curve field length (curv_trc_field_length in the module)
  9. if trc_tag_size == curv_trc_field_length, does nothing
  10. if trc_tag_size != curv_trc_field_length, sets the tag size to the field length (trc_tag_size = curv_trc_field_length) and writes a new jp2 file
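For illustration, steps 4 to 10 might look roughly like the sketch below. This is a minimal, hypothetical sketch and not code from jp2_remediator: the function names (iter_boxes, find_icc_profile, check_trc_tags, fix_trc_sizes) are made up, step 3 (the jpylyzer validity check) is skipped, only the first 'colr' box with METH = 2 is examined, and well-formed box and tag-table lengths are assumed. It relies on the curveType layout: a 4-byte 'curv' signature, 4 reserved bytes, a 4-byte count n, then n 2-byte values, giving a field length of 12 + 2n.

```python
import struct

TRC_SIGNATURES = (b"rTRC", b"gTRC", b"bTRC")


def iter_boxes(data, start=0, end=None):
    """Hypothetical helper: yield (box_type, content_start, box_end) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        lbox, tbox = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if lbox == 1:  # real length is in the 8-byte XLBox field
            lbox = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif lbox == 0:  # box runs to the end of the enclosing range
            lbox = end - pos
        if lbox < header:  # malformed length: stop instead of looping forever
            break
        yield tbox, pos + header, pos + lbox
        pos += lbox


def find_icc_profile(data):
    """Hypothetical: return (offset, length) of the ICC profile in the first
    'colr' box inside 'jp2h' with METH == 2, or None if there is none."""
    for tbox, cstart, bend in iter_boxes(data):
        if tbox != b"jp2h":
            continue
        for sub, sstart, send in iter_boxes(data, cstart, bend):
            if sub == b"colr" and data[sstart] == 2:
                # Skip the METH, PREC and APPROX bytes (1 byte each)
                return sstart + 3, send - (sstart + 3)
    return None


def check_trc_tags(data, profile_offset):
    """Hypothetical: for each rTRC/gTRC/bTRC tag of type 'curv', return
    (signature, tag_table_entry_offset, tag_size, field_length)."""
    p = profile_offset
    results = []
    tag_count = struct.unpack(">I", data[p + 128:p + 132])[0]  # after the 128-byte header
    for i in range(tag_count):
        entry = p + 132 + 12 * i  # each tag table entry: signature, offset, size
        sig, offset, size = struct.unpack(">4sII", data[entry:entry + 12])
        if sig not in TRC_SIGNATURES:
            continue
        tag = p + offset  # tag offsets are relative to the start of the profile
        if data[tag:tag + 4] != b"curv":
            continue
        count = struct.unpack(">I", data[tag + 8:tag + 12])[0]
        results.append((sig.decode("ascii"), entry, size, 12 + 2 * count))
    return results


def fix_trc_sizes(data):
    """Hypothetical: return a copy of data with mismatched curv tag sizes overwritten."""
    located = find_icc_profile(data)
    fixed = bytearray(data)
    if located is not None:
        for _sig, entry, size, field_length in check_trc_tags(data, located[0]):
            if size != field_length:
                # Overwrite only the 4-byte size field of the tag table entry
                struct.pack_into(">I", fixed, entry + 8, field_length)
    return bytes(fixed)
```

Because only the size field in the tag table is overwritten, the file length and all box lengths stay the same, so no other boxes need to be rewritten.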

As discussed in this thread, in its current form the module covers only one specific aspect of validation, but it could be extended to support broader validation.

Ideas for integration:

            <colourSpecificationBox>
                <methIsValid>True</methIsValid>
                <precIsValid>True</precIsValid>
                <approxIsValid>True</approxIsValid>
                <iccSizeIsValid>True</iccSizeIsValid>
                <iccPermittedProfileClass>True</iccPermittedProfileClass>
                <iccNoLUTBasedProfile>True</iccNoLUTBasedProfile>
            </colourSpecificationBox>

One option would be to add further validation tests to this existing output, such as 'countIsGamma' and 'curveValuesIsValid':

            <colourSpecificationBox>
                <methIsValid>True</methIsValid>
                <precIsValid>True</precIsValid>
                <approxIsValid>True</approxIsValid>
                <iccSizeIsValid>True</iccSizeIsValid>
                <iccPermittedProfileClass>True</iccPermittedProfileClass>
                <iccNoLUTBasedProfile>True</iccNoLUTBasedProfile>
                <iccrTRCcountValueIsValid>False</iccrTRCcountValueIsValid>
                <countIsGamma>True</countIsGamma>
                <curveValuesIsValid>False</curveValuesIsValid>
            </colourSpecificationBox>
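As a rough illustration of how such tests might be computed and added to the output, here is a hypothetical sketch using Python's standard xml.etree.ElementTree. The functions curv_tests and add_tests are made up for this example, and the meanings given to iccrTRCcountValueIsValid (declared tag size equals 12 + 2 × count) and curveValuesIsValid (all count values are actually present) are assumptions, since the proposal doesn't define them. countIsGamma follows the ICC rule that a curveType count of 1 means the single value is a gamma exponent.

```python
import struct
import xml.etree.ElementTree as ET


def curv_tests(curv_data, tag_size, prefix="rTRC"):
    """Hypothetical checks on one curveType tag, returned as (element_name, passed) pairs."""
    count = struct.unpack(">I", curv_data[8:12])[0]
    field_length = 12 + 2 * count  # signature + reserved + count + count * uInt16
    return [
        # Assumed meaning: the tag table size matches the curv field length
        (f"icc{prefix}countValueIsValid", tag_size == field_length),
        # ICC: count == 1 means the single value is a gamma (u8Fixed8Number)
        ("countIsGamma", count == 1),
        # Assumed meaning: all `count` curve values are present in the tag data
        ("curveValuesIsValid", len(curv_data) >= field_length),
    ]


def add_tests(colr_elem, tests):
    """Append each test as a child element with text 'True' or 'False'."""
    for name, passed in tests:
        ET.SubElement(colr_elem, name).text = str(passed)


if __name__ == "__main__":
    colr = ET.Element("colourSpecificationBox")
    curv = b"curv" + bytes(4) + struct.pack(">I", 1) + struct.pack(">H", 0x0100)
    add_tests(colr, curv_tests(curv, tag_size=16))
    print(ET.tostring(colr, encoding="unicode"))
```

Keeping the checks as plain (name, result) pairs would let the same logic live in a standalone ICC module, with jpylyzer only responsible for turning them into elements of its existing output.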