mattiasw / ExifReader

A JavaScript Exif info parser.
Mozilla Public License 2.0
737 stars, 88 forks

`zTXt` tag values are decoded incorrectly #324

Closed amarjandu closed 2 months ago

amarjandu commented 2 months ago

Description

The PNG specification says zTXt text is encoded as Latin-1 (ISO 8859-1); see https://www.w3.org/TR/png/#11zTXt. ExifReader currently appears to decode these values as UTF-8, which corrupts non-ASCII characters.

I suspect the library decodes tEXt values the same way, but I have not confirmed this. The W3C spec linked above also specifies Latin-1 for tEXt, while iTXt uses UTF-8.
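A minimal illustration of the data loss, assuming a chunk that stores the text "hÀ" as Latin-1 bytes (0x68, 0xC0). It uses only the standard `TextDecoder` global; `decodeWith` and `stored` are hypothetical names for this sketch, not ExifReader code.

```typescript
// Decodes the same bytes with a given WHATWG encoding label.
function decodeWith(label: string, bytes: Uint8Array): string {
  return new TextDecoder(label).decode(bytes);
}

// "hÀ" as stored in a tEXt/zTXt chunk: 0x68 ('h'), 0xC0 ('À' in Latin-1).
const stored = new Uint8Array([0x68, 0xc0]);

// UTF-8: 0xC0 is never a valid UTF-8 byte, so the decoder emits
// U+FFFD (REPLACEMENT CHARACTER) and the original byte value is gone.
console.log(decodeWith("utf-8", stored)); // h\uFFFD

// Latin-1: every byte maps to exactly one character.
console.log(decodeWith("latin1", stored)); // hÀ
```

Because the non-fatal UTF-8 decoder replaces invalid bytes, the original character cannot be recovered from the decoded string afterwards.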

Additional details

I checked the chunks with https://www.nayuki.io/page/png-file-chunk-inspector, which shows the Latin-1 characters encoded correctly, and also with ExifTool...

The global object on the linked site also shows the wrong character. With ExifTool you can see that the character is an À.
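For reference, À is U+00C0. Latin-1 stores it as the single byte 0xC0, while UTF-8 needs two bytes (110xxxxx 10xxxxxx, giving 0xC3 0x80). The sketch below, with hypothetical helper names `encodeLatin1` and `toHex`, shows both byte sequences; a reader that expects UTF-8 therefore sees a lone 0xC0 where it wanted a two-byte sequence.

```typescript
// Encodes a string as ISO 8859-1: one byte per character.
// Characters above U+00FF cannot be represented and are rejected.
function encodeLatin1(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const cp = s.charCodeAt(i);
    if (cp > 0xff) throw new RangeError(`not representable in Latin-1: ${s[i]}`);
    out[i] = cp;
  }
  return out;
}

// Formats bytes as space-separated two-digit hex.
const toHex = (b: Uint8Array): string =>
  Array.from(b, (x) => x.toString(16).padStart(2, "0")).join(" ");

console.log(toHex(encodeLatin1("\u00C0")));             // c0
console.log(toHex(new TextEncoder().encode("\u00C0"))); // c3 80
```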

[Screenshot 2024-05-23 at 5.48.40 PM]

How to reproduce

I've included the PNG below; I hope the file was not removed :(

[Attached image: naan-compressed-latin.png]

  1. Load the image
  2. View the tags.

What I expected would happen:

Tag values are decoded correctly.

What really happened:

Tag values are decoded (incorrectly 😢 )

mattiasw commented 2 months ago

Hi! Thanks for the report. After a quick look it seems it might be tricky to solve. The implementation uses the Web API Response.text() and MDN says this about it: "The response is always decoded using UTF-8."

https://developer.mozilla.org/en-US/docs/Web/API/Response/text

I will take a closer look though and see if anything can be done.
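To show where the UTF-8 assumption creeps in, here is a sketch of decoding a zTXt chunk end to end with browser-compatible APIs. It assumes `Blob`, `Response`, and `DecompressionStream` are available as globals (modern browsers, Node.js 18+) and that `data` is the chunk's data field without the length, type, or CRC. `parseZtxt` and `ZtxtEntry` are hypothetical names; this is not ExifReader's implementation. PNG compression method 0 is a zlib datastream, which matches the Compression Streams format `"deflate"` (not `"deflate-raw"`). The key step is reading `arrayBuffer()` and decoding it as Latin-1 instead of calling `text()`.

```typescript
// Hypothetical result shape for this sketch.
interface ZtxtEntry {
  keyword: string;
  text: string;
}

// zTXt data layout per the PNG spec:
// keyword (Latin-1) | 0x00 | compression method (0 = zlib) | compressed text
async function parseZtxt(data: Uint8Array): Promise<ZtxtEntry> {
  const latin1 = new TextDecoder("latin1");

  const sep = data.indexOf(0);
  if (sep < 1) throw new Error("zTXt: missing keyword separator");
  const keyword = latin1.decode(data.subarray(0, sep));

  const method = data[sep + 1];
  if (method !== 0) throw new Error(`zTXt: unknown compression method ${method}`);

  // Inflate the zlib-wrapped payload that follows the method byte.
  const compressed = data.subarray(sep + 2);
  const stream = new Blob([compressed])
    .stream()
    .pipeThrough(new DecompressionStream("deflate"));

  // Read raw bytes rather than Response.text(), which always decodes as UTF-8.
  const bytes = await new Response(stream).arrayBuffer();
  return { keyword, text: latin1.decode(bytes) };
}
```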

amarjandu commented 2 months ago

Maybe something along the lines of...

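// tEXt and zTXt text is Latin-1 per the PNG spec; iTXt is UTF-8.
function labelFor(tagType: string): string {
  if (tagType === "zTXt") return "latin1";
  if (tagType === "tEXt") return "latin1";
  return "utf-8";
}

// `response` wraps the (decompressed) chunk bytes. Read the raw bytes
// instead of calling response.text(), which always decodes as UTF-8.
async function decodeTagText(response: Response, tagType: string): Promise<string> {
  const bytes = await response.arrayBuffer();
  return new TextDecoder(labelFor(tagType)).decode(bytes);
}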

I just realized that this project is also designed to work in the browser, not just Node.js...

If you have other ideas on how to approach this, I can certainly help out...
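On the browser point: `TextDecoder` is a global in modern browsers and in Node.js, so unlike Node's `Buffer#toString("latin1")` it needs no platform-specific code. One caveat worth knowing: in the WHATWG Encoding Standard the label `"latin1"` maps to windows-1252, which differs from strict ISO 8859-1 only for bytes 0x80-0x9F. The hypothetical `decodeIso88591` below is a dependency-free strict decoder (byte value equals code point) that makes the difference visible.

```typescript
// Strict ISO 8859-1 decode: each byte value is its Unicode code point.
// Runs in browsers and Node.js without Buffer or TextDecoder.
function decodeIso88591(bytes: Uint8Array): string {
  let out = "";
  // Convert in slices to stay well below engine limits on argument counts.
  const CHUNK = 0x8000;
  for (let i = 0; i < bytes.length; i += CHUNK) {
    out += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return out;
}

const sample = new Uint8Array([0x68, 0xc0, 0x80]);

// Strict ISO 8859-1 keeps 0x80 as the C1 control U+0080.
console.log(decodeIso88591(sample) === "h\u00C0\u0080"); // true

// WHATWG "latin1" is windows-1252, which maps 0x80 to € (U+20AC).
console.log(new TextDecoder("latin1").decode(sample) === "h\u00C0\u20AC"); // true
```

For text made of printable Latin-1 characters, both decoders agree, so `TextDecoder("latin1")` is usually sufficient.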

mattiasw commented 2 months ago

Thanks, looks like it should work, will try it out!

mattiasw commented 2 months ago

It worked great! While testing this, though, I found some other non-related issues that I have to fix first.

amarjandu commented 2 months ago

Are the "non-related" issues related to if/how compressed values are resolved? Or is this a "how I'm using it" problem?

import ExifReader from "exifreader";
import { readFileSync } from "fs";

describe("ExifReader", () => {
  const file = "/Users/amar/dev/fastai/naan-compressed-latin.png"

  it("loads png from file with async set", async () => {
    const data = await ExifReader.load(file, {async: true});
    console.log(data);
    expect(data["wassup"].value).toContain("hello");
  });

  it("loads png from file, sync", async () => {

    const data = ExifReader.load(file);
    console.log(data);
    expect(data["wassup"].value).toContain("hello");
  });

  it("loads png from buffer", async () => {
    const data = readFileSync(file)
    const exifData = await ExifReader.load(data, {async: true});
    console.log(exifData);
    expect(exifData["wassup"].value).toContain("hello");
  })
});

outputs

  console.log
    {
      Orientation: { id: 274, value: 1, description: 'top-left' },
      'Exif IFD Pointer': { id: 34665, value: 38, description: 38 },
      ColorSpace: { id: 40961, value: 1, description: 'sRGB' },
      PixelXDimension: { id: 40962, value: 64, description: 64 },
      PixelYDimension: { id: 40963, value: 64, description: 64 },
      'Image Width': { value: 64, description: '64px' },
      'Image Height': { value: 64, description: '64px' },
      'Bit Depth': { value: 8, description: '8' },
      'Color Type': { value: 2, description: 'RGB' },
      Compression: { value: 0, description: 'Deflate/Inflate' },
      Filter: { value: 0, description: 'Adaptive' },
      Interlace: { value: 0, description: 'Noninterlaced' },
      wassup: { value: Promise { <pending> }, description: Promise { <pending> } },
      FileType: { value: 'png', description: 'PNG' }
    }

      at Object.log (image-manipulation/__test__/inject-into-exif.test.ts:66:13)

Error: expect(received).toContain(expected) // indexOf

Expected value:  "hello"
Received object: {}

    at Object.toContain (/Users/amar/dev/tcloud-alpha/src/image-manipulation/__test__/inject-into-exif.test.ts:67:34)
  console.log
    Promise { <pending> }

      at Object.log (image-manipulation/__test__/inject-into-exif.test.ts:73:13)

  console.log
    {
      Orientation: { id: 274, value: 1, description: 'top-left' },
      'Exif IFD Pointer': { id: 34665, value: 38, description: 38 },
      ColorSpace: { id: 40961, value: 1, description: 'sRGB' },
      PixelXDimension: { id: 40962, value: 64, description: 64 },
      PixelYDimension: { id: 40963, value: 64, description: 64 },
      'Image Width': { value: 64, description: '64px' },
      'Image Height': { value: 64, description: '64px' },
      'Bit Depth': { value: 8, description: '8' },
      'Color Type': { value: 2, description: 'RGB' },
      Compression: { value: 0, description: 'Deflate/Inflate' },
      Filter: { value: 0, description: 'Adaptive' },
      Interlace: { value: 0, description: 'Noninterlaced' },
      wassup: { value: Promise { <pending> }, description: Promise { <pending> } },
      FileType: { value: 'png', description: 'PNG' }
    }

mattiasw commented 2 months ago

It is related to the async vs. sync situation. :-) But it concerns the test scripts I use to check that a new version doesn't change the output for older image files unless intended. The scripts did not handle the async part correctly and missed all asynchronous tags. 🙈

I don't get the same output as you, though, with pending promises in value and description. 🤔
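A test harness that compares tag output needs to await any tag fields that are still promises, like the `wassup` entry in the console output above, before comparing. The helper below is a hypothetical sketch of that idea, not ExifReader's API or its actual test scripts: it assumes each tag is a plain object whose fields may be ordinary values or promises.

```typescript
// Hypothetical shape: tag name -> tag object with arbitrary fields.
type Tags = Record<string, Record<string, unknown>>;

// Awaits every field of every tag, so Promise-valued fields are
// replaced by their resolved values. Non-promise values pass through
// unchanged, because awaiting a plain value just returns it.
async function resolveTags(tags: Tags): Promise<Tags> {
  const entries = await Promise.all(
    Object.entries(tags).map(async ([name, tag]) => {
      const fields = await Promise.all(
        Object.entries(tag).map(async ([key, v]) => [key, await v] as const)
      );
      return [name, Object.fromEntries(fields)] as const;
    })
  );
  return Object.fromEntries(entries);
}
```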

mattiasw commented 2 months ago

Fixed and released as version 4.23.2. Thanks again for reporting!