haraldk / TwelveMonkeys

TwelveMonkeys ImageIO: Additional plug-ins and extensions for Java's ImageIO
https://haraldk.github.io/TwelveMonkeys/
BSD 3-Clause "New" or "Revised" License

It would be nice if Twelvemonkeys had support for EXIF tag 0xa436 Title #829

Closed steinarb closed 1 year ago

steinarb commented 1 year ago

I have a web application that lets me set the date, title and description of images.

The date, title and description are stored in a database and the webapp fetches them from there.

However I would like to store those three values (date, title and description) inside the JPEG images when downloading the JPEGs.

I googled and found this list of tags: https://exiftool.org/TagNames/EXIF.html

And in the above URL I found the following promising tags:

| tag id | tag name | comment |
|--------|----------|---------|
| 0xa436 | Title | |
| 0x010e | ImageDescription | |
| 0x0132 | ModifyDate | called DateTime by the EXIF spec |
| 0x9003 | DateTimeOriginal | date/time when the original image was taken |
| 0x9004 | CreateDate | called DateTimeDigitized by the EXIF spec |

The 0x0132/ModifyDate and 0x010e/ImageDescription both worked fine.

But 0xa436/Title didn't work, and grepping the source code for "title" (without the quotes) didn't give any meaningful match.

I would like 0xa436/Title to be supported and the support to be similar to that of 0x010e/ImageDescription

The alternative is to store just DateTime and the description of the image in the EXIF data structure, and that is what I will do when/until 0xa436/Title is supported.
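For the DateTime tag mentioned above, the Exif spec stores the value as text in the form `YYYY:MM:DD HH:MM:SS`. A minimal sketch of converting a database timestamp to and from that form, using only the JDK (the class and method names here are hypothetical, not part of TwelveMonkeys):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ExifDateTimeDemo {
    // Exif DateTime layout: colons in the date part, a space, then the time.
    static final DateTimeFormatter EXIF_DATE_TIME =
            DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss");

    /** Formats a timestamp the way the Exif DateTime tags expect it. */
    static String format(LocalDateTime time) {
        return EXIF_DATE_TIME.format(time);
    }

    /** Parses an Exif DateTime string back into a LocalDateTime. */
    static LocalDateTime parse(String value) {
        return LocalDateTime.parse(value, EXIF_DATE_TIME);
    }

    public static void main(String[] args) {
        LocalDateTime taken = LocalDateTime.of(2005, 10, 8, 14, 30, 0);
        String exifValue = format(taken);
        System.out.println(exifValue);
        System.out.println(parse(exifValue).equals(taken));
    }
}
```

Exif DateTime values carry no time zone, which is why the sketch uses `LocalDateTime` rather than an instant.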

steinarb commented 1 year ago

Found this list as well https://exiv2.org/tags.html

From this it looks like 0x010e/ImageDescription is intended for title-like stuff and 0x9286/UserComment is intended for description-like stuff?

And 0xa436/Title isn't mentioned in the above URL.

So maybe the right thing to do is to use ImageDescription for the title and UserComment for the description and not implement this feature?

steinarb commented 1 year ago

I made a PR anyway, since I already had created the branch.

haraldk commented 1 year ago

Thanks!

I made some comments on the PR. Short version: this is an Exif tag, and must thus be in the Exif sub-IFD to work with Exif-aware software. The PR can't be merged as-is.

It's of course possible for you to just define your own constants and just write the tags inside the Exif sub-IFD, but it would be nice to have the extra Exif tags in the library. However, most of the changes in the PR are related to the TIFF format, but here you mention you want to store the values for JPEG (I know, it's confusing, as the "Exif metadata" in JPEG is in fact a complete TIFF without image data).

The standard JPEG image metadata (from the JDK) has no special support for Exif, and will just return it as an "Unknown" segment (and a byte array you can parse yourself). Our JPEG plugin has some support for the Exif rotation field, and we read the Exif thumbnail if present, but the values you are looking for are not exposed (you can still get the Exif chunk as in the JDK metadata). There is a TODO in the com.twelvemonkeys.imageio.plugins.jpeg.JPEGImage10Metadata.getStandardTextNode method, if you would like to expose more fields.
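To illustrate the point that the "Exif metadata" in a JPEG is a complete TIFF structure inside a segment, here is a rough, JDK-only sketch that scans the segments of a JPEG byte array and returns the TIFF bytes from the first APP1 segment starting with `Exif\0\0`. It is not how the JDK or TwelveMonkeys plugins do it; the class name `ExifSegmentFinder` is hypothetical, and the scan is simplified (it assumes no fill bytes between segments and stops at SOS/EOI).

```java
import java.util.Arrays;

public class ExifSegmentFinder {
    // An Exif APP1 segment starts with this 6-byte identifier.
    private static final byte[] EXIF_HEADER = {'E', 'x', 'i', 'f', 0, 0};

    /** Returns the TIFF structure from the first APP1/Exif segment, or null if none is found. */
    public static byte[] findExifTiff(byte[] jpeg) {
        // A JPEG stream starts with the SOI marker (0xFFD8).
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return null;
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) {
                break; // SOS or EOI: no more metadata segments ahead
            }
            // Segment length is big-endian and includes its own two bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int dataStart = pos + 4;
            int dataEnd = pos + 2 + length;
            if (length < 2 || dataEnd > jpeg.length) {
                return null; // truncated or corrupt segment
            }
            boolean isApp1 = marker == 0xE1;
            if (isApp1 && dataEnd - dataStart >= EXIF_HEADER.length
                    && Arrays.equals(Arrays.copyOfRange(jpeg, dataStart, dataStart + 6), EXIF_HEADER)) {
                // Everything after the identifier is a TIFF structure (header plus IFDs).
                return Arrays.copyOfRange(jpeg, dataStart + 6, dataEnd);
            }
            pos = dataEnd;
        }
        return null;
    }

    public static void main(String[] args) {
        // Synthetic stream: SOI, a dummy APP0, an APP1/Exif with a bare TIFF header, EOI.
        byte[] jpeg = {
            (byte) 0xFF, (byte) 0xD8,
            (byte) 0xFF, (byte) 0xE0, 0, 4, 1, 2,
            (byte) 0xFF, (byte) 0xE1, 0, 16, 'E', 'x', 'i', 'f', 0, 0,
            'I', 'I', 42, 0, 8, 0, 0, 0,
            (byte) 0xFF, (byte) 0xD9
        };
        byte[] tiff = findExifTiff(jpeg);
        System.out.println(tiff == null ? "no Exif segment" : "TIFF bytes: " + tiff.length);
    }
}
```

The returned bytes can then be handed to any TIFF/IFD parser, since they begin with the usual `II`/`MM` byte-order mark.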

Writing an Exif JPEG (as opposed to a JFIF JPEG) is also a bit complicated, as we delegate the actual writing to the JDK JPEG plugin. I believe it's possible, it's just not very straightforward. But as you write that:

> The 0x0132/ModifyDate and 0x010e/ImageDescription both worked fine.

...you might have already sorted that out? 😀

steinarb commented 1 year ago

Thanks for the explanation!

I've been trying to figure things out by looking at the exiftool output from my own 1990s JPEG images (where I used the metadata comment to hold text) and at the Exif information in 2000s JPEGs created by digital cameras (and I have had success in correcting the Google Photos sort order by manipulating the Exif DateTime).

haraldk commented 1 year ago

Okay... So. Reading up on the specs: the Exif 2.3 specification lists "Image title" as the TIFF "Image description" tag (270/0x010E).

The Exif ImageTitle tag (42038/0xA436) is defined in Exif 3.0, along with some other "Other" tags like Photographer and ImageEditor, and the "Software Information" tags: CameraFirmware, RAWDevelopingSoftware, ImageEditingSoftware, and MetadataEditingSoftware. The 3.0 spec also describes a new UTF-8 data type that is not in the TIFF 6.0 spec (redefining the allowed types of existing TIFF 6.0 fields). Wonderful... 😛 Luckily we already interpret all ASCII fields as UTF-8, so it should be easy to support.

Anyway...

I think I would probably just stick to using the existing TIFF 6.0 fields ImageDescription and ModifyDate, and use UTF-8 in the description if needed, but leave the type as ASCII (most software will accept this, even if non-spec).

If you need the other fields, you should be able to add them to the metadata without changes in the library, as values in the nested Exif sub-IFD. These should be preserved in both TIFF and JPEG (unless the JDK plugin somehow filters them out).

But I don't think I'll add support for Exif 3.0 yet. 😉

steinarb commented 1 year ago

(the nice thing about UTF-8 is that if you only have US-ASCII characters, UTF-8 is US-ASCII... so if we just stop using æ, ø and å and the like, we're good! "Gala" works as well as "Gålå" doesn't it... ;-) )
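A quick JDK-only check of that claim, as a sketch (the class and method names are made up for illustration): a pure US-ASCII string encodes to the same bytes in both charsets, while a string with å does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiUtf8Demo {
    /** True if the string encodes to identical bytes in US-ASCII and UTF-8. */
    static boolean sameBytesInBothEncodings(String s) {
        return Arrays.equals(
                s.getBytes(StandardCharsets.US_ASCII),
                s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String plain = "Gala";
        String norwegian = "G\u00e5l\u00e5"; // "Gålå", written with escapes to avoid source-encoding issues

        System.out.println(plain + ": " + sameBytesInBothEncodings(plain));
        System.out.println(norwegian + ": " + sameBytesInBothEncodings(norwegian));
        // Each å takes two bytes in UTF-8, so the byte count exceeds the character count.
        System.out.println(norwegian.getBytes(StandardCharsets.UTF_8).length + " bytes in UTF-8");
    }
}
```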

As said, in addition to ImageDescription (for the title) and DateTime, I plan to use UserComment for the description (from the code it looks like TwelveMonkeys supports it, but I've been unable to pick up a UserComment set in a JPEG using exiftool).

For now... I'm open to adjusting this depending on how other software makes use of (or doesn't make use of) these tags.

I'll post URLs to my exiftool-manipulated test files tonight

haraldk commented 1 year ago

To try to clarify a few things:

TwelveMonkeys has special support for the Exif sub-IFD (TIFF tag 34665/0x8769) in TIFF data. You can put whatever tags you like in this IFD; there's no special support for anything inside the IFD*. As long as the structure is valid, no filtering of values occurs in either serialization or deserialization. The constants in the TIFF or EXIF interface are just there to avoid magic numbers in code; a tag doesn't need a constant there to be present in the metadata.

The only thing needed/missing for Exif 3.0 support is the UTF-8 data type.

*) Update: Not quite true it seems, there is special support to parse the Interop sub-IFD (TIFF tag 40965/0xA005) which is nested inside the Exif sub-IFD. But it shouldn't matter for your use case. 😀

steinarb commented 1 year ago

Here are the test files I've used in my metadata parsing:

  1. A 1996 vintage JFIF without EXIF metadata and with a comment in the metadata (TwelveMonkeys finds the comment) (created by cjpeg)
  2. A picture taken by a Casio Exilim II digital camera in October 2005 (autumn leaves at Gålå)
  3. The Gålå autumn picture with an ImageDescription tag added by exiftool (TwelveMonkeys EXIFReader finds the tag)
  4. The Gålå autumn picture with an ImageDescription and Title tag added by exiftool (this is the tag that caused this issue)
  5. The Gålå autumn picture with an ImageDescription and UserComment tag added by exiftool (I still haven't found the UserComment tag with TwelveMonkeys but with your comments above I have hopes I eventually will. I think I found this in the source code...?)

steinarb commented 1 year ago

> As said, in addition to ImageDescription (for the title) and DateTime, I plan to use UserComment for the description (from the code it looks like TwelveMonkeys supports it, but I've been unable to pick up a UserComment set in a JPEG using exiftool).

I haven't pushed the exif reading code to https://github.com/steinarb/twelvemonkeys-karaf-demo yet, but the code that reads exif looks like this: https://gist.github.com/steinarb/b38987b9a30104ad1d6061b24f3a2419#file-imageserviceprovider-java-L66

In the debugger I see that the exif variable contains the UserComment with the expected integer identifier: https://gist.github.com/steinarb/e4df52a8f727b22749a07efedd432d37

But when I iterate over it, the iteration skips from 34665/EXIF to 50341 (with no symbolic tag).

Never mind! Now I see it, and I see what you tried to tell me above: 34665/EXIF is a nested IFD, and the user comment is inside it.

(Trying to explain things often helps! 😄 )
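The nesting that tripped up the iteration above can be sketched with a small JDK-only IFD walker. This is not the TwelveMonkeys API; `NestedIfdWalker`, `listTags` and `sampleTiff` are hypothetical names, and the sample data is synthetic. A flat loop over IFD0 only ever sees 34665 itself; the UserComment tag (37510/0x9286) shows up only when the offset stored in that entry is followed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class NestedIfdWalker {
    static final int TAG_EXIF_IFD = 0x8769; // 34665, points to the Exif sub-IFD

    /** Lists tag numbers in IFD0 and, indented, those in the nested Exif sub-IFD. */
    public static List<String> listTags(byte[] tiff) {
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        buf.order(tiff[0] == 'I' ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Not a TIFF structure");
        }
        List<String> out = new ArrayList<>();
        walk(buf, buf.getInt(4), "", out);
        return out;
    }

    private static void walk(ByteBuffer buf, int ifdOffset, String indent, List<String> out) {
        int count = buf.getShort(ifdOffset) & 0xFFFF;
        for (int i = 0; i < count; i++) {
            int entry = ifdOffset + 2 + i * 12; // each IFD entry is 12 bytes
            int tag = buf.getShort(entry) & 0xFFFF;
            out.add(indent + tag);
            if (tag == TAG_EXIF_IFD) {
                // The entry's value field holds the offset of the sub-IFD.
                walk(buf, buf.getInt(entry + 8), indent + "  ", out);
            }
        }
    }

    /** Synthetic data: IFD0 with tags 270, 34665, 50341 and a sub-IFD holding 37510. */
    static byte[] sampleTiff() {
        ByteBuffer b = ByteBuffer.allocate(68).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8);
        b.putShort((short) 3);               // IFD0 entry count
        putEntry(b, 270, 2, 1, 0);           // ImageDescription (empty string)
        putEntry(b, TAG_EXIF_IFD, 4, 1, 50); // Exif sub-IFD at offset 50
        putEntry(b, 50341, 7, 4, 0);
        b.putInt(0);                         // no next IFD
        b.putShort((short) 1);               // sub-IFD entry count
        putEntry(b, 37510, 7, 4, 0);         // UserComment (dummy value)
        b.putInt(0);
        return b.array();
    }

    private static void putEntry(ByteBuffer b, int tag, int type, int count, int value) {
        b.putShort((short) tag).putShort((short) type).putInt(count).putInt(value);
    }

    public static void main(String[] args) {
        listTags(sampleTiff()).forEach(System.out::println);
    }
}
```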

haraldk commented 1 year ago

Hi Steinar,

Can we close this issue? I understood from your last message that you found the tags you were looking for and were happy? 😀

If not, feel free to reopen.