sciencehistory / scihist_digicoll

Science History Institute Digital Collections

Technical metadata for TIFFs #2271

Closed jrochkind closed 1 year ago

jrochkind commented 1 year ago

For audio and video, we extract some technical metadata (bitrate, etc.) and store it in the database, where it can be displayed on the Asset admin screen.

We don't currently do that for TIFFs -- beyond pixel width and height.

But now that we've started using TIFF dpi (which is actually metadata embedded in the TIFF) to affect PDF generation, we might want it reflected back on the admin screen -- to more easily debug it, etc.

This also raised the question of whether there is other TIFF metadata we might want. In general, most of this stuff is stored as metadata inside the TIFF -- it's there in the TIFF originals in Dig Coll anyway, but it can't easily be seen or filtered without downloading the TIFF files and extracting it.

Maybe we want to extract it on ingest as part of characterization? In addition to dpi, perhaps things like camera model, photo creation date, color profile, more?

(If DPI were in db metadata, we could use it from there on PDF generation instead of pulling it out from TIFF each time, which would give us some more flexibility in using derivatives as sources, etc).
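To make the DPI point concrete, here is a minimal sketch of how pixel dimensions plus a stored resolution could give a PDF page size in points (1 point = 1/72 inch). The helper name `pdf_page_size_points` is hypothetical and is not the app's actual PDF code; the numbers are from the sample TIFF in this issue (2517x4156 at 600 dpi).

```ruby
# Hypothetical helper, not the app's actual code: convert pixel dimensions
# and a DPI value into a PDF page size in points (1 point = 1/72 inch).
POINTS_PER_INCH = 72.0

def pdf_page_size_points(width_px:, height_px:, dpi:)
  raise ArgumentError, "dpi must be positive" unless dpi.to_f > 0

  {
    width:  (width_px  / dpi.to_f * POINTS_PER_INCH).round(2),
    height: (height_px / dpi.to_f * POINTS_PER_INCH).round(2)
  }
end

# Values taken from the sample TIFF: 2517x4156 pixels at 600 dpi.
page = pdf_page_size_points(width_px: 2517, height_px: 4156, dpi: 600)
puts page.inspect
```

For the sample image this works out to a page of roughly 302 x 499 points, or about 4.2 x 6.9 inches. With DPI in the database, the same calculation could run against any derivative without reopening the original TIFF.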

Here is everything exiftool knows about in a TIFF. (We don't currently use/install exiftool, but possibly could. Or there are other tools that can get some but maybe not all of this, like mediainfo, which we do already use.)

From the first page of: https://digital.sciencehistory.org/works/dvoykwl

[ExifTool]      ExifTool Version Number         : 12.50
[File]          File Name                       : conversations_on_chemistr_dvoykwl_3_7d2d5ks.tiff
[File]          Directory                       : /Users/jrochkind/Downloads
[File]          File Size                       : 31 MB
[File]          File Modification Date/Time     : 2023:07:25 16:05:54-04:00
[File]          File Access Date/Time           : 2023:07:25 16:05:57-04:00
[File]          File Inode Change Date/Time     : 2023:07:25 16:05:54-04:00
[File]          File Permissions                : -rw-r--r--
[File]          File Type                       : TIFF
[File]          File Type Extension             : tif
[File]          MIME Type                       : image/tiff
[File]          Exif Byte Order                 : Little-endian (Intel, II)
[File]          Current IPTC Digest             : 8f05da416079a49c8dbe23beb254942b
[EXIF]          Image Width                     : 2517
[EXIF]          Image Height                    : 4156
[EXIF]          Bits Per Sample                 : 8 8 8
[EXIF]          Compression                     : Uncompressed
[EXIF]          Photometric Interpretation      : RGB
[EXIF]          Make                            : Phase One
[EXIF]          Camera Model Name               : IQ3 80MP
[EXIF]          Strip Offsets                   : (Binary data 36 bytes, use -b option to extract)
[EXIF]          Orientation                     : Horizontal (normal)
[EXIF]          Samples Per Pixel               : 3
[EXIF]          Rows Per Strip                  : 1024
[EXIF]          Strip Byte Counts               : (Binary data 38 bytes, use -b option to extract)
[EXIF]          X Resolution                    : 600
[EXIF]          Y Resolution                    : 600
[EXIF]          Planar Configuration            : Chunky
[EXIF]          Resolution Unit                 : inches
[EXIF]          Software                        : Capture One 12 Macintosh
[EXIF]          Exposure Time                   : 1/60
[EXIF]          Exposure Program                : Manual
[EXIF]          ISO                             : 50
[EXIF]          Exif Version                    : 0230
[EXIF]          Date/Time Original              : 2023:06:28 15:32:00
[EXIF]          Create Date                     : 2023:06:28 15:32:00
[EXIF]          Shutter Speed Value             : 1/60
[EXIF]          Metering Mode                   : Average
[EXIF]          Light Source                    : Other
[EXIF]          Exif Image Width                : 2517
[EXIF]          Exif Image Height               : 4156
[EXIF]          Focal Plane X Resolution        : 1923.076935
[EXIF]          Focal Plane Y Resolution        : 1923.076935
[EXIF]          Focal Plane Resolution Unit     : cm
[EXIF]          Sensing Method                  : One-chip color area
[EXIF]          File Source                     : Digital Camera
[EXIF]          Scene Type                      : Directly photographed
[EXIF]          White Balance                   : Unknown (5)
[EXIF]          Image Unique ID                 : 00005000008200000400E058000119E0
[EXIF]          Serial Number                   : IP001022
[EXIF]          Lens Model                      : -- mm f/--
[XMP]           XMP Toolkit                     : XMP Core 5.5.0
[XMP]           Creator Tool                    : Capture One 12 Macintosh
[XMP]           Lens                            : -- mm f/--
[XMP]           Image Number                    : 72160
[XMP]           Firmware                        : IQ3 80MP, User Firmware: 3.06.1
[XMP]           Legacy IPTC Digest              : 0A99BC4F930BC8008F4B0725E97FB3DD
[IPTC]          Coded Character Set             : UTF8
[IPTC]          Application Record Version      : 4
[IPTC]          Date Created                    : 2023:06:28
[IPTC]          Time Created                    : 15:32:00
[IPTC]          Digital Creation Date           : 2023:06:28
[IPTC]          Digital Creation Time           : 15:32:00
[ICC_Profile]   Profile CMM Type                : Adobe Systems Inc.
[ICC_Profile]   Profile Version                 : 2.1.0
[ICC_Profile]   Profile Class                   : Display Device Profile
[ICC_Profile]   Color Space Data                : RGB
[ICC_Profile]   Profile Connection Space        : XYZ
[ICC_Profile]   Profile Date Time               : 2000:08:11 19:51:59
[ICC_Profile]   Profile File Signature          : acsp
[ICC_Profile]   Primary Platform                : Apple Computer Inc.
[ICC_Profile]   CMM Flags                       : Not Embedded, Independent
[ICC_Profile]   Device Manufacturer             : none
[ICC_Profile]   Device Model                    :
[ICC_Profile]   Device Attributes               : Reflective, Glossy, Positive, Color
[ICC_Profile]   Rendering Intent                : Perceptual
[ICC_Profile]   Connection Space Illuminant     : 0.9642 1 0.82491
[ICC_Profile]   Profile Creator                 : Adobe Systems Inc.
[ICC_Profile]   Profile ID                      : 0
[ICC_Profile]   Profile Copyright               : Copyright 2000 Adobe Systems Incorporated
[ICC_Profile]   Profile Description             : Adobe RGB (1998)
[ICC_Profile]   Media White Point               : 0.95045 1 1.08905
[ICC_Profile]   Media Black Point               : 0 0 0
[ICC_Profile]   Red Tone Reproduction Curve     : (Binary data 14 bytes, use -b option to extract)
[ICC_Profile]   Green Tone Reproduction Curve   : (Binary data 14 bytes, use -b option to extract)
[ICC_Profile]   Blue Tone Reproduction Curve    : (Binary data 14 bytes, use -b option to extract)
[ICC_Profile]   Red Matrix Column               : 0.60974 0.31111 0.01947
[ICC_Profile]   Green Matrix Column             : 0.20528 0.62567 0.06087
[ICC_Profile]   Blue Matrix Column              : 0.14919 0.06322 0.74457
[Composite]     Image Size                      : 2517x4156
[Composite]     Megapixels                      : 10.5
[Composite]     Scale Factor To 35 mm Equivalent: 1.7
[Composite]     Shutter Speed                   : 1/60
[Composite]     Date/Time Created               : 2023:06:28 15:32:00
[Composite]     Digital Creation Date/Time      : 2023:06:28 15:32:00
[Composite]     Circle Of Confusion             : 0.018 mm
[Composite]     Lens ID                         : -- mm f/--
jrochkind commented 1 year ago

cc @apinkney0696

jrochkind commented 1 year ago

@eddierubeiz So we already have some CLI tools installed that can get technical metadata for TIFFs, but they can't get a lot and depending on what we want, we might just want to add exiftool as a CLI dependency -- I think it is available for both brew and apt, just add it to the relevant files Brewfile and Aptfile, etc.

apinkney0696 commented 1 year ago

Here is the metadata that, from my perspective, would be nice to have available on the backend:

[File]          File Size                       : 31 MB
[File]          File Type                       : TIFF
[Composite]     Digital Creation Date/Time      : 2023:06:28 15:32:00
[Composite]     Image Size                      : 2517x4156
[EXIF]          Bits Per Sample                 : 8 8 8
[EXIF]          Photometric Interpretation      : RGB
[EXIF]          Compression                     : Uncompressed
[EXIF]          Make                            : Phase One
[EXIF]          Camera Model Name               : IQ3 80MP
[EXIF]          X Resolution                    : 600
[EXIF]          Y Resolution                    : 600
[EXIF]          Software                        : Capture One 12 Macintosh
[EXIF]          Lens Model                      : -- mm f/--
[Composite]     Shutter Speed                   : 1/60
[EXIF]          ISO                             : 50
[ICC_Profile]   Profile Description             : Adobe RGB (1998)

Please add to this list anything that would be useful to you!
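A wish list like this could be expressed as a simple allowlist applied to exiftool's JSON output. This is only a sketch: the `WANTED_KEYS` list and `selected_metadata` helper are hypothetical, and the `Group:TagName` key names assume output from `exiftool -json -G`; they should be checked against real output before relying on them.

```ruby
require "json"

# Hypothetical allowlist. Key names assume the "Group:TagName" format of
# `exiftool -json -G` output and are not verified against the app.
WANTED_KEYS = %w[
  EXIF:BitsPerSample
  EXIF:PhotometricInterpretation
  EXIF:Compression
  EXIF:Make
  EXIF:Model
  EXIF:XResolution
  EXIF:YResolution
  EXIF:Software
  EXIF:LensModel
  EXIF:ISO
  Composite:ShutterSpeed
  ICC_Profile:ProfileDescription
].freeze

# Return only the wanted keys that are present and non-blank.
def selected_metadata(exiftool_hash, wanted = WANTED_KEYS)
  wanted.each_with_object({}) do |key, result|
    value = exiftool_hash[key]
    result[key] = value unless value.nil? || value.to_s.strip.empty?
  end
end

# Hand-written sample resembling exiftool output for the TIFF above.
sample = JSON.parse(<<~SAMPLE)
  {
    "SourceFile": "example.tiff",
    "EXIF:Make": "Phase One",
    "EXIF:Model": "IQ3 80MP",
    "EXIF:XResolution": 600,
    "EXIF:YResolution": 600,
    "EXIF:StripOffsets": "(Binary data 36 bytes, use -b option to extract)",
    "ICC_Profile:ProfileDescription": "Adobe RGB (1998)"
  }
SAMPLE

puts JSON.pretty_generate(selected_metadata(sample))
```

Keeping the list in one constant would make it easy to add fields later without touching the extraction itself.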

jrochkind commented 1 year ago

@apinkney0696 Re: "Lens Model" -- I thought -- mm f/-- meant that it was not actually getting anything useful there, but did I misunderstand, is that actually a useful value?

Or we want to capture it in case future software puts something more useful there?

Good call on "Compression" -- although I think all of our TIFFs are intended to be uncompressed right now, I still think we should consider changing this in the future!

I don't understand what "Photometric Interpretation: RGB" means, if you do, I'd be curious to learn more!

"File Type" and "Image Size" we have already captured, from other reliable characterization routines, so we don't need to get them again via this work. (The values reported by exiftool may be coming from embedded metadata rather than physical file analysis, but either way, we've already got them.)

apinkney0696 commented 1 year ago

So I guess I am hopeful that the Lens model data might show up sometimes. This is the lens length and the f-stop value. It should have this so I'm not sure why it's not captured in this example. If we have the shutter speed and ISO, I would also like to capture the f-stop. It should show up for object-photography where I often attach different lenses. The top-down/2D photography is a funny lens situation - the sensor is attached directly to the copy stand arm which has a lens in it - so I think that's why the data isn't here in this case.

Photometric interpretation is related to the bits per sample field. This data shows me I have a 24-bit RGB image. So basically an image in which each pixel is represented by three 8-bit quantities that tell the intensities of red, green, and blue for the color of the pixels.
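As a small worked example of that arithmetic: summing "Bits Per Sample" gives the bits per pixel, and for an uncompressed TIFF the pixel data should be about width x height x bytes per pixel. The helper names below are hypothetical, and the numbers come from the first sample TIFF in this issue.

```ruby
# Hypothetical helpers for illustration only.

# "8 8 8" -> 24 (three 8-bit samples per pixel)
def bits_per_pixel(bits_per_sample)
  bits_per_sample.split.map { |s| Integer(s) }.sum
end

# Size of the raw pixel data for an uncompressed image, in bytes.
def uncompressed_bytes(width:, height:, bits_per_sample:)
  width * height * bits_per_pixel(bits_per_sample) / 8
end

puts bits_per_pixel("8 8 8")
puts uncompressed_bytes(width: 2517, height: 4156, bits_per_sample: "8 8 8")
```

For the 2517x4156 sample this gives 24 bits per pixel and 31,381,956 bytes of pixel data, which lines up with the reported file size of 31 MB for an uncompressed file.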

jrochkind commented 1 year ago

OK, if Lens metadata was not being added in this sample image, I would assume that if you are using the same software it will not show up in other images.

I guess it's possible it would show up in an image from your other camera/setup, if the software is different, but my guess would be it would not. But I can check a Museum 3d image, let's see...

https://digital.sciencehistory.org/admin/asset_files/mbdowll oh hey, this one DOES have some lens info captured! Actually it might have whole additional metadata fields regarding the lens too, I'll paste it all here below.

I wonder why some images have lens info and some don't! Perhaps for the older ones, the software didn't properly tag them.

I guess those would be questions for the vendor.

If there are any other example fields you've noted that AREN'T showing you the data you would want, please do point them out, so we can make sure we aren't just extracting useless fields. In general, I'd assume these examples are "typical", unless we investigate more.

ExifTool Version Number         : 12.50
File Name                       : lydia_e_odaldrb_1_mbdowll.tiff
Directory                       : /Users/jrochkind/Downloads
File Size                       : 66 MB
File Modification Date/Time     : 2023:08:01 16:13:03-04:00
File Access Date/Time           : 2023:08:01 16:13:05-04:00
File Inode Change Date/Time     : 2023:08:01 16:13:03-04:00
File Permissions                : -rw-r--r--
File Type                       : TIFF
File Type Extension             : tif
MIME Type                       : image/tiff
Exif Byte Order                 : Little-endian (Intel, II)
Subfile Type                    : Full-resolution image
Image Width                     : 4493
Image Height                    : 4928
Bits Per Sample                 : 8 8 8
Compression                     : Uncompressed
Photometric Interpretation      : RGB
Make                            : Phase One
Camera Model Name               : IQ180
Strip Offsets                   : 24420
Orientation                     : Horizontal (normal)
Samples Per Pixel               : 3
Rows Per Strip                  : 4928
Strip Byte Counts               : 66424512
X Resolution                    : 400
Y Resolution                    : 400
Planar Configuration            : Chunky
Resolution Unit                 : inches
Software                        : Adobe Photoshop 24.0 (Windows)
Modify Date                     : 2023:05:04 14:50:59
XMP Toolkit                     : Adobe XMP Core 9.0-c000 79.171c27f, 2022/08/16-18:02:43
Creator Tool                    : Capture One 21 Macintosh
Metadata Date                   : 2023:05:04 14:50:59-04:00
Lens                            : Schneider Kreuznach LS 55mm f/2.8
Image Number                    : 25447
Firmware                        : IQ180, User Firmware: 8.06.1
Color Mode                      : RGB
ICC Profile Name                : Adobe RGB (1998)
Format                          : image/tiff
History Action                  : derived, saved, saved, converted, derived, saved
History Parameters              : converted from image/tiff to application/vnd.adobe.photoshop, from application/vnd.adobe.photoshop to image/tiff, converted from application/vnd.adobe.photoshop to image/tiff
History Instance ID             : xmp.iid:463279cf-b770-1846-a0a5-ad11d45d197d, xmp.iid:8bdf6eed-738d-7f4f-9a97-5c2fc9acbded, xmp.iid:e0e77459-903e-0045-9261-6802c423a84a
History When                    : 2023:05:04 14:50:36-04:00, 2023:05:04 14:50:59-04:00, 2023:05:04 14:50:59-04:00
History Software Agent          : Adobe Photoshop 24.0 (Windows), Adobe Photoshop 24.0 (Windows), Adobe Photoshop 24.0 (Windows)
History Changed                 : /, /, /
Derived From Instance ID        : xmp.iid:8bdf6eed-738d-7f4f-9a97-5c2fc9acbded
Derived From Document ID        : xmp.did:463279cf-b770-1846-a0a5-ad11d45d197d
Derived From Original Document ID: xmp.did:463279cf-b770-1846-a0a5-ad11d45d197d
Document ID                     : adobe:docid:photoshop:3b403bcc-c5cc-5145-b03a-24731c614184
Instance ID                     : xmp.iid:e0e77459-903e-0045-9261-6802c423a84a
Original Document ID            : xmp.did:463279cf-b770-1846-a0a5-ad11d45d197d
White Balance                   : Auto
Current IPTC Digest             : 016f0e4b8989f50b3e0df33b2b85a4c8
Coded Character Set             : UTF8
Application Record Version      : 0
Date Created                    : 2023:05:04
Time Created                    : 10:18:56
IPTC Digest                     : 016f0e4b8989f50b3e0df33b2b85a4c8
Displayed Units X               : inches
Displayed Units Y               : inches
Print Style                     : Centered
Print Position                  : 0 0
Print Scale                     : 1
Global Angle                    : 30
Global Altitude                 : 30
URL List                        :
Slices Group Name               : 2006.050.058_039
Num Slices                      : 1
Pixel Aspect Ratio              : 1
Photoshop Thumbnail             : (Binary data 1984 bytes, use -b option to extract)
Has Real Merged Data            : Yes
Writer Name                     : Adobe Photoshop
Reader Name                     : Adobe Photoshop 2023
Exposure Time                   : 0.5
F Number                        : 14.0
Exposure Program                : Manual
ISO                             : 35
Exif Version                    : 0230
Date/Time Original              : 2023:05:04 10:18:56
Create Date                     : 2023:05:04 10:18:56
Shutter Speed Value             : 0.5
Aperture Value                  : 14.0
Exposure Compensation           : 0
Metering Mode                   : Average
Light Source                    : Unknown
Focal Length                    : 55.0 mm
Color Space                     : Uncalibrated
Exif Image Width                : 4493
Exif Image Height               : 4928
Focal Plane X Resolution        : 1923.076935
Focal Plane Y Resolution        : 1923.076935
Focal Plane Resolution Unit     : cm
Sensing Method                  : One-chip color area
File Source                     : Digital Camera
Scene Type                      : Directly photographed
Image Unique ID                 : 00005000007200000400E05800006367
Serial Number                   : FP040179
Lens Info                       : 55mm f/21.99997864-2.799996445
Lens Model                      : Schneider Kreuznach LS 55mm f/2.8
Profile CMM Type                : Adobe Systems Inc.
Profile Version                 : 2.1.0
Profile Class                   : Display Device Profile
Color Space Data                : RGB
Profile Connection Space        : XYZ
Profile Date Time               : 2000:08:11 19:51:59
Profile File Signature          : acsp
Primary Platform                : Apple Computer Inc.
CMM Flags                       : Not Embedded, Independent
Device Manufacturer             : none
Device Model                    :
Device Attributes               : Reflective, Glossy, Positive, Color
Rendering Intent                : Perceptual
Connection Space Illuminant     : 0.9642 1 0.82491
Profile Creator                 : Adobe Systems Inc.
Profile ID                      : 0
Profile Copyright               : Copyright 2000 Adobe Systems Incorporated
Profile Description             : Adobe RGB (1998)
Media White Point               : 0.95045 1 1.08905
Media Black Point               : 0 0 0
Red Tone Reproduction Curve     : (Binary data 14 bytes, use -b option to extract)
Green Tone Reproduction Curve   : (Binary data 14 bytes, use -b option to extract)
Blue Tone Reproduction Curve    : (Binary data 14 bytes, use -b option to extract)
Red Matrix Column               : 0.60974 0.31111 0.01947
Green Matrix Column             : 0.20528 0.62567 0.06087
Blue Matrix Column              : 0.14919 0.06322 0.74457
Aperture                        : 14.0
Image Size                      : 4493x4928
Megapixels                      : 22.1
Scale Factor To 35 mm Equivalent: 1.2
Shutter Speed                   : 0.5
Date/Time Created               : 2023:05:04 10:18:56
Circle Of Confusion             : 0.024 mm
Field Of View                   : 29.4 deg
Focal Length                    : 55.0 mm (35 mm equivalent: 68.6 mm)
Hyperfocal Distance             : 8.97 m
Light Value                     : 10.1
Lens ID                         : Schneider Kreuznach LS 55mm f/2.8
jrochkind commented 1 year ago

A more recent Bredig file still doesn't have any lens info.

https://digital.sciencehistory.org/admin/asset_files/ehdl2vm

Lens ID                         : -- mm f/--

It may be that the setup you use for 3D objects is capturing lens info, but the one you use for 2D photos is not? Just guessing!
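If both kinds of value end up stored, one option is to treat the `-- mm f/--` placeholder as "no lens recorded" when displaying it. A minimal sketch, with a hypothetical helper name and a pattern based only on the placeholder seen in these two samples:

```ruby
# Matches the placeholder seen in the 2D samples, e.g. "-- mm f/--".
# The pattern is a guess based on those samples only.
PLACEHOLDER_LENS = /\A-+\s*mm\s*f\/-+\z/

# Return the lens string, or nil when it is blank or the placeholder.
def meaningful_lens(raw)
  value = raw.to_s.strip
  return nil if value.empty? || value.match?(PLACEHOLDER_LENS)

  value
end

puts meaningful_lens("-- mm f/--").inspect
puts meaningful_lens("Schneider Kreuznach LS 55mm f/2.8").inspect
```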

apinkney0696 commented 1 year ago

Yeah, that's my guess - 3D objects will have lens info, but 2D will not because of the setup I described above.

So yes, I definitely would like:

Lens ID : Schneider Kreuznach LS 55mm f/2.8

But also, now that I see this,

Software                        : Adobe Photoshop 24.0 (Windows)
Modify Date                     : 2023:05:04 14:50:59

so that it's clear if it's been photoshopped at all too.

Please and thank you!

jrochkind commented 1 year ago

OK, in working on text PDF still, I'm discovering that a few of our TIFFs have an "alpha channel" -- normally used for making "transparencies" in images.

I think this was probably a mistake. Perhaps mistakenly added when edited with Photoshop -- the one example image I'm looking at has metadata saying it was edited with photoshop too:

https://digital.sciencehistory.org/admin/asset_files/1v53jx931

I don't know why that one would have been edited by photoshop, but it's got the metadata.

This extra alpha channel isn't necessarily a problem (I'm not sure), but it is a problem for my PDF generation pipeline -- okay, so just another thing I have to handle, it can be done.

But it occurs to me that maybe we want to capture this in extracted metadata too.

I think it shows up as Samples Per Pixel of 3 (normal) vs 4 (with alpha channel).

Additionally, the one with the alpha channel has metadata saying:

Alpha Channels Names            : Alpha 1

Part of me wonders if we DO want to just capture everything exiftool can capture, although maybe not display it all in admin UI (or maybe do so!)
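A rough sketch of how the alpha-channel check described above might look against stored exiftool output. The helper name is hypothetical, and the key names (`EXIF:SamplesPerPixel`, `EXIF:PhotometricInterpretation`, `Photoshop:AlphaChannelsNames`) are assumptions about `exiftool -json -G` output that should be confirmed against a real file. It is a heuristic, not a definitive test.

```ruby
# Heuristic only: an RGB image with more than 3 samples per pixel, or one
# carrying Photoshop alpha channel names, probably has an alpha channel.
# Key names are assumed, not verified against the app's stored data.
def alpha_channel?(metadata)
  rgb     = metadata["EXIF:PhotometricInterpretation"] == "RGB"
  samples = metadata["EXIF:SamplesPerPixel"].to_i
  named   = !metadata["Photoshop:AlphaChannelsNames"].to_s.strip.empty?

  (rgb && samples > 3) || named
end

normal = { "EXIF:PhotometricInterpretation" => "RGB", "EXIF:SamplesPerPixel" => 3 }
with_alpha = normal.merge(
  "EXIF:SamplesPerPixel" => 4,
  "Photoshop:AlphaChannelsNames" => "Alpha 1"
)

puts alpha_channel?(normal)
puts alpha_channel?(with_alpha)
```

The photometric check is there because four samples per pixel alone could also mean something other than RGB plus alpha (CMYK, for example).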

See also #2288

apinkney0696 commented 1 year ago

Interesting... This is an old one so I can't speak to why it would have been photoshopped. It's not that large of an object so it shouldn't have needed to be spliced together

I've not dealt with Alpha Channels before. I wonder how many assets have this in the DC, and whether I would be able to retroactively remove the channel in Photoshop. I can look into it if it would be useful to you.

Perhaps we should capture all of the Exif metadata. It's hard to predict things like this!

jrochkind commented 1 year ago

@eddierubeiz I'm thinking we actually run exiftool on all ingests, and put its JSON output in a field in our json_attributes.

Then we display a few fields on the asset Admin web page, as specified here.

It does put a lot more stuff in our json column for assets... I think that's fine? (We may want to configure the json column to no longer be automatically included when printing out an Asset in the console or log; it's already getting a bit out of control.)

or do we make a new column in kithe_models, just for this?
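For illustration, running exiftool and getting a hash suitable for a JSON field might look roughly like this. This is a sketch under assumptions, not the implementation that was eventually written: the method names are hypothetical, and only the `-json` and `-G` options (JSON output, group-prefixed tag names) are used.

```ruby
require "json"
require "open3"

# `exiftool -json` prints a JSON array with one object per input file.
# For a single file we expect exactly one entry.
def parse_exiftool_json(stdout)
  parsed = JSON.parse(stdout)
  unless parsed.is_a?(Array) && parsed.length == 1
    raise ArgumentError, "expected a JSON array with exactly one entry"
  end

  parsed.first
end

# Hypothetical wrapper: shell out to exiftool for one file and return a Hash.
# Arguments are passed separately, so no shell quoting is involved.
def exiftool_metadata(path)
  stdout, status = Open3.capture2("exiftool", "-json", "-G", path)
  raise "exiftool failed for #{path}" unless status.success?

  parse_exiftool_json(stdout)
end
```

The returned hash could then be assigned to whichever attribute or column ends up holding it; keeping the parsing separate from the shell-out makes it testable without exiftool installed.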

jrochkind commented 1 year ago

Annoyingly, including exiftool in our Aptfile did not seem to be enough to get it on heroku.

~ $ exiftool -ver
Can't locate Image/ExifTool.pm in @INC (you may need to install the Image::ExifTool module) (@INC contains: /app/.apt/usr/bin/lib /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.34.0 /usr/local/share/perl/5.34.0 /usr/lib/x86_64-linux-gnu/perl5/5.34 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.34 /usr/share/perl/5.34 /usr/local/lib/site_perl) at /app/.apt/usr/bin/exiftool line 39.
BEGIN failed--compilation aborted at /app/.apt/usr/bin/exiftool line 39.

Here's a heroku buildpack that hasn't been touched in 3 years for installing exiftool, that maybe uses an old version:

https://github.com/velizarn/heroku-buildpack-exiftool

Still researching more.
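Since a broken install like the one above only shows up when the command is actually run, a boot-time or health-check probe could catch it early. A minimal sketch with a hypothetical helper name; it relies only on `exiftool -ver`, the same command used above:

```ruby
require "open3"

# Return the version string printed by `<command> -ver`, or nil if the
# command is missing or exits with an error (as in the Perl @INC failure).
def cli_version(command)
  output, status = Open3.capture2e(command, "-ver")
  status.success? ? output.strip : nil
rescue SystemCallError
  # Raised when the executable cannot be found or run at all.
  nil
end

version = cli_version("exiftool")
puts(version ? "exiftool #{version} available" : "exiftool not usable")
```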

jrochkind commented 1 year ago

@apinkney0696 @eddierubeiz

All assets in production now have exiftool results stored, and selected metadata displayed on Asset admin page.

Feedback on display is welcome. Currently I am also displaying all exiftool-reported "validation warnings", although most of these seem ignorable -- most seem to be triggered by any Photoshop-exported TIFF; Photoshop apparently just omits some things that exiftool thinks should be there.

I thought the "validation warnings" might be helpful for troubleshooting and full technical transparency nonetheless, but if displaying them like this is distracting, I can remove them, or hide them behind a toggleable disclosure, or other.

eg

https://digital.sciencehistory.org/admin/asset_files/2mxlez3

https://digital.sciencehistory.org/admin/asset_files/ewjgots

https://digital.sciencehistory.org/admin/asset_files/hmf5hi9

apinkney0696 commented 1 year ago

Looks great! Let's leave out the validation warnings.