strukturag / libheif

libheif is an HEIF and AVIF file format decoder and encoder.

How can I retrieve additional metadata information from the Spatial HEIC #1164

Open bigcat88 opened 2 months ago

bigcat88 commented 2 months ago

Original issue with file example is here:

https://github.com/bigcat88/pillow_heif/issues/234

Is there a way to get this information from an image and save the file so it contains it?


I tried to find where it is stored using heif-info -d, but got completely lost in the number of different boxes these files contain.

I would be grateful for any help.

farindk commented 2 months ago

There are four properties in the file that might hold this information:

| | | Box: 4363e914-5b7d-4aab-97ae-bea69803b434 -----
| | | size: 28   (header size: 24)
| | | 
| | | Box: 22cc04c7-d6d9-4e07-9d90-4eb6ecbaf3a3 -----
| | | size: 40   (header size: 24)
...
| | | Box: 4363e914-5b7d-4aab-97ae-bea69803b434 -----
| | | size: 32   (header size: 24)
| | | 
| | | Box: de225085-36cb-4365-8743-2f8705e7c78a -----
| | | size: 28   (header size: 24)

But we have no specification for these; they are proprietary. We would need a couple of images with known intrinsic parameters (covering several different values) to be able to reverse-engineer their content.

jwheeler-work commented 2 months ago

There should be additional metadata for each picture. Using the Image I/O framework, they'd be defined like this:

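    import ImageIO

    // cameraIntrinsics: the 3x3 intrinsic matrix flattened into 9 values, defined elsewhere.
    // The explicit type annotations are needed because the dictionaries mix value types.
    let properties: [CFString: Any] = [
        kCGImagePropertyGroups: [
            kCGImagePropertyGroupIndex: 0,
            kCGImagePropertyGroupType: kCGImagePropertyGroupTypeStereoPair,
            kCGImagePropertyGroupImageIndexLeft: 0,
            kCGImagePropertyGroupImageIndexRight: 1,
        ] as [CFString: Any],
        kCGImagePropertyHEIFDictionary: [
            kIIOMetadata_CameraModelKey: [
                kIIOCameraModel_Intrinsics: cameraIntrinsics as CFArray
            ]
        ]
    ]
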
jwheeler-work commented 2 months ago

IMG_0050.zip: does this help? Three images from the Apple Vision Pro.

farindk commented 2 months ago

Thanks. Could you please also send me the decoded metadata values that are stored in there? I don't have a Mac to read them out.

jwheeler-work commented 2 months ago

IMG_0049 IMG_0050 IMG_0051

This good?

bradh commented 2 months ago

Looks like it's the same in each case. Do they ever vary?

jwheeler-work commented 2 months ago

They can. In this case, because they were all shot on the Vision Pro in the same area, the values are the same.

I have panoramic photos that will have different intrinsics, but the format is the same.

I suspect there are additional tags that Apple uses to determine that this is a stereo pair. Those are what the code I posted before sets.

bradh commented 2 months ago

So to work out which values correspond to which bytes (or bits) in those UUID fields, we need to see the variations. Ideally, changing one parameter at a time would change only a small number of the stored values.

jwheeler-work commented 2 months ago

Hmm... I don't think I can provide that. The best I can do is more samples of photos that work on the Vision Pro by either taking photos with an iPhone or the AVP.

JoanCharmant commented 2 months ago

Hi (I posted the report in the other repo).

Here is a set of 3 files focused on the "Camera model" field with the intrinsics matrix.

intrinsics.zip

Example (screenshot: heic-intrinsics-1)

They were generated with code similar to the snippet posted above, varying only the kIIOCameraModel_Intrinsics field. It's not possible to create files with all zeros, or by changing just one value, because the encoder checks that the matrix is valid.

Files and corresponding kIIOCameraModel_Intrinsics value:

Note: I only used integers, but this is an array of floats.

JoanCharmant commented 2 months ago

And here is a set of files focusing on the camera extrinsics key.

exitrinsics.zip

Example (screenshot: camera-extrinsics)

Files:

bradh commented 2 months ago

Intrinsics analysis

File 0: two UUID properties, and each of the two items is associated with both.

Intrinsics matrix: [1, 0, 0, 0, 1, 0, 0, 0, 1]
uuid: 22cc04c7d6d94e079d904eb6ecbaf3a3 value: 00001e00 0010624e 00000000 00000000
uuid: de22508536cb436587432f8705e7c78a value: 00000000

File 1: same two UUID properties, same association.

Intrinsics matrix: [2, 0, 2, 0, 2, 2, 0, 0, 1]
uuid: 22cc04c7d6d94e079d904eb6ecbaf3a3 value: 00001e00 0020c49c 0020c49c 0020c49c
uuid: de22508536cb436587432f8705e7c78a value: 00000000

File 2: same two UUID properties, same association.

Intrinsics matrix: [100, 0, 100, 0, 100, 100, 0, 0, 1]
uuid: 22cc04c7d6d94e079d904eb6ecbaf3a3 value: 00001e00 06666666 06666666 06666666
uuid: de22508536cb436587432f8705e7c78a value: 00000000

Assume that 22cc04c7d6d94e079d904eb6ecbaf3a3 is the identifier for the intrinsics.

So we have:
00001e00 0010624e 00000000 00000000 for [1, 0, 0, 0, 1, 0, 0, 0, 1]
00001e00 0020c49c 0020c49c 0020c49c for [2, 0, 2, 0, 2, 2, 0, 0, 1]
00001e00 06666666 06666666 06666666 for [100, 0, 100, 0, 100, 100, 0, 0, 1]

It's not clear to me how the 9 values could fit into 16 bytes unless there is some kind of encoding, possibly omitting values that are defined as 0 (e.g. the 4th, 7th and 8th values).

Possibly 0x1e relates to a signature or an encoded length (0x1e = 30, but the payload is only 16 bytes).
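To make the size argument concrete, here is a small Swift sketch that splits the hex dumps above into big-endian 32-bit words. It assumes, as the dumps suggest, that the payload is a sequence of 4-byte fields; nine 32-bit floats would need 36 bytes, while the observed payload is only four words. `parseWords` is a hypothetical helper written for this illustration, not part of libheif or any other library.

```swift
/// Splits a hex dump such as "00001e00 0010624e" into big-endian 32-bit words.
/// Assumes every space-separated group is exactly 8 hex digits (4 bytes).
func parseWords(_ dump: String) -> [UInt32] {
    dump.split(separator: " ").map { (group: Substring) -> UInt32 in
        guard group.count == 8, let word = UInt32(group, radix: 16) else {
            fatalError("unexpected group: \(group)")
        }
        return word
    }
}

let intrinsicsDumps: [(matrix: [Int], hex: String)] = [
    ([1, 0, 0, 0, 1, 0, 0, 0, 1], "00001e00 0010624e 00000000 00000000"),
    ([2, 0, 2, 0, 2, 2, 0, 0, 1], "00001e00 0020c49c 0020c49c 0020c49c"),
    ([100, 0, 100, 0, 100, 100, 0, 0, 1], "00001e00 06666666 06666666 06666666"),
]

for (matrix, hex) in intrinsicsDumps {
    let words = parseWords(hex)
    let payloadBytes = words.count * 4   // bytes actually stored in the uuid box
    let naiveBytes = matrix.count * 4    // bytes needed if all nine values were 32-bit floats
    let thirdByte = (words[0] >> 8) & 0xff  // the 0x1e seen in every sample
    print("payload \(payloadBytes) bytes vs. \(naiveBytes) for nine floats; first word byte 2 = \(thirdByte)")
}
```

The gap between 16 and 36 bytes is why some values must be implied (for example the constant zeros and the trailing 1) rather than stored.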

farindk commented 2 months ago

A general intrinsic matrix usually looks like this:

f s x
0 f y
0 0 1

One can also assume that the skew s = 0. That would leave us with just the three parameters f, x, y.
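Under the assumption that s = 0 and that the same f is used on both axes, the nine values reduce to three. The Swift sketch below rebuilds the flattened array from (f, x, y) and reproduces the three test matrices used above. `intrinsicMatrix` is a hypothetical helper name; the row-by-row flattening matches the layout of the arrays quoted in this thread.

```swift
/// Builds the pinhole intrinsic matrix
///   f s x
///   0 f y
///   0 0 1
/// and returns it flattened row by row.
func intrinsicMatrix(f: Double, x: Double, y: Double, skew: Double = 0) -> [Double] {
    [f, skew, x,
     0, f,    y,
     0, 0,    1]
}

// The three test files differ only in f, x and y.
print(intrinsicMatrix(f: 1, x: 0, y: 0))    // [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
print(intrinsicMatrix(f: 2, x: 2, y: 2))
print(intrinsicMatrix(f: 100, x: 100, y: 100))
```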

It is also nice to see that the encoding of 2 (0x20c49c) is exactly twice the encoding of 1 (0x10624e). And 0x06666666 / 0x10624e gives almost exactly 100 in decimal. That seems to fit nicely.

Thus:
- First four bytes: unknown, maybe some flags (e.g. for "ModelType = Simplified Pinhole").
- Second four bytes: f.
- Third/fourth four bytes: x/y, but we need more data to tell which is which.
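The proposed layout (an unknown header word, then f, then x and y) can be checked against the three dumps with a few lines of Swift. This sketch reads the last three words as big-endian signed 32-bit integers and compares them with the f = 1 sample. It also prints each value divided by 2^30, which is an extra hypothesis suggested only by the 0x1e = 30 byte in the header: that the numbers might be fixed-point with 30 fractional bits. Neither the field names nor that scaling come from a specification.

```swift
/// Hypothetical field layout proposed in this thread; not taken from any specification.
struct IntrinsicsGuess {
    let header: UInt32   // unknown, identical in all samples (0x00001e00)
    let f: Int32
    let x: Int32
    let y: Int32
}

func decodeIntrinsics(_ dump: String) -> IntrinsicsGuess {
    // Each space-separated group is one big-endian 32-bit word.
    let w = dump.split(separator: " ").map { UInt32($0, radix: 16)! }
    precondition(w.count == 4, "expected a 16-byte payload")
    return IntrinsicsGuess(header: w[0],
                           f: Int32(bitPattern: w[1]),
                           x: Int32(bitPattern: w[2]),
                           y: Int32(bitPattern: w[3]))
}

let one = decodeIntrinsics("00001e00 0010624e 00000000 00000000")      // f = 1
let two = decodeIntrinsics("00001e00 0020c49c 0020c49c 0020c49c")      // f = x = y = 2
let hundred = decodeIntrinsics("00001e00 06666666 06666666 06666666")  // f = x = y = 100

// Ratios relative to the f = 1 encoding: exactly 2, and just under 100.
print(Double(two.f) / Double(one.f))       // 2.0
print(Double(hundred.f) / Double(one.f))   // ~99.99998

// Hypothetical 30-bit fixed-point reading of the stored values.
let scale = Double(1 << 30)
print(Double(one.f) / scale, Double(hundred.f) / scale)
```

If the fixed-point guess holds, f = 1 would be stored as roughly 0.001, i.e. the value would be normalized by something else (possibly the image size, which would be consistent with the later observation that the value changes with resolution).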

bradh commented 1 month ago

Extrinsics data extraction

Each file has two uuid properties (boxes). Both images in each file are associated with both uuid properties.

One is uuid: de22508536cb436587432f8705e7c78a value: 00000000 as described above.

The other is more interesting. It is uuid: 4363e9145b7d4aab97aebea69803b434, and its value changes in both content and length. See below:

5-Extrinsics

value: 00000010 CoordinateSystemID: 0, Position: [0, 0, 0], Rotation: [1, 0, 0, 0, 1, 0, 0, 0, 1].

6-Extrinsics

value: 00000011 000f4240 CoordinateSystemID: 0, Position: [1, 0, 0], Rotation: [1, 0, 0, 0, 1, 0, 0, 0, 1].

7-Extrinsics

value: 00000012 000f4240 CoordinateSystemID: 0, Position: [0, 1, 0], Rotation: [1, 0, 0, 0, 1, 0, 0, 0, 1].

8-Extrinsics

value: 00000014 000f4240 CoordinateSystemID: 0, Position: [0, 0, 1], Rotation: [1, 0, 0, 0, 1, 0, 0, 0, 1].

9-Extrinsics

value: 00000010 CoordinateSystemID: 0, Position: [0, 0, 0], Rotation: [1, 0, 0, 0, 1, 0, 0, 0, 1].

10-Extrinsics

value: 00000010 CoordinateSystemID: 1, Position: [0, 0, 0], Rotation: [1, 0, 0, 0, 1, 0, 0, 0, 1].

File 10 also has an extra uuid box. It's the same as the assumed intrinsics box above, with value: 00001e00 0010624e 00000000 00000000

So it looks like the coordinate system ID is not coded in this value (files 9 and 10 have identical values); it may be indicated by position instead.

Byte [7] is clearly changing as we step through the position changes.

Read as a little-endian float, 000f4240 is 3.03216553, but I can't make that fit particularly well.
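The extrinsics dumps can be read the same way. The Swift sketch below is one possible interpretation, not a known format: it assumes the low three bits of the first word flag which of x, y and z are present (0x11, 0x12 and 0x14 line up with positions along x, y and z), that each present coordinate follows as a big-endian signed 32-bit integer, and it leaves the meaning of 0x10 open. Read as a big-endian integer, 000f4240 is exactly 1,000,000, which would fit a position of 1 stored in millionths; the 1e-6 scale is an assumption. The little-endian float reading (about 3.0322) is reproduced for comparison.

```swift
/// One possible reading of the 4363e914... payload; the bit assignments are guesses.
func decodeExtrinsicsGuess(_ dump: String) -> (flags: UInt32, position: [Int32]) {
    let words = dump.split(separator: " ").map { UInt32($0, radix: 16)! }
    let flags = words[0]
    var position: [Int32] = [0, 0, 0]
    var next = 1
    // Assumed: bit 0 -> x present, bit 1 -> y present, bit 2 -> z present.
    for axis in 0..<3 where flags & (UInt32(1) << axis) != 0 {
        position[axis] = Int32(bitPattern: words[next])
        next += 1
    }
    return (flags, position)
}

for dump in ["00000010", "00000011 000f4240", "00000012 000f4240", "00000014 000f4240"] {
    let (flags, position) = decodeExtrinsicsGuess(dump)
    // Hypothetical scale: 0x000f4240 == 1_000_000, so divide by one million.
    let scaled = position.map { Double($0) / 1_000_000 }
    print(String(flags, radix: 16), scaled)
}

// The same four bytes (00 0f 42 40) interpreted as a little-endian IEEE float.
let leFloat = Float(bitPattern: UInt32(0x000f4240).byteSwapped)
print(leFloat)   // about 3.0321655
```

This reading also matches files 5 and 9, where no coordinate bit is set and no second word is stored.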

TimYao18 commented 1 month ago

Also, if the photo is a spatial photo, there will be a 'grpl' (GroupsListBox) box under the 'meta' box, and it will contain a 'ster' box, i.e. a stereo pair group.
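For reference, a 'ster' entry inside 'grpl' should be an EntityToGroupBox as defined in ISO/IEC 14496-12: a full box carrying group_id, num_entities_in_group and a list of entity IDs. As we understand the HEIF specification, a stereo pair lists the left view first and the right view second, which would correspond to kCGImagePropertyGroupImageIndexLeft/Right in the Image I/O snippet; please check the spec before relying on that order. The Swift sketch parses such a box from raw bytes; the sample bytes are synthetic, built by hand for illustration, and are not taken from the attached files.

```swift
/// Reads a big-endian UInt32 starting at `offset`.
func readU32(_ b: [UInt8], _ offset: Int) -> UInt32 {
    b[offset..<offset + 4].reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}

struct StereoGroup {
    let groupID: UInt32
    let left: UInt32   // assumed: first entity is the left view
    let right: UInt32  // assumed: second entity is the right view
}

/// Parses a 'ster' EntityToGroupBox, including its 8-byte box header.
func parseSter(_ box: [UInt8]) -> StereoGroup? {
    guard box.count >= 20,
          Int(readU32(box, 0)) == box.count,
          String(decoding: box[4..<8], as: UTF8.self) == "ster" else { return nil }
    // Bytes 8..<12 hold version (8 bits) and flags (24 bits); ignored here.
    let groupID = readU32(box, 12)
    let count = Int(readU32(box, 16))
    guard count == 2, box.count >= 20 + 4 * count else { return nil }
    return StereoGroup(groupID: groupID, left: readU32(box, 20), right: readU32(box, 24))
}

// Synthetic example: group 1 pairing item 1 (left) with item 2 (right).
let sample: [UInt8] = [
    0, 0, 0, 28,              // box size in bytes
    0x73, 0x74, 0x65, 0x72,   // "ster"
    0, 0, 0, 0,               // version 0, flags 0
    0, 0, 0, 1,               // group_id
    0, 0, 0, 2,               // num_entities_in_group
    0, 0, 0, 1,               // entity_id of the left view (assumed order)
    0, 0, 0, 2,               // entity_id of the right view
]
if let pair = parseSter(sample) {
    print("group \(pair.groupID): left item \(pair.left), right item \(pair.right)")
}
```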

TimYao18 commented 1 month ago

test_images.zip

I have attached two otherwise identical images; one of them, spatial.HEIC, is a spatial photo with the key metadata added. They should be easy to compare. I would like to fix this issue myself, but I'm not good at C++, so I don't know where to start.

bradh commented 1 month ago

Can you show the associated key metadata (i.e. as apple displays it)?

TimYao18 commented 1 month ago

Of these two images, one was generated with the Image I/O code posted above and the other without it. So we can compare their file structure by running heif-info.exe -d on both, or with an ISOBMFF tool such as pyisobmff.

I attached the two dumps generated by pyisobmff: pyisobmff_decode.zip

Just use a text comparison tool to check the differences; this is all I can do so far. Also, if you know which boxes are specific to spatial images, you can use a hex editor to search for their four-character code and see what value they have.

Below are screenshots of the boxes that the spatial image contains but the non_spatial image does not:

"uuid" (Screenshot 2024-05-17 194953)

"uuid2" (Screenshot 2024-05-17 195011)

"grpl", which contains a 'ster' box that pyisobmff cannot decode (Screenshot 2024-05-17 195029)

As for the difference: as with the images @jwheeler and @JoanCharmant posted, the Preview app in macOS shows an extra "HEIC" tag for the spatial image that the non-spatial image doesn't have.

TimYao18 commented 1 month ago

Hi all, I found that the UUID value is also related to the image resolution: if I change the image resolution, the value changes, too.

farindk commented 2 weeks ago

Reading and writing of the camera intrinsic matrix should now be working in branch develop-v1.18.0. The extrinsic matrix will follow shortly.

farindk commented 2 weeks ago

Is there some test data for the extrinsic camera matrix? Especially with camera orientation once specified as a quaternion and once with rotation angles?

bradh commented 2 weeks ago

Assuming it's the same as cmin and cmex, there are test examples at

https://github.com/MPEGGroup/FileFormatConformance/pull/86

and

https://github.com/MPEGGroup/FileFormatConformance/pull/85

farindk commented 2 weeks ago

Thank you. That helped to confirm that intrinsics and quaternion-based rotation are read correctly. I have also chosen the rotation sequence order to match the output described in the rotation.txt file at that repository.
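For the quaternion case, the standard conversion from a unit quaternion (w, x, y, z) to a 3x3 rotation matrix is sketched below in Swift. It assumes the Hamilton convention with w as the scalar part; whether cmex or Apple's metadata use the same component order and handedness is not established in this thread, and the rotation-angle case additionally depends on the rotation sequence order mentioned above. `rotationMatrix` is a hypothetical helper, not the libheif API.

```swift
/// Converts a quaternion (w, x, y, z) to a row-major 3x3 rotation matrix.
/// Assumes the Hamilton convention with w as the scalar part.
func rotationMatrix(w: Double, x: Double, y: Double, z: Double) -> [[Double]] {
    // Normalize defensively in case the stored quaternion is not exactly unit length.
    let n = (w * w + x * x + y * y + z * z).squareRoot()
    let (qw, qx, qy, qz) = (w / n, x / n, y / n, z / n)
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qw * qz),     2 * (qx * qz + qw * qy)],
        [2 * (qx * qy + qw * qz),     1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qw * qx)],
        [2 * (qx * qz - qw * qy),     2 * (qy * qz + qw * qx),     1 - 2 * (qx * qx + qy * qy)],
    ]
}

// Identity quaternion gives the identity matrix.
print(rotationMatrix(w: 1, x: 0, y: 0, z: 0))
// 90 degrees about the z axis (half-angle 45 degrees).
let h = (0.5).squareRoot()
print(rotationMatrix(w: h, x: 0, y: 0, z: h))   // approximately [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
```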