guardian / grid

The Guardian’s image management system
https://www.theguardian.com/info/developer-blog/2015/aug/12/open-sourcing-grid-image-service
Apache License 2.0

Support XMP and metadata reconciliation #1117

Closed: paperboyo closed this issue 6 years ago

paperboyo commented 9 years ago

[EDIT: Thanks for reopening. This issue is way too long, and the first comments just document me learning about what's wrong through a specific case (which still occurs). It gets more general in https://github.com/guardian/grid/issues/1117#issuecomment-138691921, where I realise we are not really reading XMP metadata (maybe treat that as the first comment). At the end there are some examples of how much more robust we could be. I think that if we ever want to fix this, we should look at why metadata-extractor omits so much XMP and then decide whether to do the reconciliation ourselves or to contribute to metadata-extractor directly. Or use Adobe's XMPToolkit. I may be wrong, as always!]

Hello,

This happened with just one image, but I suspect it has something to do with the new usage rights categories (it never happened to me before). If not, and if it's an outlier, apologies, and please close.

Steps:

  1. Download 4fb55b5c7d85075e91d3e6727024635b374bffbf, whose metadata reads "By: Christopher Thomond, Credit: The Guardian"
  2. Modify it so that the image is not recognised as an existing one (I opened it in PS, colour-corrected it and saved it as JPG with "_modified" added to the filename)
  3. Upload to Grid.

Here is a screenshot of the resulting image's metadata: [screenshot: usage_rights-reupload]

Ideally, all metadata should survive such a ride.

Regards Mateusz

akash1810 commented 9 years ago

Can't reproduce on TEST

[screenshot: 2015-09-07 at 22.02.03]

paperboyo commented 9 years ago

I can reproduce it in both environments (with the same image). I open it in Photoshop (CS6 Mac & 2014.2.2 PC), modify it and resave. It MAY BE that there is something wrong with this specific image's metadata that gets hosed when it is opened and saved in Ps. Or something may be wrong with the Ps metadata engine (unlikely). [see below]

[EDIT: please ignore the dumps below and look at https://github.com/guardian/grid/issues/1117#issuecomment-143597147 further down the thread.] Here is a Ps metadata dump of the original image:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c014 79.156797, 2014/08/20-09:53:02        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
            xmlns:aux="http://ns.adobe.com/exif/1.0/aux/"
            xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/"
            xmlns:photomechanic="http://ns.camerabits.com/photomechanic/1.0/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:tiff="http://ns.adobe.com/tiff/1.0/"
            xmlns:exif="http://ns.adobe.com/exif/1.0/">
         <photoshop:LegacyIPTCDigest>FF66AA1841316F24F02AA0C75219D284</photoshop:LegacyIPTCDigest>
         <photoshop:Country>UK</photoshop:Country>
         <photoshop:Credit>Christopher Thomond</photoshop:Credit>
         <photoshop:DateCreated>2015-08-18T00:00:31+01:00</photoshop:DateCreated>
         <photoshop:ColorMode>3</photoshop:ColorMode>
         <photoshop:ICCProfile>Adobe RGB (1998)</photoshop:ICCProfile>
         <aux:LensInfo>70/1 200/1 0/0 0/0</aux:LensInfo>
         <aux:Lens>EF70-200mm f/2.8L IS II USM</aux:Lens>
         <aux:FlashCompensation>0/1</aux:FlashCompensation>
         <aux:Firmware>Firmware Version 1.2.3</aux:Firmware>
         <aux:ImageNumber>1350</aux:ImageNumber>
         <aux:ApproximateFocusDistance>0/1</aux:ApproximateFocusDistance>
         <xmpRights:Marked>True</xmpRights:Marked>
         <photomechanic:ColorClass>0</photomechanic:ColorClass>
         <photomechanic:Tagged>True</photomechanic:Tagged>
         <photomechanic:Prefs>1:0:0:001350</photomechanic:Prefs>
         <photomechanic:PMVersion>PM5</photomechanic:PMVersion>
         <xmp:Rating>0</xmp:Rating>
         <xmp:CreatorTool>Capture One 8 Macintosh</xmp:CreatorTool>
         <xmp:ModifyDate>2000-01-01T00:00:31</xmp:ModifyDate>
         <xmp:CreateDate>2000-01-01T00:00:31</xmp:CreateDate>
         <xmp:MetadataDate>2000-01-01T00:00:31</xmp:MetadataDate>
         <Iptc4xmpCore:CreatorContactInfo rdf:parseType="Resource">
            <Iptc4xmpCore:CiTelWork>07836745201</Iptc4xmpCore:CiTelWork>
         </Iptc4xmpCore:CreatorContactInfo>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>Christopher Thomond for The Guardian.</rdf:li>
            </rdf:Seq>
         </dc:creator>
         <dc:rights>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">© Christopher Thomond</rdf:li>
            </rdf:Alt>
         </dc:rights>
         <dc:description>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside.&#xA;&#xA;</rdf:li>
            </rdf:Alt>
         </dc:description>
         <dc:format>image/jpeg</dc:format>
         <xmpMM:DocumentID>11330C9CF637AAA4F2A4C6A79BB85654</xmpMM:DocumentID>
         <xmpMM:InstanceID>11330C9CF637AAA4F2A4C6A79BB85654</xmpMM:InstanceID>
         <tiff:ImageWidth>5644</tiff:ImageWidth>
         <tiff:ImageLength>3763</tiff:ImageLength>
         <tiff:BitsPerSample>
            <rdf:Seq>
               <rdf:li>8</rdf:li>
               <rdf:li>8</rdf:li>
               <rdf:li>8</rdf:li>
            </rdf:Seq>
         </tiff:BitsPerSample>
         <tiff:PhotometricInterpretation>2</tiff:PhotometricInterpretation>
         <tiff:SamplesPerPixel>3</tiff:SamplesPerPixel>
         <tiff:XResolution>300/1</tiff:XResolution>
         <tiff:YResolution>300/1</tiff:YResolution>
         <tiff:ResolutionUnit>2</tiff:ResolutionUnit>
         <tiff:Make>Canon</tiff:Make>
         <tiff:Model>Canon EOS 5D Mark III</tiff:Model>
         <exif:ExifVersion>0220</exif:ExifVersion>
         <exif:ColorSpace>65535</exif:ColorSpace>
         <exif:PixelXDimension>5644</exif:PixelXDimension>
         <exif:PixelYDimension>3763</exif:PixelYDimension>
         <exif:DateTimeOriginal>2000-01-01T00:00:31</exif:DateTimeOriginal>
         <exif:ExposureTime>1/160</exif:ExposureTime>
         <exif:FNumber>32/10</exif:FNumber>
         <exif:ExposureProgram>4</exif:ExposureProgram>
         <exif:ISOSpeedRatings>
            <rdf:Seq>
               <rdf:li>1600</rdf:li>
            </rdf:Seq>
         </exif:ISOSpeedRatings>
         <exif:ShutterSpeedValue>483328/65536</exif:ShutterSpeedValue>
         <exif:ApertureValue>3356144/1000000</exif:ApertureValue>
         <exif:ExposureBiasValue>-2/3</exif:ExposureBiasValue>
         <exif:SubjectDistance>0/1</exif:SubjectDistance>
         <exif:MeteringMode>5</exif:MeteringMode>
         <exif:Flash rdf:parseType="Resource">
            <exif:Fired>False</exif:Fired>
            <exif:Return>0</exif:Return>
            <exif:Mode>2</exif:Mode>
            <exif:Function>False</exif:Function>
            <exif:RedEyeMode>False</exif:RedEyeMode>
         </exif:Flash>
         <exif:FocalLength>200/1</exif:FocalLength>
         <exif:FocalPlaneResolutionUnit>2</exif:FocalPlaneResolutionUnit>
         <exif:FileSource>3</exif:FileSource>
         <exif:SceneType>1</exif:SceneType>
         <exif:ExposureMode>0</exif:ExposureMode>
         <exif:WhiteBalance>1</exif:WhiteBalance>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

Here is a dump of the modified image's metadata (after a Ps resave):

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c014 79.156797, 2014/08/20-09:53:02        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
            xmlns:aux="http://ns.adobe.com/exif/1.0/aux/"
            xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/"
            xmlns:photomechanic="http://ns.camerabits.com/photomechanic/1.0/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
            xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/"
            xmlns:tiff="http://ns.adobe.com/tiff/1.0/"
            xmlns:exif="http://ns.adobe.com/exif/1.0/">
         <photoshop:LegacyIPTCDigest>E4BB4DD2636F85E307BE62148C3CD2DF</photoshop:LegacyIPTCDigest>
         <photoshop:Country>UK</photoshop:Country>
         <photoshop:Credit>Christopher Thomond</photoshop:Credit>
         <photoshop:DateCreated>2015-08-18T00:00:31+01:00</photoshop:DateCreated>
         <photoshop:ColorMode>3</photoshop:ColorMode>
         <photoshop:ICCProfile>Adobe RGB (1998)</photoshop:ICCProfile>
         <aux:LensInfo>70/1 200/1 0/0 0/0</aux:LensInfo>
         <aux:Lens>EF70-200mm f/2.8L IS II USM</aux:Lens>
         <aux:FlashCompensation>0/1</aux:FlashCompensation>
         <aux:Firmware>Firmware Version 1.2.3</aux:Firmware>
         <aux:ImageNumber>1350</aux:ImageNumber>
         <aux:ApproximateFocusDistance>0/1</aux:ApproximateFocusDistance>
         <xmpRights:Marked>True</xmpRights:Marked>
         <photomechanic:ColorClass>0</photomechanic:ColorClass>
         <photomechanic:Tagged>True</photomechanic:Tagged>
         <photomechanic:Prefs>1:0:0:001350</photomechanic:Prefs>
         <photomechanic:PMVersion>PM5</photomechanic:PMVersion>
         <xmp:Rating>0</xmp:Rating>
         <xmp:CreatorTool>Adobe Photoshop CC 2014 (Windows)</xmp:CreatorTool>
         <xmp:ModifyDate>2015-09-07T22:13:09+01:00</xmp:ModifyDate>
         <xmp:CreateDate>2000-01-01T00:00:31</xmp:CreateDate>
         <xmp:MetadataDate>2015-09-07T22:13:09+01:00</xmp:MetadataDate>
         <dc:format>image/jpeg</dc:format>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>Christopher Thomond for The Guardian.</rdf:li>
            </rdf:Seq>
         </dc:creator>
         <dc:rights>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">© Christopher Thomond</rdf:li>
            </rdf:Alt>
         </dc:rights>
         <dc:description>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside.&#xA;&#xA;</rdf:li>
            </rdf:Alt>
         </dc:description>
         <xmpMM:DocumentID>adobe:docid:photoshop:3800c038-55a5-11e5-94e9-dadcbc31373f</xmpMM:DocumentID>
         <xmpMM:InstanceID>xmp.iid:d0f5c4d5-8566-0e47-9823-f0ac981e309d</xmpMM:InstanceID>
         <xmpMM:OriginalDocumentID>11330C9CF637AAA4F2A4C6A79BB85654</xmpMM:OriginalDocumentID>
         <xmpMM:History>
            <rdf:Seq>
               <rdf:li rdf:parseType="Resource">
                  <stEvt:action>saved</stEvt:action>
                  <stEvt:instanceID>xmp.iid:d786799d-17eb-474a-8f81-7c08bc1a1e66</stEvt:instanceID>
                  <stEvt:when>2015-09-07T22:13:09+01:00</stEvt:when>
                  <stEvt:softwareAgent>Adobe Photoshop CC 2014 (Windows)</stEvt:softwareAgent>
                  <stEvt:changed>/</stEvt:changed>
               </rdf:li>
               <rdf:li rdf:parseType="Resource">
                  <stEvt:action>saved</stEvt:action>
                  <stEvt:instanceID>xmp.iid:d0f5c4d5-8566-0e47-9823-f0ac981e309d</stEvt:instanceID>
                  <stEvt:when>2015-09-07T22:13:09+01:00</stEvt:when>
                  <stEvt:softwareAgent>Adobe Photoshop CC 2014 (Windows)</stEvt:softwareAgent>
                  <stEvt:changed>/</stEvt:changed>
               </rdf:li>
            </rdf:Seq>
         </xmpMM:History>
         <Iptc4xmpCore:CreatorContactInfo rdf:parseType="Resource">
            <Iptc4xmpCore:CiTelWork>07836745201</Iptc4xmpCore:CiTelWork>
         </Iptc4xmpCore:CreatorContactInfo>
         <tiff:ImageWidth>5644</tiff:ImageWidth>
         <tiff:ImageLength>3763</tiff:ImageLength>
         <tiff:BitsPerSample>
            <rdf:Seq>
               <rdf:li>8</rdf:li>
               <rdf:li>8</rdf:li>
               <rdf:li>8</rdf:li>
            </rdf:Seq>
         </tiff:BitsPerSample>
         <tiff:PhotometricInterpretation>2</tiff:PhotometricInterpretation>
         <tiff:Orientation>1</tiff:Orientation>
         <tiff:SamplesPerPixel>3</tiff:SamplesPerPixel>
         <tiff:XResolution>3000000/10000</tiff:XResolution>
         <tiff:YResolution>3000000/10000</tiff:YResolution>
         <tiff:ResolutionUnit>2</tiff:ResolutionUnit>
         <tiff:Make>Canon</tiff:Make>
         <tiff:Model>Canon EOS 5D Mark III</tiff:Model>
         <exif:ExifVersion>0220</exif:ExifVersion>
         <exif:ColorSpace>65535</exif:ColorSpace>
         <exif:PixelXDimension>5644</exif:PixelXDimension>
         <exif:PixelYDimension>3763</exif:PixelYDimension>
         <exif:DateTimeOriginal>2000-01-01T00:00:31</exif:DateTimeOriginal>
         <exif:ExposureTime>1/160</exif:ExposureTime>
         <exif:FNumber>32/10</exif:FNumber>
         <exif:ExposureProgram>4</exif:ExposureProgram>
         <exif:ISOSpeedRatings>
            <rdf:Seq>
               <rdf:li>1600</rdf:li>
            </rdf:Seq>
         </exif:ISOSpeedRatings>
         <exif:ShutterSpeedValue>483328/65536</exif:ShutterSpeedValue>
         <exif:ApertureValue>3356144/1000000</exif:ApertureValue>
         <exif:ExposureBiasValue>-2/3</exif:ExposureBiasValue>
         <exif:SubjectDistance>0/1</exif:SubjectDistance>
         <exif:MeteringMode>5</exif:MeteringMode>
         <exif:Flash rdf:parseType="Resource">
            <exif:Fired>False</exif:Fired>
            <exif:Return>0</exif:Return>
            <exif:Mode>2</exif:Mode>
            <exif:Function>False</exif:Function>
            <exif:RedEyeMode>False</exif:RedEyeMode>
         </exif:Flash>
         <exif:FocalLength>200/1</exif:FocalLength>
         <exif:FocalPlaneResolutionUnit>2</exif:FocalPlaneResolutionUnit>
         <exif:FileSource>3</exif:FileSource>
         <exif:SceneType>1</exif:SceneType>
         <exif:ExposureMode>0</exif:ExposureMode>
         <exif:WhiteBalance>1</exif:WhiteBalance>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

Not sure what I should do. I will be on the lookout for more examples like this. If it doesn't happen again, I will close the issue.

Regards Mateusz

paperboyo commented 9 years ago

MS Paint-resaved image does not exhibit the above behaviour. Most unfortunately, we do not use MS Paint for image modification :stuck_out_tongue_winking_eye:

paperboyo commented 9 years ago

Another example (also contract) cf219dc2d9ff5fbd8ff79e1ca0f1690501ac6c72.

Re-uploaded to TEST: 27038707730288250c0d3ccdaa766edfa1fe0e57

Original byline: Christopher Thomond
Re-uploaded byline: Christopher Thomond/The Guardian

Credit reads Christopher Thomond in both instances.

akash1810 commented 9 years ago

MS Paint-resaved image does not exhibit the above behaviour.

I used Preview, which is probably the same...

jamesgorrie commented 9 years ago

The problem here is that we don't rewrite metadata to the image; it is stored independently in a datastore.

We could potentially look at rewriting the metadata to the image on download, as I would be a bit worried about manipulating the original image, but we would have to understand that re-uploading it would create another ID.
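As a rough illustration of what writing stored metadata back on download might involve, here is a minimal Python sketch (not Grid's code) that serialises a few stored fields into an XMP `x:xmpmeta` element. It reuses the properties and containers visible in the Photoshop dumps in this thread: `dc:creator` as an `rdf:Seq`, `dc:rights` as an `rdf:Alt` with an `x-default` entry, and `photoshop:Credit` as a simple value. The helper name `build_xmp` and its parameters are hypothetical; a real implementation would also need the `<?xpacket?>` wrapper and code to embed the packet in the JPEG, both of which are left out here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the XMP dumps in this thread.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "photoshop": "http://ns.adobe.com/photoshop/1.0/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Make ElementTree emit the conventional prefixes instead of ns0, ns1, ...
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Return a Clark-notation name such as '{http://purl.org/dc/elements/1.1/}creator'."""
    return "{%s}%s" % (NS[prefix], local)


def build_xmp(byline, credit, copyright_notice):
    """Hypothetical helper: serialise stored metadata into a minimal x:xmpmeta element."""
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:creator is an ordered list (rdf:Seq), as in the dumps above.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    ET.SubElement(seq, q("rdf", "li")).text = byline

    # dc:rights is a language alternative (rdf:Alt) with an x-default entry.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "rights")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = copyright_notice

    # photoshop:Credit is a simple text property.
    ET.SubElement(desc, q("photoshop", "Credit")).text = credit
    return ET.tostring(meta, encoding="unicode")
```

Because the full byline goes into `dc:creator` here, a downstream reader that prefers XMP would not be affected by any 32-octet limit on the legacy IPTC-IIM copy.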

paperboyo commented 9 years ago

we don't rewrite metadata to the image

Yeah, I know; that would be useful, but it's a separate issue and I wouldn't have opened this one just for that: it would be a duplicate of https://github.com/guardian/grid/issues/823.

I think it may be different from the above because:

We could potentially look at rewriting the metadata to the image on download

That would be helpful, and a good time to write back to the image, IMHO (bar someone selecting 3000 images and downloading them in one go :wink:)

but we would have to understand that re-uploading it would create another ID

Would that really be necessary, if the written metadata contained a pointer to the original and the re-uploaded image were identical to the original in everything but the metadata written to it on download (which we could tell, because we wrote it)?
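Photoshop already writes such a pointer: in the dumps above, the resaved file's `xmpMM:OriginalDocumentID` (11330C9CF637AAA4F2A4C6A79BB85654) equals the original file's `xmpMM:DocumentID`. The Python sketch below reads both IDs from an XMP packet and looks the original up in an index. The index (`grid_id_by_document_id`, mapping DocumentID to Grid image ID) and both function names are hypothetical, and the approach only works when the editing tool preserves XMP and records `OriginalDocumentID`, as Photoshop did here; tools that drop metadata on save would defeat it, so it should be treated as a hint rather than proof.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMPMM = "http://ns.adobe.com/xap/1.0/mm/"


def xmpmm_ids(xmp_packet):
    """Return (DocumentID, OriginalDocumentID) from an XMP packet; missing values are None.

    Simple XMP properties may be written as child elements (as in the dumps above)
    or as attributes of rdf:Description, so both forms are checked.
    """
    root = ET.fromstring(xmp_packet)
    found = {"DocumentID": None, "OriginalDocumentID": None}
    for desc in root.iter("{%s}Description" % RDF):
        for name in found:
            if found[name] is not None:
                continue  # keep the first value seen
            tag = "{%s}%s" % (XMPMM, name)
            value = desc.get(tag)  # attribute form
            if value is None:  # element form
                child = desc.find(tag)
                value = child.text.strip() if child is not None and child.text else None
            found[name] = value
    return found["DocumentID"], found["OriginalDocumentID"]


def find_original(xmp_packet, grid_id_by_document_id):
    """Hypothetical lookup: return the Grid ID of the image this file was derived from, if known."""
    _, original_id = xmpmm_ids(xmp_packet)
    if original_id is None:
        return None
    return grid_id_by_document_id.get(original_id)
```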

In general - no biggie, but interesting. If we are super-sure that the metadata ingestion is correct (the above examples may suggest that it could be improved), then please close.

theefer commented 9 years ago

Here is the diff of your two outputs:

--- a   2015-09-08 12:58:57.878028037 +0100
+++ b   2015-09-08 12:59:04.557624738 +0100
@@ -6,12 +6,13 @@
             xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/"
             xmlns:photomechanic="http://ns.camerabits.com/photomechanic/1.0/"
             xmlns:xmp="http://ns.adobe.com/xap/1.0/"
-            xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
+            xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
+            xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/"
             xmlns:tiff="http://ns.adobe.com/tiff/1.0/"
             xmlns:exif="http://ns.adobe.com/exif/1.0/">
-         <photoshop:LegacyIPTCDigest>FF66AA1841316F24F02AA0C75219D284</photoshop:LegacyIPTCDigest>
+         <photoshop:LegacyIPTCDigest>E4BB4DD2636F85E307BE62148C3CD2DF</photoshop:LegacyIPTCDigest>
          <photoshop:Country>UK</photoshop:Country>
          <photoshop:Credit>Christopher Thomond</photoshop:Credit>
          <photoshop:DateCreated>2015-08-18T00:00:31+01:00</photoshop:DateCreated>
@@ -29,13 +30,11 @@
          <photomechanic:Prefs>1:0:0:001350</photomechanic:Prefs>
          <photomechanic:PMVersion>PM5</photomechanic:PMVersion>
          <xmp:Rating>0</xmp:Rating>
-         <xmp:CreatorTool>Capture One 8 Macintosh</xmp:CreatorTool>
-         <xmp:ModifyDate>2000-01-01T00:00:31</xmp:ModifyDate>
+         <xmp:CreatorTool>Adobe Photoshop CC 2014 (Windows)</xmp:CreatorTool>
+         <xmp:ModifyDate>2015-09-07T22:13:09+01:00</xmp:ModifyDate>
          <xmp:CreateDate>2000-01-01T00:00:31</xmp:CreateDate>
-         <xmp:MetadataDate>2000-01-01T00:00:31</xmp:MetadataDate>
-         <Iptc4xmpCore:CreatorContactInfo rdf:parseType="Resource">
-            <Iptc4xmpCore:CiTelWork>07836745201</Iptc4xmpCore:CiTelWork>
-         </Iptc4xmpCore:CreatorContactInfo>
+         <xmp:MetadataDate>2015-09-07T22:13:09+01:00</xmp:MetadataDate>
+         <dc:format>image/jpeg</dc:format>
          <dc:creator>
             <rdf:Seq>
                <rdf:li>Christopher Thomond for The Guardian.</rdf:li>
@@ -51,9 +50,30 @@
                <rdf:li xml:lang="x-default">Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside.&#xA;&#xA;</rdf:li>
             </rdf:Alt>
          </dc:description>
-         <dc:format>image/jpeg</dc:format>
-         <xmpMM:DocumentID>11330C9CF637AAA4F2A4C6A79BB85654</xmpMM:DocumentID>
-         <xmpMM:InstanceID>11330C9CF637AAA4F2A4C6A79BB85654</xmpMM:InstanceID>
+         <xmpMM:DocumentID>adobe:docid:photoshop:3800c038-55a5-11e5-94e9-dadcbc31373f</xmpMM:DocumentID>
+         <xmpMM:InstanceID>xmp.iid:d0f5c4d5-8566-0e47-9823-f0ac981e309d</xmpMM:InstanceID>
+         <xmpMM:OriginalDocumentID>11330C9CF637AAA4F2A4C6A79BB85654</xmpMM:OriginalDocumentID>
+         <xmpMM:History>
+            <rdf:Seq>
+               <rdf:li rdf:parseType="Resource">
+                  <stEvt:action>saved</stEvt:action>
+                  <stEvt:instanceID>xmp.iid:d786799d-17eb-474a-8f81-7c08bc1a1e66</stEvt:instanceID>
+                  <stEvt:when>2015-09-07T22:13:09+01:00</stEvt:when>
+                  <stEvt:softwareAgent>Adobe Photoshop CC 2014 (Windows)</stEvt:softwareAgent>
+                  <stEvt:changed>/</stEvt:changed>
+               </rdf:li>
+               <rdf:li rdf:parseType="Resource">
+                  <stEvt:action>saved</stEvt:action>
+                  <stEvt:instanceID>xmp.iid:d0f5c4d5-8566-0e47-9823-f0ac981e309d</stEvt:instanceID>
+                  <stEvt:when>2015-09-07T22:13:09+01:00</stEvt:when>
+                  <stEvt:softwareAgent>Adobe Photoshop CC 2014 (Windows)</stEvt:softwareAgent>
+                  <stEvt:changed>/</stEvt:changed>
+               </rdf:li>
+            </rdf:Seq>
+         </xmpMM:History>
+         <Iptc4xmpCore:CreatorContactInfo rdf:parseType="Resource">
+            <Iptc4xmpCore:CiTelWork>07836745201</Iptc4xmpCore:CiTelWork>
+         </Iptc4xmpCore:CreatorContactInfo>
          <tiff:ImageWidth>5644</tiff:ImageWidth>
          <tiff:ImageLength>3763</tiff:ImageLength>
          <tiff:BitsPerSample>
@@ -64,9 +84,10 @@
             </rdf:Seq>
          </tiff:BitsPerSample>
          <tiff:PhotometricInterpretation>2</tiff:PhotometricInterpretation>
+         <tiff:Orientation>1</tiff:Orientation>
          <tiff:SamplesPerPixel>3</tiff:SamplesPerPixel>
-         <tiff:XResolution>300/1</tiff:XResolution>
-         <tiff:YResolution>300/1</tiff:YResolution>
+         <tiff:XResolution>3000000/10000</tiff:XResolution>
+         <tiff:YResolution>3000000/10000</tiff:YResolution>
          <tiff:ResolutionUnit>2</tiff:ResolutionUnit>
          <tiff:Make>Canon</tiff:Make>
          <tiff:Model>Canon EOS 5D Mark III</tiff:Model>
theefer commented 9 years ago

There are some changes, but none that would affect the byline or other detection AFAICS?

I wonder if all you're seeing is us having changed the usage rights and metadata processors?

paperboyo commented 9 years ago

I wonder if all you're seeing is us having changed the usage rights and metadata processors

That may well be. But why does the same image produce different credit/byline info (and make the image PAID)? Of course the image is not EXACTLY the same (the above diff is of the metadata as interpreted by Photoshop in both instances, but it does not contain any changes to anything related to credit or byline...). Maybe we are using different processors for different sources (agencies, and manual upload in this instance).

Anyway, from my perspective, the outcome is not correct, but I don't understand what caused it. So, if whatever caused it is working correctly, is expected, can't be changed easily, or would break something more important if changed, I will just live with it, and try to always pay attention to something I was happily oblivious to: the metadata of a re-uploaded image, correcting it when necessary.

jamesgorrie commented 9 years ago

Interestingly: Original: https://api.media.gutools.co.uk/images/4fb55b5c7d85075e91d3e6727024635b374bffbf vs Modified: https://api.media.gutools.co.uk/images/e61ff00dbca5e6cd4d855c7dbecb3b2fabc4123d

Note the originalMetadata is definitely "for the Guardi"

@paperboyo - are you sure you didn't update that metadata in PS?

paperboyo commented 9 years ago

I'm never sure of anything. But, yeah, pretty sure! Above there are more examples, from different Photoshop versions on different OSes... I can do more tests; I would just need to know what to test. One option would be to dump the original and modified images' metadata, but NOT from within Photoshop (where the above dumps come from)...

paperboyo commented 9 years ago

Oh, damn! This actually looks like Adobe is strictly enforcing the 32-character (octet) limit on legacy IPTC-IIM fields such as By-line, a limit that Capture One 8 Macintosh (which wrote the original file) does not care about.

The same info is held in the XMP Creator (dc:creator) field in both the original file (Capture One save) and the Photoshop resave.

The only correct way is to prefer XMP whenever it’s available (some tools even contain preferences to deal with it, e.g. Photomechanic’s "Read embedded XMP before IPTC" option).
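"Prefer XMP, fall back to IPTC" can be sketched as follows. This is a minimal Python illustration, not Grid's processor: it parses the XMP packet directly with the standard library rather than going through metadata-extractor, and it assumes the legacy IPTC values have already been read into a plain dict keyed by exiftool-style labels such as `"By-line"`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def xmp_creators(xmp_packet):
    """Return the dc:creator entries (an rdf:Seq) from an XMP packet, or [] if absent."""
    if not xmp_packet:
        return []
    root = ET.fromstring(xmp_packet)
    items = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    return [li.text.strip() for li in items if li.text and li.text.strip()]


def reconcile_byline(xmp_packet, iptc):
    """Prefer the XMP creator; fall back to the legacy IPTC-IIM By-line (may be truncated)."""
    creators = xmp_creators(xmp_packet)
    if creators:
        return ", ".join(creators)
    return iptc.get("By-line")
```

With the original file's XMP, this returns "Christopher Thomond for The Guardian." even if the IPTC By-line has been cut to 32 octets; files that carry only IPTC still get a byline.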

Sorry, ladies and gentlemen. We will have to dig into XMP anyway, e.g. for https://github.com/guardian/grid/issues/823.

Some info:

https://forums.adobe.com/message/4219068#4219068

http://www.photometadata.org/META-Resources-metadata-types-standards-IPTC-IIM

http://www.controlledvocabulary.com/imagedatabases/iptc_naa.html

[mappings] http://www.controlledvocabulary.com/imagedatabases/iptc_core_mapped.pdf

I do have exiftool dumps of both files' metadata, but do not know how to make such a nice diff as @theefer did above :sadpanda:

jamesgorrie commented 9 years ago

I think we do store XMP, so we could run some tests to see if that gives us what we desire, although bearing in mind we are dealing predominantly with agencies when it comes to reading metadata, and IPTC is all the rage there.

theefer commented 9 years ago

but do not know how to make such a nice diff

# Open Terminal
$ diff -u file1 file2
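Where `diff` isn't to hand (e.g. on Windows), Python's standard-library `difflib.unified_diff` produces the same unified format. A minimal sketch, where the function name is made up for this example and the inputs are, say, two saved exiftool dumps read as text:

```python
import difflib


def unified_metadata_diff(text_a, text_b, name_a="a", name_b="b"):
    """Return a unified diff (the format of `diff -u`) of two text dumps; '' if identical."""
    a = text_a.splitlines()
    b = text_b.splitlines()
    # lineterm="" because splitlines() already dropped the newlines; rejoin with "\n".
    return "\n".join(difflib.unified_diff(a, b, fromfile=name_a, tofile=name_b, lineterm=""))
```

To diff two files, read them with `pathlib.Path(path).read_text(encoding="utf-8")` and pass the file names as `name_a`/`name_b`.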
theefer commented 9 years ago

It'd be worth checking whether, when XMP is present, it holds the same value as the IPTC equivalent, only longer.
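One way to run that check over a batch of images is to classify each legacy IIM value against its XMP counterpart. The Python sketch below assumes both sides have already been extracted into plain dicts of strings (IIM keys as exiftool labels them, XMP keys as prefixed property names, multi-valued properties flattened to one string); the pairs follow the IPTC Core mapping linked above, and the function names are hypothetical.

```python
# Legacy IIM dataset (as labelled by exiftool) -> its XMP counterpart in IPTC Core.
IIM_TO_XMP = {
    "By-line": "dc:creator",
    "Credit": "photoshop:Credit",
    "Caption-Abstract": "dc:description",
    "Copyright Notice": "dc:rights",
}


def iptc_vs_xmp(iptc_value, xmp_value):
    """Classify how a legacy IPTC-IIM value relates to its XMP counterpart."""
    if not iptc_value or not xmp_value:
        return "missing"
    if iptc_value == xmp_value:
        return "identical"
    if xmp_value.startswith(iptc_value):
        return "truncated"  # IIM copy is a prefix of the XMP one: XMP holds the fuller value
    return "different"


def compare_all(iptc, xmp):
    """Apply iptc_vs_xmp to every mapped pair; both arguments are plain dicts of strings."""
    return {iim: iptc_vs_xmp(iptc.get(iim), xmp.get(prop)) for iim, prop in IIM_TO_XMP.items()}
```

If "truncated" dominates and "different" is rare, reading XMP first should be safe; a high "different" count would suggest agencies sometimes put genuinely different values in the two places.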

jamesgorrie commented 9 years ago

Also not going to mess with any of our processors, even if it is "more correct".

paperboyo commented 9 years ago

bearing in mind we are dealing predominantly with agencies when it comes to reading metadata, and IPTC is all the rage there

As far as I know, XMP is a (standardised) way to store (a newer) version of IPTC.

jamesgorrie commented 9 years ago

@paperboyo We'll run some tests to see what the output is if we read XMP and fall back to IPTC.

paperboyo commented 9 years ago

even if it is "more correct"

If any non-ASCII characters are present, the 32-character limit shrinks even further, since the limit is in octets and such characters take two or more octets in UTF-8...
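To make the octet arithmetic concrete, here is a small Python sketch of what a strict writer could do to a value before storing it in a 32-octet IIM field such as By-line, assuming UTF-8 (the original file declares "Coded Character Set: UTF8"). It cuts at the byte level without splitting a multi-byte character; it is an illustration of the limit, not a claim about Photoshop's exact algorithm.

```python
def truncate_octets(value, max_octets=32):
    """Truncate a string to at most max_octets UTF-8 bytes without splitting a character."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_octets:
        return value
    # errors="ignore" drops a multi-byte sequence that the byte cut left incomplete.
    return encoded[:max_octets].decode("utf-8", errors="ignore")
```

"Christopher Thomond for The Guardian." is 37 octets and would come out as "Christopher Thomond for The Guar"; "© Christopher Thomond" is 21 characters but 22 octets, because "©" takes two bytes in UTF-8.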

We'll run some tests

Supacool!

theefer commented 9 years ago

It's not clear that that issue is the problem; it seems more likely that Photoshop cropped the metadata.

paperboyo commented 9 years ago

I disagree. Photoshop just followed standards for the legacy field/schema. It didn't crop the proper field that houses a full copy of the same information.

We just read the wrong field. We should read the correct one.

And we will stumble upon similar dilemmas with our own metadata writing when/if we touch https://github.com/guardian/grid/issues/823. Unlike Photoshop, we do not have to worry about some life-saving hospital imaging system suddenly failing because it got a file that does not follow the standards, so we could be as relaxed about it as Photomechanic (I wouldn't advise it), but it's a decision nonetheless.

theefer commented 9 years ago

I can't see the IPTC By-line field in your metadata dump. Downloading the image, I can see the By-line field in the metadata, but editing the image (with GIMP, or resizing it with GraphicsMagick) causes the IPTC metadata to be dropped. I wonder if it was invalid to begin with, which would explain why Photoshop ignores/drops it?

I don't believe we're reading the wrong field; what field would you read instead? The XMP DublinCore Creator field?

paperboyo commented 9 years ago

Hi, my fault: my first dumps came from Photoshop, which (even on the original file) had already merged these two fields into one. At the risk of repeating myself, here is a full exiftool dump of the original:

---- ExifTool ----
ExifTool Version Number         : 10.01
---- File ----
File Name                       : 4fb55b5c7d85075e91d3e6727024635b374bffbf.jpg
Directory                       : .
File Size                       : 2.9 MB
File Modification Date/Time     : 2015:09:07 22:12:37+01:00
File Access Date/Time           : 2015:09:07 22:12:37+01:00
File Creation Date/Time         : 2015:09:07 22:12:37+01:00
File Permissions                : rw-rw-rw-
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
Exif Byte Order                 : Little-endian (Intel, II)
Current IPTC Digest             : ff66aa1841316f24f02aa0c75219d284
Image Width                     : 5644
Image Height                    : 3763
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:4:4 (1 1)
---- JFIF ----
JFIF Version                    : 1.01
Resolution Unit                 : inches
X Resolution                    : 300
Y Resolution                    : 300
---- EXIF ----
Make                            : Canon
Camera Model Name               : Canon EOS 5D Mark III
X Resolution                    : 300
Y Resolution                    : 300
Resolution Unit                 : inches
Software                        : Capture One 8 Macintosh
Modify Date                     : 2000:01:01 00:00:31
Copyright                       : © Christopher Thomond
Modify Date                     : 2000:01:01 00:00:31
Exposure Time                   : 1/160
F Number                        : 3.2
Exposure Program                : Shutter speed priority AE
ISO                             : 1600
Exif Version                    : 0220
Date/Time Original              : 2000:01:01 00:00:31
Create Date                     : 2000:01:01 00:00:31
Shutter Speed Value             : 1/166
Aperture Value                  : 3.2
Exposure Compensation           : -2/3
Subject Distance                : 0 m
Metering Mode                   : Multi-segment
Flash                           : Off, Did not fire
Focal Length                    : 200.0 mm
User Comment                    : 
Exif Image Width                : 5644
Exif Image Height               : 3763
Focal Plane Resolution Unit     : inches
File Source                     : Digital Camera
Scene Type                      : Directly photographed
Exposure Mode                   : Auto
White Balance                   : Manual
Compression                     : JPEG (old-style)
Thumbnail Offset                : 1198
Thumbnail Length                : 1917
---- ICC_Profile ----
Profile CMM Type                : ADBE
Profile Version                 : 2.1.0
Profile Class                   : Display Device Profile
Color Space Data                : RGB
Profile Connection Space        : XYZ
Profile Date Time               : 2000:08:11 19:51:59
Profile File Signature          : acsp
Primary Platform                : Apple Computer Inc.
CMM Flags                       : Not Embedded, Independent
Device Manufacturer             : none
Device Model                    : 
Device Attributes               : Reflective, Glossy, Positive, Color
Rendering Intent                : Perceptual
Connection Space Illuminant     : 0.9642 1 0.82491
Profile Creator                 : ADBE
Profile ID                      : 0
Profile Copyright               : Copyright 2000 Adobe Systems Incorporated
Profile Description             : Adobe RGB (1998)
Media White Point               : 0.95045 1 1.08905
Media Black Point               : 0 0 0
Red Tone Reproduction Curve     : (Binary data 14 bytes, use -b option to extract)
Green Tone Reproduction Curve   : (Binary data 14 bytes, use -b option to extract)
Blue Tone Reproduction Curve    : (Binary data 14 bytes, use -b option to extract)
Red Matrix Column               : 0.60974 0.31111 0.01947
Green Matrix Column             : 0.20528 0.62567 0.06087
Blue Matrix Column              : 0.14919 0.06322 0.74457
---- IPTC ----
Coded Character Set             : UTF8
Envelope Record Version         : 4
Application Record Version      : 3
Date Created                    : 2015:08:18
Time Created                    : 00:00:31+01:00
Country-Primary Location Name   : UK
By-line                         : Christopher Thomond for The Guardian.
Credit                          : Christopher Thomond
Caption-Abstract                : Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside...
Copyright Notice                : © Christopher Thomond
Prefs                           : Tagged:1, ColorClass:0, Rating:0, FrameNum:001350
---- XMP ----
XMP Toolkit                     : XMP Core 5.1.2
Legacy IPTC Digest              : C0A5BD50BB1D606F08C3CDFF26D88526
Country                         : UK
Credit                          : Christopher Thomond
Date Created                    : 2015:08:18 00:00:31+01:00
Exif Version                    : 0221
Lens Info                       : 70-200mm f/?
Lens                            : EF70-200mm f/2.8L IS II USM
Flash Compensation              : 0
Firmware                        : Firmware Version 1.2.3
Image Number                    : 1350
Marked                          : True
Color Class                     : 0 (None)
Tagged                          : Yes
Prefs                           : Tagged:1, ColorClass:0, Rating:0, FrameNum:001350
PM Version                      : PM5
Rating                          : 0
Creator Work Telephone          : 07836745201
Description                     : Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside...
Creator                         : Christopher Thomond for The Guardian.
Rights                          : © Christopher Thomond
---- Composite ----
Aperture                        : 3.2
Date/Time Created               : 2015:08:18 00:00:31+01:00
Image Size                      : 5644x3763
Megapixels                      : 21.2
Shutter Speed                   : 1/160
Thumbnail Image                 : (Binary data 1917 bytes, use -b option to extract)
Focal Length                    : 200.0 mm
Light Value                     : 6.7

GM 1.3.21 (GraphicsMagick; just the relevant bits. It doesn’t list them twice and only hints at the second, XMP, copy by reporting Profile-XMP: 5746 bytes):

Byline:
  Christopher Thomond for The Guardian.

IM 6.9.2-2 (ImageMagick; again just the relevant bits, but it also lists each tag's IPTC record and dataset number in brackets, 2:80 being By-line):

Byline[2,80]: Christopher Thomond for The Guardian.

Most probably, there is a map from the legacy IPTC (IIM) fields to their XMP counterparts, maybe similar to what I have found above [possibly outdated?]: http://www.controlledvocabulary.com/imagedatabases/iptc_core_mapped.pdf

We should always try the new (XMP) schema first and fall back to the old one only when it is absent.

I think.
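The "XMP first, legacy IIM only as a fallback" order could look roughly like this minimal Python sketch. It assumes the XMP and IIM blocks have already been parsed into plain dicts (XMP keyed by property name, IIM keyed by (record, dataset)); `IIM_TO_XMP` and `read_with_fallback` are hypothetical names, and the map covers only a handful of the IPTC Core pairs from the documents linked above.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical subset of the IPTC Core mapping: IIM (record, dataset) -> XMP property.
IIM_TO_XMP: Dict[Tuple[int, int], str] = {
    (2, 80): "dc:creator",          # By-line
    (2, 101): "photoshop:Country",  # Country-Primary Location Name
    (2, 110): "photoshop:Credit",   # Credit
    (2, 116): "dc:rights",          # Copyright Notice
    (2, 120): "dc:description",     # Caption-Abstract
}


def read_with_fallback(
    xmp: Mapping[str, str],
    iim: Mapping[Tuple[int, int], str],
    dataset: Tuple[int, int],
) -> Optional[str]:
    """Prefer the XMP value; use the legacy IIM value only if XMP has none."""
    prop = IIM_TO_XMP.get(dataset)
    if prop is not None:
        value = xmp.get(prop, "").strip()
        if value:
            return value
    legacy = iim.get(dataset, "").strip()
    return legacy or None
```

With the Photoshop resave from this issue, XMP dc:creator still holds the full "Christopher Thomond for The Guardian." while IIM 2:80 holds a cut-down copy, so the lookup above would pick the complete value.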

paperboyo commented 9 years ago

Hello,

Related: https://github.com/drewnoakes/metadata-extractor/issues/10; https://github.com/drewnoakes/metadata-extractor/issues/106#issuecomment-97402993

Many (perhaps all) image management apps seem to contain such maps: many-to-one on read and one-to-many on write. See e.g. https://github.com/bhenne/FlickrCrawler/blob/master/Flickr05AnalyzeMetadata_new.py or http://daminion.net/files/daminion-metadata-mapping-rules.pdf or (cited earlier) http://www.controlledvocabulary.com/imagedatabases/iptc_core_mapped.pdf

IPTC itself maintains such a map: https://www.iptc.org/std/IIM/4.1/specification/IPTC-IIM-Schema4XMP-1.0-spec_1.pdf#page=20

A simple test (creating a new image in Photoshop, filling in the Basic:Author field, which automatically fills IPTC:Creator, and saving it as JPG) reveals that the input to that one field gets saved into multiple fields in different parts of the file’s metadata. The one we read is the least safe, as it gets truncated. exiftool output:

---- EXIF ----
Artist                          : Thirty two characters only Thęrty twó chąractęrś óńły
---- IPTC ----
By-line                         : Thirty two characters only Thęr
---- XMP ----
Creator                         : Thirty two characters only Thęrty twó chąractęrś óńły
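Note that the IPTC By-line is cut at 32 bytes, not 32 characters: "Thęr" ends there because ę takes two bytes in UTF-8 (the same happens to "Christopher Thomond for The Guar" in the resave diff further down). A minimal Python sketch of that behaviour, assuming the writer truncates the UTF-8 encoding to the 32-byte IIM By-line limit; what a real writer does with a character split at the boundary is an assumption here (this sketch drops it). `truncate_utf8` and `looks_truncated` are hypothetical helpers, the second one flagging a legacy value that is just a truncated copy of the XMP one.

```python
IIM_BYLINE_MAX_BYTES = 32  # IIM By-line (2:80) is limited to 32 octets


def truncate_utf8(text: str, max_bytes: int = IIM_BYLINE_MAX_BYTES) -> str:
    """Cut `text` to at most `max_bytes` UTF-8 bytes without emitting half a character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops a trailing multi-byte character that was cut in half
    return encoded.decode("utf-8", errors="ignore")


def looks_truncated(legacy: str, full: str, max_bytes: int = IIM_BYLINE_MAX_BYTES) -> bool:
    """True if `legacy` is exactly what a byte-limited writer would make of `full`."""
    return legacy != full and truncate_utf8(full, max_bytes) == legacy
```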

Saving the same image as PNG or GIF reveals that the data is saved only into the XMP dc:creator field (so we wouldn’t see it at all).

Unfortunately, this phenomenon is definitely not confined to the Creator/By-line/Artist field in question here, as the mapping attempts linked above show. The same information can reside in different parts of the metadata, or in just some of them, depending on the file format and/or the writing application. It may be truncated if the writing application observed the (legacy) specs, and not if it didn’t. It can also be stored in different formats in different parts of the metadata (e.g. dates). Lastly, some of these places can contain conflicting values: in some cases these would be easy to resolve (pick the oldest for creation date), in others much harder or impossible without appending all instances.
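To make the date case concrete, here is a minimal Python sketch that normalises the three date layouts into comparable timestamps and then applies the "pick the oldest" rule. It assumes raw values (IIM Date Created is CCYYMMDD and Time Created is HHMMSS±HHMM; exiftool displays them reformatted), and it assumes UTC for EXIF timestamps, which carry no offset. All function names are hypothetical.

```python
from datetime import datetime, timezone, tzinfo
from typing import Iterable, Optional


def parse_exif(value: str, default_tz: tzinfo = timezone.utc) -> datetime:
    # EXIF: "YYYY:MM:DD HH:MM:SS" with no UTC offset, so one has to be assumed
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S").replace(tzinfo=default_tz)


def parse_iptc(date: str, time: str) -> datetime:
    # IPTC-IIM: Date Created (2:55) is CCYYMMDD, Time Created (2:60) is HHMMSS±HHMM
    return datetime.strptime(date + time, "%Y%m%d%H%M%S%z")


def parse_xmp(value: str, default_tz: tzinfo = timezone.utc) -> datetime:
    # XMP: ISO 8601, e.g. "2015-08-18T00:00:31+01:00". Older Pythons reject a
    # trailing "Z", and partial dates such as "2015-08" are not handled here.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=default_tz)


def oldest(candidates: Iterable[Optional[datetime]]) -> Optional[datetime]:
    """Conflict rule for creation dates: keep the earliest value found."""
    present = [d for d in candidates if d is not None]
    return min(present) if present else None
```

The same instant written as IIM "20150818" + "000031+0100" and as XMP "2015-08-18T00:00:31+01:00" compares equal once parsed. Note that the odd "2000:01:01 00:00:31" date in the dumps in this thread would win under a naive oldest-first rule, so a real rule set probably needs a sanity check too.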

The simplest (?) fix would be to read metadata from different fields/parts in a specified order (starting with dc:creator in this instance) and store the first value found, for only the fields we are interested in. A proper solution would be a canonical map of all known metadata, with rules for field priority, format conversions for data in certain fields, and per-data-type strategies for resolving conflicts. I’m sure such a thing already exists somewhere (if only in the attempts linked above), maybe in the open, maybe only in Adobe towers...
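A rough Python sketch of what such a rule-based map might look like: each canonical field lists its sources in priority order, an optional per-source converter, and a conflict strategy. The `Rule`/`Strategy` names, the exiftool-style "Group:Tag" keys and the tiny `RULES` table are hypothetical illustrations, not an existing Grid or library API; assuming UTC for EXIF dates is a simplification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Callable, Mapping, Sequence


class Strategy(Enum):
    FIRST = "first"    # value from the highest-priority source that has one
    OLDEST = "oldest"  # smallest value, e.g. the earliest creation date
    ALL = "all"        # every distinct value, in priority order


@dataclass(frozen=True)
class Rule:
    sources: Sequence[str]  # "Group:Tag" names, highest priority first
    strategy: Strategy = Strategy.FIRST
    # Per-source format conversion; sources not listed are just stripped.
    converters: Mapping[str, Callable[[str], Any]] = field(default_factory=dict)


def reconcile(metadata: Mapping[str, str], rule: Rule) -> Any:
    values = [
        rule.converters.get(src, str.strip)(metadata[src])
        for src in rule.sources
        if metadata.get(src)
    ]
    if not values:
        return None
    if rule.strategy is Strategy.FIRST:
        return values[0]
    if rule.strategy is Strategy.OLDEST:
        return min(values)
    return list(dict.fromkeys(values))  # Strategy.ALL: deduplicated, order kept


# Hypothetical rule set; a real one would cover every field we care about.
RULES = {
    "byline": Rule(("XMP-dc:Creator", "IPTC:By-line", "EXIF:Artist")),
    "credit": Rule(("XMP-photoshop:Credit", "IPTC:Credit"), Strategy.ALL),
    "dateCreated": Rule(
        ("XMP-photoshop:DateCreated", "EXIF:DateTimeOriginal"),
        Strategy.OLDEST,
        {
            "XMP-photoshop:DateCreated": datetime.fromisoformat,
            # EXIF has no UTC offset; assuming UTC here is a simplification
            "EXIF:DateTimeOriginal": lambda s: datetime.strptime(
                s, "%Y:%m:%d %H:%M:%S"
            ).replace(tzinfo=timezone.utc),
        },
    ),
}
```

Keeping the strategy on the rule rather than in the reading code means adding a field (or changing how its conflicts are resolved) is a data change, which seems closer to the "canonical map" idea than hard-coded fallbacks.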

And reading is just half of the problem, then (when/if https://github.com/guardian/grid/issues/823), there is a problem of saving that data back.

Oh, and we could start by reopening the issue :wink:

Regards Mateusz

paperboyo commented 9 years ago

Just to fix my mistake of providing useless dumps from within Photoshop, here are both files from exiftool (_ps.txt is from a Photoshop resave). It took me rather a while to figure out how to produce the diff below on Windows...

--- 4fb55b5c7d85075e91d3e6727024635b374bffbf.txt        2015-09-08 20:53:49.074034800 +0100
+++ 4fb55b5c7d85075e91d3e6727024635b374bffbf_ps.txt     2015-09-08 20:54:20.278267500 +0100
@@ -1,37 +1,40 @@
 ---- ExifTool ----
 ExifTool Version Number         : 10.01
 ---- File ----
-File Name                       : 4fb55b5c7d85075e91d3e6727024635b374bffbf.jpg
+File Name                       : 4fb55b5c7d85075e91d3e6727024635b374bffbf_ps.jpg
 Directory                       : .
-File Size                       : 2.9 MB
-File Modification Date/Time     : 2015:09:07 22:12:37+01:00
-File Access Date/Time           : 2015:09:07 22:12:37+01:00
-File Creation Date/Time         : 2015:09:07 22:12:37+01:00
+File Size                       : 3.0 MB
+File Modification Date/Time     : 2015:09:07 22:13:11+01:00
+File Access Date/Time           : 2015:09:07 22:13:11+01:00
+File Creation Date/Time         : 2015:09:07 22:13:11+01:00
 File Permissions                : rw-rw-rw-
 File Type                       : JPEG
 File Type Extension             : jpg
 MIME Type                       : image/jpeg
 Exif Byte Order                 : Little-endian (Intel, II)
-Current IPTC Digest             : ff66aa1841316f24f02aa0c75219d284
+Current IPTC Digest             : e4bb4dd2636f85e307be62148c3cd2df
 Image Width                     : 5644
 Image Height                    : 3763
 Encoding Process                : Baseline DCT, Huffman coding
 Bits Per Sample                 : 8
 Color Components                : 3
 Y Cb Cr Sub Sampling            : YCbCr4:4:4 (1 1)
----- JFIF ----
-JFIF Version                    : 1.01
-Resolution Unit                 : inches
-X Resolution                    : 300
-Y Resolution                    : 300
 ---- EXIF ----
+Image Width                     : 5644
+Image Height                    : 3763
+Bits Per Sample                 : 8 8 8
+Photometric Interpretation      : RGB
+Image Description               : Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside...
 Make                            : Canon
 Camera Model Name               : Canon EOS 5D Mark III
+Orientation                     : Horizontal (normal)
+Samples Per Pixel               : 3
 X Resolution                    : 300
 Y Resolution                    : 300
 Resolution Unit                 : inches
-Software                        : Capture One 8 Macintosh
-Modify Date                     : 2000:01:01 00:00:31
+Software                        : Adobe Photoshop CC 2014 (Windows)
+Modify Date                     : 2015:09:07 22:13:09
+Artist                          : Christopher Thomond for The Guardian.
 Copyright                       : © Christopher Thomond
 Modify Date                     : 2000:01:01 00:00:31
 Exposure Time                   : 1/160
@@ -49,6 +52,7 @@
 Flash                           : Off, Did not fire
 Focal Length                    : 200.0 mm
 User Comment                    :
+Color Space                     : Uncalibrated
 Exif Image Width                : 5644
 Exif Image Height               : 3763
 Focal Plane Resolution Unit     : inches
@@ -57,75 +61,112 @@
 Exposure Mode                   : Auto
 White Balance                   : Manual
 Compression                     : JPEG (old-style)
-Thumbnail Offset                : 1198
-Thumbnail Length                : 1917
----- ICC_Profile ----
-Profile CMM Type                : ADBE
-Profile Version                 : 2.1.0
-Profile Class                   : Display Device Profile
-Color Space Data                : RGB
-Profile Connection Space        : XYZ
-Profile Date Time               : 2000:08:11 19:51:59
-Profile File Signature          : acsp
-Primary Platform                : Apple Computer Inc.
-CMM Flags                       : Not Embedded, Independent
-Device Manufacturer             : none
-Device Model                    :
-Device Attributes               : Reflective, Glossy, Positive, Color
-Rendering Intent                : Perceptual
-Connection Space Illuminant     : 0.9642 1 0.82491
-Profile Creator                 : ADBE
-Profile ID                      : 0
-Profile Copyright               : Copyright 2000 Adobe Systems Incorporated
-Profile Description             : Adobe RGB (1998)
-Media White Point               : 0.95045 1 1.08905
-Media Black Point               : 0 0 0
-Red Tone Reproduction Curve     : (Binary data 14 bytes, use -b option to extract)
-Green Tone Reproduction Curve   : (Binary data 14 bytes, use -b option to extract)
-Blue Tone Reproduction Curve    : (Binary data 14 bytes, use -b option to extract)
-Red Matrix Column               : 0.60974 0.31111 0.01947
-Green Matrix Column             : 0.20528 0.62567 0.06087
-Blue Matrix Column              : 0.14919 0.06322 0.74457
+X Resolution                    : 72
+Y Resolution                    : 72
+Resolution Unit                 : inches
+Thumbnail Offset                : 1318
+Thumbnail Length                : 3938
 ---- IPTC ----
 Coded Character Set             : UTF8
 Envelope Record Version         : 4
+Coded Character Set             : UTF8
 Application Record Version      : 3
+Caption-Abstract                : Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside...
+By-line                         : Christopher Thomond for The Guar
+Credit                          : Christopher Thomond
 Date Created                    : 2015:08:18
 Time Created                    : 00:00:31+01:00
 Country-Primary Location Name   : UK
-By-line                         : Christopher Thomond for The Guardian.
-Credit                          : Christopher Thomond
-Caption-Abstract                : Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside...
 Copyright Notice                : © Christopher Thomond
 Prefs                           : Tagged:1, ColorClass:0, Rating:0, FrameNum:001350
+---- Photoshop ----
+IPTC Digest                     : e4bb4dd2636f85e307be62148c3cd2df
+X Resolution                    : 300
+Displayed Units X               : inches
+Y Resolution                    : 300
+Displayed Units Y               : inches
+Global Angle                    : 30
+Global Altitude                 : 30
+Copyright Flag                  : True
+Photoshop Thumbnail             : (Binary data 3938 bytes, use -b option to extract)
+Photoshop Quality               : 10
+Photoshop Format                : Standard
+Progressive Scans               : 3 Scans
 ---- XMP ----
-XMP Toolkit                     : XMP Core 5.1.2
-Legacy IPTC Digest              : C0A5BD50BB1D606F08C3CDFF26D88526
+XMP Toolkit                     : Adobe XMP Core 5.6-c014 79.156797, 2014/08/20-09:53:02
+Legacy IPTC Digest              : FF66AA1841316F24F02AA0C75219D284
 Country                         : UK
 Credit                          : Christopher Thomond
 Date Created                    : 2015:08:18 00:00:31+01:00
-Exif Version                    : 0221
+Color Mode                      : RGB
+ICC Profile Name                : Adobe RGB (1998)
 Lens Info                       : 70-200mm f/?
 Lens                            : EF70-200mm f/2.8L IS II USM
 Flash Compensation              : 0
 Firmware                        : Firmware Version 1.2.3
 Image Number                    : 1350
+Approximate Focus Distance      : 0
 Marked                          : True
 Color Class                     : 0 (None)
 Tagged                          : Yes
 Prefs                           : Tagged:1, ColorClass:0, Rating:0, FrameNum:001350
 PM Version                      : PM5
 Rating                          : 0
+Creator Tool                    : Capture One 8 Macintosh
+Modify Date                     : 2015:09:07 22:13:09+01:00
+Create Date                     : 2000:01:01 00:00:31
+Metadata Date                   : 2015:09:07 22:13:09+01:00
+Format                          : image/jpeg
+Document ID                     : adobe:docid:photoshop:3800c038-55a5-11e5-94e9-dadcbc31373f
+Instance ID                     : xmp.iid:d0f5c4d5-8566-0e47-9823-f0ac981e309d
+Original Document ID            : 11330C9CF637AAA4F2A4C6A79BB85654
 Creator Work Telephone          : 07836745201
-Description                     : Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside...
 Creator                         : Christopher Thomond for The Guardian.
 Rights                          : © Christopher Thomond
+Description                     : Thomond.  MIDDLESBOROUGH 18th August 2015.   Labour party  leadership candidate Jeremy Corbyn addressing a rally at Middlesbrough Town Hall, Teesside...
+History Action                  : saved*saved
+History Instance ID             : xmp.iid:d786799d-17eb-474a-8f81-7c08bc1a1e66*xmp.iid:d0f5c4d5-8566-0e47-9823-f0ac981e309d
+History When                    : 2015:09:07 22:13:09+01:00*2015:09:07 22:13:09+01:00
+History Software Agent          : Adobe Photoshop CC 2014 (Windows)*Adobe Photoshop CC 2014 (Windows)
+History Changed                 : /*/
+---- ICC_Profile ----
+Profile CMM Type                : ADBE
+Profile Version                 : 2.1.0
+Profile Class                   : Display Device Profile
+Color Space Data                : RGB
+Profile Connection Space        : XYZ
+Profile Date Time               : 2000:08:11 19:51:59
+Profile File Signature          : acsp
+Primary Platform                : Apple Computer Inc.
+CMM Flags                       : Not Embedded, Independent
+Device Manufacturer             : none
+Device Model                    :
+Device Attributes               : Reflective, Glossy, Positive, Color
+Rendering Intent                : Perceptual
+Connection Space Illuminant     : 0.9642 1 0.82491
+Profile Creator                 : ADBE
+Profile ID                      : 0
+Profile Copyright               : Copyright 2000 Adobe Systems Incorporated
+Profile Description             : Adobe RGB (1998)
+Media White Point               : 0.95045 1 1.08905
+Media Black Point               : 0 0 0
+Red Tone Reproduction Curve     : (Binary data 14 bytes, use -b option to extract)
+Green Tone Reproduction Curve   : (Binary data 14 bytes, use -b option to extract)
+Blue Tone Reproduction Curve    : (Binary data 14 bytes, use -b option to extract)
+Red Matrix Column               : 0.60974 0.31111 0.01947
+Green Matrix Column             : 0.20528 0.62567 0.06087
+Blue Matrix Column              : 0.14919 0.06322 0.74457
+---- APP14 ----
+DCT Encode Version              : 100
+APP14 Flags 0                   : [14]
+APP14 Flags 1                   : (none)
+Color Transform                 : YCbCr
 ---- Composite ----
 Aperture                        : 3.2
 Date/Time Created               : 2015:08:18 00:00:31+01:00
 Image Size                      : 5644x3763
 Megapixels                      : 21.2
 Shutter Speed                   : 1/160
-Thumbnail Image                 : (Binary data 1917 bytes, use -b option to extract)
-Focal Length                    : 200.0 mm
+Thumbnail Image                 : (Binary data 3938 bytes, use -b option to extract)
 Light Value                     : 6.7
+Focal Length                    : 200.0 mm
jamesgorrie commented 9 years ago

It'd be good to run some tests on ingested images, as I remember trying to do this and sometimes getting conflicting data from different fields, especially credit when it came to Getty. It is also interesting to note that Drew (the metadata-extractor author) has not included a lot of XMP fields, for some peculiar reason.

paperboyo commented 9 years ago

sometimes got conflicting data from different fields

This seems like a very useful primer, defining what we should be doing where there are multiple fields containing the same information or even conflicts (we are doing exactly the opposite :stuck_out_tongue_winking_eye:).

It contains revealing pictures, like this one: [image: metadata]

The specific issue in question: 5.7 Creator.

MWG also provides downloadable test files, which may be very useful when designing and testing strategies for reading and writing metadata.

jamesgorrie commented 9 years ago

We would need to test it against the files we get from our agencies to make sure that we categorise things correctly, i.e. that Getty et al. don't put something completely different in the XMP data.

paperboyo commented 9 years ago

Yes. But a robust heuristic would likely pick up more than a simple lookup (of the legacy field). Our providers' metadata customs have most likely changed over the years, and they are likely to change again.

paperboyo commented 9 years ago

Hi,

Just wanted to record a related case: 0823059192c1c8b17660d870c6fa21ee33407cf9

We show: Location: Gda?sk, Poland (reading IPTC: City, Country-PrimaryLocationName)

We could show: Location: Gdańsk, Pomorskie, Poland (by reading XMP-iptcExt: LocationShownCity, LocationShownProvinceState, LocationShownCountryCode first instead)

There are some other interesting bits in that image, e.g. a correctly spelled Description, or LocationShownSublocation: AmberExpo, which makes it easy to identify images from the same fair.
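A minimal Python sketch of that lookup order, assuming the XMP Iptc4xmpExt:LocationShown entries have already been parsed into a list of dicts keyed by the structure's field names (City, ProvinceState, CountryName, CountryCode) and the legacy IPTC values into a flat dict. `display_location` is hypothetical, and falling back from CountryName to CountryCode is an assumption too.

```python
from typing import Mapping, Optional, Sequence


def display_location(
    location_shown: Sequence[Mapping[str, str]],
    iptc: Mapping[str, str],
) -> Optional[str]:
    """Build 'City, Province, Country' from XMP LocationShown, else from legacy IPTC."""
    for loc in location_shown:
        country = loc.get("CountryName") or loc.get("CountryCode")
        parts = [loc.get("City"), loc.get("ProvinceState"), country]
        parts = [p.strip() for p in parts if p and p.strip()]
        if parts:
            return ", ".join(parts)
    # Legacy fallback: what we read today (IPTC City, Country-PrimaryLocationName)
    legacy = [iptc.get(k, "").strip() for k in ("City", "Country-PrimaryLocationName")]
    legacy = [p for p in legacy if p]
    return ", ".join(legacy) if legacy else None
```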

Regards Mateusz

paperboyo commented 7 years ago

These exiftool mappings can be useful: https://sourceforge.net/p/exiftool/code/ci/master/tree/arg_files/.

[EDIT: and these? https://sourceforge.net/p/exiftool/code/ci/master/tree/lib/Image/ExifTool/TagLookup.pm#l1715]

paperboyo commented 6 years ago

A comment about reconciliation: xmp.dc:creator, xmp.dc:description and other fields, which we should read before their IPTC counterparts (which, in turn, should be read before their EXIF counterparts), can be arrays/lists (e.g. xmp.dc:creator[1] or xmp.dc:description[2]). Most occurrences of those, though, are just duplicates, so we should deduplicate them (in the case of dc:creator, possibly also looking at any source and credit fields we combine it with).
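A minimal Python sketch of that deduplication, assuming the dc:creator array has already been read into a list of strings. Treating values as duplicates when they differ only in case or whitespace is an assumption, as is the hypothetical `also_shown` parameter for credit/source values that are displayed next to the creator anyway.

```python
from typing import Iterable, List


def _norm(value: str) -> str:
    # Collapse runs of whitespace and ignore case when comparing values
    return " ".join(value.split()).casefold()


def dedupe_creators(creators: Iterable[str], also_shown: Iterable[str] = ()) -> List[str]:
    """Drop empty and repeated dc:creator entries, keeping the first spelling,
    plus any entry that repeats a value already shown elsewhere (credit/source)."""
    seen = {_norm(v) for v in also_shown}
    result: List[str] = []
    for creator in creators:
        key = _norm(creator)
        if key and key not in seen:
            seen.add(key)
            result.append(" ".join(creator.split()))
    return result
```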

paperboyo commented 6 years ago

Reading XMP fixed in https://github.com/guardian/grid/pull/2180/commits/dcf79e07abf339dc7b29df4eb88809593638d3b6 (thanks!).

Reconciliation (like we do with dates) will get its own issue. The discussion above contains some (mildly) useful info.

paperboyo commented 4 years ago

The above often refers to Metadata Working Group’s Guidelines for Handling Image Metadata, so they are attached here for reference as they’ve been taken offline (but are still correct and useful).

almasadir commented 1 year ago

how to inject new metadata into a jpg?

paperboyo commented 1 year ago

Hi @almasadir. The repository you posted your question in hosts an application that has no metadata-writing capabilities. If you are looking for consumer applications, Adobe Bridge is free, and many other photo organisers allow writing metadata; some metadata can even be written by Finder or Windows Explorer. If you are looking to do it programmatically, google exiftool.
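For the programmatic route, a minimal Python sketch that shells out to exiftool, assuming exiftool is installed and on PATH. It writes one value to the three places Photoshop fills for Author (XMP dc:creator, IPTC By-line, EXIF Artist), as shown earlier in this thread; `exiftool_write_creator` is a hypothetical helper, and the command is only executed when `run=True`.

```python
import subprocess
from typing import List


def exiftool_write_creator(path: str, creator: str, run: bool = False) -> List[str]:
    """Build (and optionally run) an exiftool command that writes `creator`
    to XMP dc:creator, IPTC By-line and EXIF Artist."""
    cmd = [
        "exiftool",
        "-overwrite_original",               # don't keep a *_original backup copy
        "-IPTC:CodedCharacterSet=UTF8",      # mark the IPTC block as UTF-8
        f"-XMP-dc:Creator={creator}",
        f"-IPTC:By-line={creator}",          # IIM limits By-line to 32 bytes
        f"-EXIF:Artist={creator}",
        path,
    ]
    if run:
        subprocess.run(cmd, check=True)      # raises CalledProcessError on failure
    return cmd
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with names containing spaces or non-ASCII characters.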