photoprism / photoprism

AI-Powered Photos App for the Decentralized Web 🌈💎✨
https://www.photoprism.app

Metadata: Improve XMP parser to support more tags #2260

Open lastzero opened 2 years ago

lastzero commented 2 years ago

As a user with a lot of metadata in XMP sidecar files, I want PhotoPrism to index more of that information so I can easily view and search it.

yq is a portable command-line processor for YAML, JSON, and XML that should make reading XMP much easier than the current implementation: https://github.com/mikefarah/yq

Developer Guide > XMP:

Related Issues:

jmalm commented 2 years ago

Have you considered using ExifTool in some way for reading and writing image metadata (both from the images and to/from XMP sidecar files)? For example, via https://github.com/barasher/go-exiftool? (go-exiftool has a GPL 3 license, which might not be compatible with the licensing strategy of PhotoPrism. ExifTool itself, on the other hand, looks like it has a very permissive license.)

My impression is that ExifTool has become something of a standard reference tool when it comes to image metadata, so it may help even more than yq.

For reference, LibrePhotos does it this way. (I implemented writing to XMP sidecar files in that project and made some smaller changes to the reading code, but the choice of ExifTool was made before that.)

lastzero commented 2 years ago

We already use ExifTool, but it's an external Perl script, and XML as such isn't hard to parse. The problem is that the built-in XML support in Go is pretty bad, at least the last time I checked. See my notes in the Developer Guide.

jmalm commented 2 years ago

(The following is some testing and reasoning...)

I guess one of the nice things about ExifTool is that you don't have to care as much about the structure of the tags. For example, you can read the rating with exiftool -Rating FILENAME and write it with exiftool -Rating=4 FILENAME in an XMP sidecar like the following:

<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 10.10'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

 <rdf:Description rdf:about=''
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:creator>
   <rdf:Seq>
    <rdf:li>Jakob Malm</rdf:li>
   </rdf:Seq>
  </dc:creator>
  <dc:rights>
   <rdf:Alt>
    <rdf:li xml:lang='x-default'>Jakob Malm</rdf:li>
   </rdf:Alt>
  </dc:rights>
 </rdf:Description>

 <rdf:Description rdf:about=''
  xmlns:xmp='http://ns.adobe.com/xap/1.0/'>
  <xmp:Rating>4</xmp:Rating>
 </rdf:Description>
</rdf:RDF>
</x:xmpmeta>
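
For comparison, here is a minimal sketch of reading the same rating with Go's standard encoding/xml package and namespace-qualified struct tags. It is not PhotoPrism's actual implementation: the sidecar type, the ratingFromSidecar function, and the trimmed sample constant are hypothetical names made up for illustration. It mainly shows that the struct has to mirror the nesting of the document.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// sidecarXMP is a trimmed copy of the sidecar shown above.
const sidecarXMP = `<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 10.10'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
 <rdf:Description rdf:about=''
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:creator><rdf:Seq><rdf:li>Jakob Malm</rdf:li></rdf:Seq></dc:creator>
 </rdf:Description>
 <rdf:Description rdf:about=''
  xmlns:xmp='http://ns.adobe.com/xap/1.0/'>
  <xmp:Rating>4</xmp:Rating>
 </rdf:Description>
</rdf:RDF>
</x:xmpmeta>`

// sidecar mirrors only the path down to xmp:Rating. The type and field
// names are hypothetical; each tag has the form "namespace-URI local-name".
type sidecar struct {
	RDF struct {
		Descriptions []struct {
			Rating string `xml:"http://ns.adobe.com/xap/1.0/ Rating"`
		} `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# Description"`
	} `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# RDF"`
}

// ratingFromSidecar returns the first xmp:Rating found in any
// rdf:Description block. The bool is false when no rating is present.
func ratingFromSidecar(doc string) (int, bool, error) {
	var sc sidecar
	if err := xml.Unmarshal([]byte(doc), &sc); err != nil {
		return 0, false, err
	}
	for _, d := range sc.RDF.Descriptions {
		v := strings.TrimSpace(d.Rating)
		if v == "" {
			continue // this Description block carries no rating
		}
		n, err := strconv.Atoi(v)
		if err != nil {
			return 0, false, err
		}
		return n, true, nil
	}
	return 0, false, nil
}

func main() {
	rating, found, err := ratingFromSidecar(sidecarXMP)
	fmt.Println(rating, found, err)
}
```

Every additional tag would need its own field in the matching nested struct, which is part of what makes this approach tedious for XMP, where tools are free to place properties in different rdf:Description blocks.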

With the yq command line, reading would be something like yq -p=xml ".xmpmeta.RDF.Description[] | select(.Rating) | .Rating" FILENAME. Writing could be done with yq -p=xml -o=xml "(.xmpmeta.RDF.Description[] | select(.Rating) | .Rating) = 4" FILENAME, but it seems the entire file is then rewritten (which is probably also the case with ExifTool), and the namespace prefixes are dropped:

<xmpmeta x="adobe:ns:meta/" xmptk="Image::ExifTool 12.16">
  <RDF rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <Description about="" dc="http://purl.org/dc/elements/1.1/">
      <creator>
        <Seq>
          <li>Jakob Malm</li>
        </Seq>
      </creator>
      <rights>
        <Alt>
          <li lang="x-default">Jakob Malm</li>
        </Alt>
      </rights>
    </Description>
    <Description about="" xmp="http://ns.adobe.com/xap/1.0/">
      <Rating>4</Rating>
    </Description>
  </RDF>
</xmpmeta>

I haven't found a way to avoid specifying (and knowing!) the "absolute" path to the tag. This may be OK and perhaps even desirable, but I think the missing namespaces might be a problem.
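
To illustrate both points, here is a rough sketch in Go that looks a tag up by namespace URI and local name instead of by absolute path, using the token stream of the standard encoding/xml decoder. The findText function and the two sample constants are hypothetical and only cover simple text-valued tags such as the rating. Structured values like dc:creator would need extra handling.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// nsXMP is the namespace URI that the xmp: prefix is bound to in the sidecar.
const nsXMP = "http://ns.adobe.com/xap/1.0/"

// namespacedXMP keeps the prefixes and xmlns declarations of the original file.
const namespacedXMP = `<x:xmpmeta xmlns:x='adobe:ns:meta/'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
 <rdf:Description rdf:about=''
  xmlns:xmp='http://ns.adobe.com/xap/1.0/'>
  <xmp:Rating>4</xmp:Rating>
 </rdf:Description>
</rdf:RDF>
</x:xmpmeta>`

// strippedXMP is shaped like the yq output: no prefixes, and the former
// xmlns declarations have turned into ordinary attributes.
const strippedXMP = `<xmpmeta x="adobe:ns:meta/">
  <RDF rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <Description about="" xmp="http://ns.adobe.com/xap/1.0/">
      <Rating>4</Rating>
    </Description>
  </RDF>
</xmpmeta>`

// findText returns the trimmed text of the first element whose namespace
// URI and local name match, wherever it sits in the tree.
func findText(doc, space, local string) (string, bool, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if errors.Is(err, io.EOF) {
			return "", false, nil // reached the end without a match
		}
		if err != nil {
			return "", false, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Space != space || start.Name.Local != local {
			continue
		}
		// Token has already resolved the prefix to its namespace URI,
		// so the match does not depend on the prefix the writer chose.
		var text string
		if err := dec.DecodeElement(&text, &start); err != nil {
			return "", false, err
		}
		return strings.TrimSpace(text), true, nil
	}
}

func main() {
	fmt.Println(findText(namespacedXMP, nsXMP, "Rating"))
	fmt.Println(findText(strippedXMP, nsXMP, "Rating"))
}
```

Run against these two documents, the lookup should find the rating in the namespaced sidecar but not in the rewritten one, because the rewritten Rating element no longer belongs to the XMP namespace. A reader that identifies tags by namespace would therefore stop recognizing them after such a rewrite.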

On the other hand, two nice things with using yq would be

lastzero commented 1 year ago

An alternative library to take a look at is https://github.com/beevik/etree

XMP-related issues that may depend on this: