phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

Feature: search within JPG metadata #221

Open Pfeil opened 2 months ago

Pfeil commented 2 months ago

Is your feature request related to a problem? Please describe.

I use the Aves Android app to manage and sometimes tag my files. To find images by tag with rga, I currently have to pass the --text parameter, because rga does not seem to look into the metadata. This works, since the XML appears to be stored in the file as plain text, but it produces ugly output (binary data rendered as weird characters in the terminal).

Describe the solution you'd like

rga should search within image metadata (timestamps and other fields) by default, since JPG is a common format.

Describe alternatives you've considered

Additional context

This is not specific to Aves only. For example, digiKam also stores XMP within image files. This includes information about recognized faces in images and possibly nondestructive-editing information.

I think there is a lot of potential in looking into metadata. Here is a screenshot of Aves showing the different metadata layers of an image that I took with my phone and then gave a single tag (so, very minimal). I can only view one layer at a time, but you get the idea. The opened layer shows how Aves uses XMP / Dublin Core to store tags within a file.

[Screenshot: Aves showing image metadata as layers]

phiresky commented 2 months ago

I did actually have plans to add an EXIF adapter originally, but I guess I never did? Probably because usually there's not really a lot of interesting searchable data in there.

In any case, this should be easy to do with a Custom Adapter calling exiftool.

exiftool has lots of output options so you might have to look at the docs. Try starting with exiftool -g -u - as the command and jpg,jpeg,png as the extensions.

If you do find a good config json, do post it in the Wiki: https://github.com/phiresky/ripgrep-all/discussions/categories/show-your-adapter

I'll probably add the top voted adapters from the wiki to core at some point.

Pfeil commented 2 months ago

Thank you for the hint. I looked into exiftool a bit and was impressed by how many file formats it supports according to the manpage. For now, this prototype works for me:

{
      "name": "exiftool",
      "version": 1,
      "description": "Uses exiftool to extract all plain text metadata from supported files.",

      "extensions": ["jpg", "jpeg"],
      "mimetypes": ["image/jpeg"],

      "binary": "exiftool",
      "args": ["-g", "-u", "-"],
      "disabled_by_default": false,
      "match_only_by_mime": false
}

Example Output:

$ rga "Notiz" .
./IMG_20240113_183022.jpg
Subject                         : Notiz
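
The adapter object above still has to be registered in rga's config file. A minimal sketch in Python, assuming the object belongs in a top-level `custom_adapters` array (that key is not shown in this thread, so check it against the rga documentation) and adding png to the extensions as suggested earlier; `build_rga_config` and `MIME_BY_EXT` are hypothetical names used only for illustration:

```python
import json

# Assumption: a small, hand-written extension-to-MIME map for illustration.
MIME_BY_EXT = {
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "png": "image/png",
}


def build_rga_config(extensions=("jpg", "jpeg", "png")):
    """Wrap the exiftool adapter from the prototype in a config dict."""
    # Collect MIME types in first-seen order, without duplicates.
    mimetypes = []
    for ext in extensions:
        mime = MIME_BY_EXT[ext]
        if mime not in mimetypes:
            mimetypes.append(mime)

    adapter = {
        "name": "exiftool",
        "version": 1,
        "description": "Uses exiftool to extract all plain text metadata from supported files.",
        "extensions": list(extensions),
        "mimetypes": mimetypes,
        "binary": "exiftool",
        "args": ["-g", "-u", "-"],
        "disabled_by_default": False,
        "match_only_by_mime": False,
    }
    # Assumption: rga reads custom adapters from a "custom_adapters" array.
    return {"custom_adapters": [adapter]}


if __name__ == "__main__":
    print(json.dumps(build_rga_config(), indent=2))
```

Running the script prints a JSON document that can be merged into the config file by hand.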

As for the wiki version: which kinds of files would you apply exiftool to (e.g. to avoid double extraction)? The list in the manpage is pretty long. I wonder what it gets out of pptx files, for example. I do not have any at hand right now. Probably the Office username and similar.

 File Types
 ------------+-------------+-------------+-------------+------------
 360   r/w   | DOCX  r     | ITC   r     | O     r     | RSRC  r
 3FR   r     | DPX   r     | J2C   r     | ODP   r     | RTF   r
 3G2   r/w   | DR4   r/w/c | JNG   r/w   | ODS   r     | RW2   r/w
 3GP   r/w   | DSS   r     | JP2   r/w   | ODT   r     | RWL   r/w
 7Z    r     | DV    r     | JPEG  r/w   | OFR   r     | RWZ   r
 A     r     | DVB   r/w   | JSON  r     | OGG   r     | RM    r
 AA    r     | DVR-MS r    | JXL   r     | OGV   r     | SEQ   r
 AAC   r     | DYLIB r     | K25   r     | ONP   r     | SKETCH r
 AAE   r     | EIP   r     | KDC   r     | OPUS  r     | SO    r
 AAX   r/w   | EPS   r/w   | KEY   r     | ORF   r/w   | SR2   r/w
 ACR   r     | EPUB  r     | LA    r     | ORI   r/w   | SRF   r
 AFM   r     | ERF   r/w   | LFP   r     | OTF   r     | SRW   r/w
 AI    r/w   | EXE   r     | LIF   r     | PAC   r     | SVG   r
 AIFF  r     | EXIF  r/w/c | LNK   r     | PAGES r     | SWF   r
 APE   r     | EXR   r     | LRV   r/w   | PBM   r/w   | THM   r/w
 ARQ   r/w   | EXV   r/w/c | M2TS  r     | PCD   r     | TIFF  r/w
 ARW   r/w   | F4A/V r/w   | M4A/V r/w   | PCX   r     | TORRENT r
 ASF   r     | FFF   r/w   | MACOS r     | PDB   r     | TTC   r
 AVI   r     | FITS  r     | MAX   r     | PDF   r/w   | TTF   r
 AVIF  r/w   | FLA   r     | MEF   r/w   | PEF   r/w   | TXT   r
 AZW   r     | FLAC  r     | MIE   r/w/c | PFA   r     | VCF   r
 BMP   r     | FLIF  r/w   | MIFF  r     | PFB   r     | VNT   r
 BPG   r     | FLV   r     | MKA   r     | PFM   r     | VRD   r/w/c
 BTF   r     | FPF   r     | MKS   r     | PGF   r     | VSD   r
 C2PA  r     | FPX   r     | MKV   r     | PGM   r/w   | WAV   r
 CHM   r     | GIF   r/w   | MNG   r/w   | PLIST r     | WDP   r/w
 COS   r     | GLV   r/w   | MOBI  r     | PICT  r     | WEBP  r/w
 CR2   r/w   | GPR   r/w   | MODD  r     | PMP   r     | WEBM  r
 CR3   r/w   | GZ    r     | MOI   r     | PNG   r/w   | WMA   r
 CRM   r/w   | HDP   r/w   | MOS   r/w   | PPM   r/w   | WMV   r
 CRW   r/w   | HDR   r     | MOV   r/w   | PPT   r     | WPG   r
 CS1   r/w   | HEIC  r/w   | MP3   r     | PPTX  r     | WTV   r
 CSV   r     | HEIF  r/w   | MP4   r/w   | PS    r/w   | WV    r
 CUR   r     | HTML  r     | MPC   r     | PSB   r/w   | X3F   r/w
 CZI   r     | ICC   r/w/c | MPG   r     | PSD   r/w   | XCF   r
 DCM   r     | ICO   r     | MPO   r/w   | PSP   r     | XISF  r
 DCP   r/w   | ICS   r     | MQV   r/w   | QTIF  r/w   | XLS   r
 DCR   r     | IDML  r     | MRC   r     | R3D   r     | XLSX  r
 DFONT r     | IIQ   r/w   | MRW   r/w   | RA    r     | XMP   r/w/c
 DIVX  r     | IND   r/w   | MXF   r     | RAF   r/w   | ZIP   r
 DJVU  r     | INSP  r/w   | NEF   r/w   | RAM   r     |
 DLL   r     | INSV  r     | NKSC  r/w   | RAR   r     |
 DNG   r/w   | INX   r     | NRW   r/w   | RAW   r/w   |
 DOC   r     | ISO   r     | NUMBERS r   | RIFF  r     |  

It also supports, among others, these kinds of meta information:

  Meta Information
 ----------------------+----------------------+---------------------
 EXIF           r/w/c  |  CIFF           r/w  |  Ricoh RMETA    r
 GPS            r/w/c  |  AFCP           r/w  |  Picture Info   r
 IPTC           r/w/c  |  Kodak Meta     r/w  |  Adobe APP14    r
 XMP            r/w/c  |  FotoStation    r/w  |  MPF            r
 MakerNotes     r/w/c  |  PhotoMechanic  r/w  |  Stim           r
 Photoshop IRB  r/w/c  |  JPEG 2000      r    |  DPX            r
 ICC Profile    r/w/c  |  DICOM          r    |  APE            r
 MIE            r/w/c  |  Flash          r    |  Vorbis         r
 JFIF           r/w/c  |  FlashPix       r    |  SPIFF          r
 Ducky APP12    r/w/c  |  QuickTime      r    |  DjVu           r
 PDF            r/w/c  |  Matroska       r    |  M2TS           r
 PNG            r/w/c  |  MXF            r    |  PE/COFF        r
 Canon VRD      r/w/c  |  PrintIM        r    |  AVCHD          r
 Nikon Capture  r/w/c  |  FLAC           r    |  ZIP            r
 GeoTIFF        r/w/c  |  ID3            r    |  (and more)
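
One way to approach the question of which file types to hand to exiftool is to parse the File Types table and subtract whatever other adapters already cover. A rough sketch, using only a few rows copied from that table; `ALREADY_HANDLED` is an illustrative guess and not rga's actual adapter list, and the function names are hypothetical:

```python
import re

# A few rows copied from the File Types table (not the full list).
SAMPLE_ROWS = """\
 7Z    r     | DV    r     | JPEG  r/w   | OFR   r     | RWZ   r
 DIVX  r     | IND   r/w   | MXF   r     | RAF   r/w   | ZIP   r
 DOC   r     | ISO   r     | NUMBERS r   | RIFF  r     |
"""

# One table cell: a type name followed by r, r/w, or r/w/c.
CELL = re.compile(r"^\s*(\S+)\s+(r(?:/w)?(?:/c)?)\s*$")


def parse_file_types(table_text):
    """Map lowercase type name -> support flags ('r', 'r/w', 'r/w/c')."""
    support = {}
    for line in table_text.splitlines():
        for cell in line.split("|"):
            match = CELL.match(cell)
            if match:
                support[match.group(1).lower()] = match.group(2)
    return support


def candidate_extensions(support, already_handled):
    """Return readable types that no other adapter is assumed to cover."""
    return sorted(ext for ext in support if ext not in already_handled)


# Assumption: formats some other adapter already extracts; adjust to your setup.
ALREADY_HANDLED = {"zip", "7z", "doc"}

if __name__ == "__main__":
    types = parse_file_types(SAMPLE_ROWS)
    print(candidate_extensions(types, ALREADY_HANDLED))
```

Combined entries such as F4A/V would come out as a single name and need to be expanded by hand, and header or separator lines are skipped because they do not match the cell pattern.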

PS: For me, the issue is solved with this. I'll tinker with exiftool's options and post to the wiki later on. So it is fine with me if you would like to close the issue.