phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

Search audio file lyrics #159

Closed gergesh closed 1 year ago

gergesh commented 1 year ago

I think it could be great to have an adapter for audio files which checks their metadata for embedded lyrics.

I can take a shot at implementing it (my Rust skills are basically nonexistent, so brace yourself). One option would be to use ffmpeg, similar to how videos are handled right now; another would be to use a Rust library such as audiotags or taglib-rust.

What are your opinions on this matter?

phiresky commented 1 year ago

If you want to build an integrated Rust adapter, I'll accept a PR. But with 1.0 (master branch, unreleased), you can also create an external adapter (one that gets a file and returns text) in a different (scripting) language; see https://github.com/phiresky/ripgrep-all/issues/146. If you do that, post it here or add it to the https://github.com/phiresky/ripgrep-all/wiki/Community-Adapters wiki page.

gergesh commented 1 year ago

Huh, I actually just started browsing through the code, found the custom adapter feature and started researching its state / why it isn't enabled (or documented) yet.

This is my setup as of now. I'm not adding it to the wiki because it's very tailored to my needs and environment.

{
  // This file follows the JSON schema defined below.
  // If you use an editor that supports JSON schema (e.g. VS Code),
  // you should be getting IntelliSense and validation.
  "$schema": "./config.v1.schema.json",
  // The default config and schema will be regenerated if they are missing
  // https://github.com/phiresky/ripgrep-all/blob/master/doc/config.default.jsonc

  // The config options are the same as the command line options,
  // but with --rga- prefix removed and - replaced with _.
  // e.g. --rga-no-cache becomes `"no_cache": true`.
  // The only exception is the `custom_adapters` option, which can only be set in this file.

  "custom_adapters": [
    // See https://github.com/phiresky/ripgrep-all/wiki for more information
    // to verify if your custom adapters are picked up correctly, run `rga --rga-list-adapters`
    {
      "name": "lyrics",
      "version": 1,
      "description": "extract song lyrics",

      "extensions": ["flac", "mp3", "ogg"],
      "mimetypes": [],

      "binary": "lyrics",
      "args": ["-"],
      "disabled_by_default": false,
      "match_only_by_mime": false,
      "postprocessors": []
    }
  ]
}

Where lyrics is this script (which I already had on my $PATH, impacting my decision to use it):

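#!/usr/bin/env sh

# Default to the track currently playing in cmus when no file is given.
cur="$(cmus-remote -Q | sed -n 's/^file //p')"
aud="${1:-$cur}"
#lrc="${aud%.*}.lrc"

# Print the embedded Lyrics tag; print nothing (not "null") when it is absent.
exiftool -j "$aud" | jq -r '.[0].Lyrics // empty'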

As you can see I left a few things empty, but it works so I'm satisfied. Thanks!
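
For anyone who would rather not depend on jq, the parsing step of the script above can be sketched in Python. This is only an illustration, not part of my setup: the function names are made up, and it assumes that `exiftool -j` prints a JSON array with one object per file and that embedded lyrics show up under a `Lyrics` key, as they do in the script.

```python
import json
import subprocess


def lyrics_from_exiftool_json(raw: str) -> str:
    """Return the Lyrics field of the first exiftool record, or '' if absent."""
    if not raw.strip():
        return ""  # exiftool produced no output at all
    records = json.loads(raw)
    if not records:
        return ""
    value = records[0].get("Lyrics")
    return "" if value is None else str(value)


def read_exiftool_json_from_stdin() -> str:
    """Run `exiftool -j -`, letting exiftool read the file from our stdin.

    Hypothetical helper: it is defined here but not called, so the demo
    below runs without exiftool installed.
    """
    result = subprocess.run(["exiftool", "-j", "-"], capture_output=True, check=False)
    return result.stdout.decode("utf-8", errors="replace")


# Demo on a hand-written sample instead of a real audio file.
sample = '[{"SourceFile": "-", "Lyrics": "la la la"}]'
print(lyrics_from_exiftool_json(sample))
```

A real adapter would print `lyrics_from_exiftool_json(read_exiftool_json_from_stdin())` instead of the sample. Returning an empty string for files without lyrics keeps the literal text "null" out of the search results.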

phiresky commented 1 year ago

Thanks for the info! I considered adding it to the wiki, but I realized that ffprobe can do this as well, so I added lyric extraction to core in 5281839.
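
As a rough illustration of the ffprobe route (not a description of what commit 5281839 actually does), the sketch below pulls lyric-like tags out of ffprobe's JSON output. It assumes ffprobe was run with `-print_format json -show_format -show_streams`, and that lyrics appear in tags whose names contain "lyrics" (for example `lyrics-eng` or `UNSYNCEDLYRICS`). The exact tag names vary by container and tagger, so treat the matching rule and the function name as assumptions.

```python
import json
from typing import List


def lyrics_from_ffprobe_json(raw: str) -> List[str]:
    """Collect values of all tags whose name contains 'lyrics' (case-insensitive)."""
    data = json.loads(raw)
    # Tags can sit on the container ("format") or on individual streams.
    sections = [data.get("format", {})] + list(data.get("streams", []))
    found = []
    for section in sections:
        for key, value in section.get("tags", {}).items():
            if "lyrics" in key.lower():
                found.append(value)
    return found


# Hand-written sample shaped like ffprobe's JSON output.
sample = json.dumps({
    "format": {"tags": {"title": "Song", "lyrics-eng": "la la"}},
    "streams": [{"tags": {"UNSYNCEDLYRICS": "na na"}}],
})
print(lyrics_from_ffprobe_json(sample))
```

Checking both the format-level and stream-level tags is a guess at robustness: some containers attach metadata to the audio stream rather than to the file as a whole.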