phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

pandoc: unknown option --markdown-headings #180

Closed werdna-b closed 10 months ago

werdna-b commented 10 months ago

Describe the bug: rga does not search inside .docx files and returns:

adapter: pandoc Unknown option --markdown-headings.

To Reproduce: run the command rga 'text to search for'

rga version: 1.0.0-alpha.5

lafrenierejm commented 10 months ago

@werdna-b Can you provide your version of pandoc?

boutros commented 10 months ago

I have the same problem, using the latest rga version.

> pandoc --version
pandoc 2.9.2.1
Compiled with pandoc-types 1.20, texmath 0.12.0.2, skylighting 0.8.5

Edit: This is the pandoc version in Ubuntu LTS. It seems to work fine when I install the latest pandoc (3.1.7) manually.
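For anyone who wants to check this before digging further, here is a minimal sketch that compares the installed pandoc version against 2.11.2, which, as far as I know, is the release that introduced --markdown-headings (older releases used --atx-headers instead). The helper name pandoc_has_markdown_headings is made up for illustration, and the comparison assumes a sort that supports -V (GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given version string is >= 2.11.2.
# Assumption: 2.11.2 is the first pandoc release with --markdown-headings.
pandoc_has_markdown_headings() {
  required="2.11.2"
  # sort -V orders version strings; if the required version comes first
  # (or both are equal), the given version is new enough.
  [ "$(printf '%s\n%s\n' "$required" "$1" | sort -V | head -n 1)" = "$required" ]
}

# Only query pandoc if it is actually installed.
if command -v pandoc >/dev/null 2>&1; then
  # First line of `pandoc --version` looks like "pandoc 2.9.2.1".
  version="$(pandoc --version | head -n 1 | awk '{print $2}')"
  if pandoc_has_markdown_headings "$version"; then
    echo "pandoc $version should accept --markdown-headings"
  else
    echo "pandoc $version is too old for --markdown-headings"
  fi
fi
```

With the Ubuntu LTS package above (2.9.2.1) this takes the "too old" branch; with 3.1.7 it takes the other one.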

lafrenierejm commented 10 months ago

@phiresky I'm wondering if the pandoc flags ought to be customizable by users via the config file. I think it's reasonable to stick with the current flags as the default, but clearly there's a need to be able to support older versions of pandoc.

phiresky commented 10 months ago

It is possible to customize this by recreating the internal adapter. Example ~/.config/ripgrep-all/config.jsonc:

{
  // This file follows the JSON schema defined below.
  // If you use an editor that supports JSON schema (e.g. VS Code),
  // you should be getting IntelliSense and validation.
  "$schema": "./config.v1.schema.json",
  // The default config and schema will be regenerated if they are missing
  // https://github.com/phiresky/ripgrep-all/blob/master/doc/config.default.jsonc

  // The config options are the same as the command line options,
  // but with --rga- prefix removed and - and . replaced with _.
  // e.g. --rga-no-cache becomes `"no_cache": true`.
  // The only exception is the `custom_adapters` option, which can only be set in this file.

  "adapters": ["-pandoc"],
  // See https://github.com/phiresky/ripgrep-all/wiki for more information
  // to verify if your custom adapters are picked up correctly, run `rga --rga-list-adapters`
  "custom_adapters": [
    {
      "name": "pandoc-legacy",
      "description": "Uses legacy pandoc (<v3) to convert binary/unreadable text documents to plain markdown-like text",
      "version": 3,
      "extensions": ["epub", "odt", "docx", "fb2", "ipynb"],
      "binary": "pandoc",
      "mimetypes": null,
      "args": [
        "--from=$input_file_extension",
        "--to=plain",
        "--wrap=none",
        "--atx-headers"
      ]
    }
  ]
}
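To confirm the custom adapter is picked up, the check mentioned in the config comments can be scripted roughly as follows. This is only a sketch: adapter_listed is a made-up helper, and it assumes nothing about the output of rga --rga-list-adapters beyond the adapter name appearing somewhere in it.

```shell
#!/bin/sh
# Hypothetical helper: reads text on stdin and succeeds if it contains
# the given adapter name as a literal string.
adapter_listed() {
  grep -qF -- "$1"
}

# Only run the check if rga is installed.
if command -v rga >/dev/null 2>&1; then
  if rga --rga-list-adapters | adapter_listed "pandoc-legacy"; then
    echo "pandoc-legacy adapter is registered"
  else
    echo "pandoc-legacy adapter not found; check the config file path and syntax"
  fi
fi
```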

I don't think I'm willing to support outdated versions of the dependent software directly since it seems like a can of worms.

firxworx commented 5 months ago

For anyone who needs a quick and easy solution on Ubuntu or any other Linux distro: brew's pandoc works out of the box as a dependency of its ripgrep-all package.

Yes, that's the Homebrew (brew) command most associated with macOS.

I use https://docs.brew.sh/Homebrew-on-Linux on my Ubuntu developer workstations to install certain dev/productivity tooling.

brew often has more recent versions of software, has many dev-focused packages that apt doesn't, and is far more convenient than building a collection of tools manually from source. Its formulae (aka "packages") are just scripts you can easily check.

While there are potential drawbacks to mixing and matching package managers, having a convention helps keep things organized and predictable. One benefit is better parity between workstations on a team running Linux/WSL/macOS for internal tools & scripts.

I first added brew to Ubuntu out of necessity because AWS only ships certain CLI tools this way; however, the above approach grew on me over time.

denispollini commented 4 months ago

Hi, I installed Homebrew, but I have the same problem as when installing pandoc from apt: pandoc gives me an error when I try to use the rga command. [image]