phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

When installed via `choco`: Fails to adapt and search PDFs on Windows: `Error: copying adapter output to stdout`; poppler `pdftotext` exit status `99` #220

Open ElectricRCAircraftGuy opened 2 months ago

ElectricRCAircraftGuy commented 2 months ago

Describe the bug

It fails to search any PDF.

Error it gives me:

Error: copying adapter output to stdout

Caused by:
    0: adapting C:\Users\gstaples\temp\dummy.pdf.txt.asciipagebreaks via postprocpagebreaks failed
    1: subprocess: Command { std: "pdftotext" "-" "-", kill_on_drop: false }
    2: ExitStatus(ExitStatus(99))

Example run and output in the Git Bash terminal:

gstaples@my-pc MINGW64 ~/temp
$ rga my dummy.pdf
dummy.pdf: preprocessor command failed: '"C:\\ProgramData\\chocolatey\\lib\\ripgrep-all\\tools\\ripgrep_all-v0.10.6-x86_64-pc-windows-msvc\\rga-preproc" "dummy.pdf"':
-------------------------------------------------------------------------------
C:\Users\gstaples\temp\dummy.pdf adapter: poppler
C:\Users\gstaples\temp\dummy.pdf.txt.asciipagebreaks adapter: postprocpagebreaks
pdftotext version 4.00
Copyright 1996-2017 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>             : first page to convert
  -l <int>             : last page to convert
  -layout              : maintain original physical layout
  -simple              : simple one-column page layout
  -table               : similar to -layout, but optimized for tables
  -lineprinter         : use strict fixed-pitch/height layout
  -raw                 : keep strings in content stream order
  -fixed <number>      : assume fixed-pitch (or tabular) text
  -linespacing <number>: fixed line spacing for LinePrinter mode
  -clip                : separate clipped text
  -nodiag              : discard diagonal text
  -enc <string>        : output text encoding name
  -eol <string>        : output end-of-line convention (unix, dos, or mac)
  -nopgbrk             : don't insert page breaks between pages
  -bom                 : insert a Unicode BOM at the start of the text file
  -opw <string>        : owner password (for encrypted files)
  -upw <string>        : user password (for encrypted files)
  -q                   : don't print any messages or errors
  -cfg <string>        : configuration file to use in place of .xpdfrc
  -v                   : print copyright and version info
  -h                   : print usage information
  -help                : print usage information
  --help               : print usage information
  -?                   : print usage information
Error: copying adapter output to stdout

Caused by:
    0: adapting C:\Users\gstaples\temp\dummy.pdf.txt.asciipagebreaks via postprocpagebreaks failed
    1: subprocess: Command { std: "pdftotext" "-" "-", kill_on_drop: false }
    2: ExitStatus(ExitStatus(99))
-------------------------------------------------------------------------------

Example run and output in Command Prompt:

C:\Users\gstaples\temp>rga my dummy.pdf
dummy.pdf: preprocessor command failed: '"C:\\ProgramData\\chocolatey\\lib\\ripgrep-all\\tools\\ripgrep_all-v0.10.6-x86_64-pc-windows-msvc\\rga-preproc" "dummy.pdf"':
-------------------------------------------------------------------------------
C:\Users\gstaples\temp\dummy.pdf adapter: poppler
C:\Users\gstaples\temp\dummy.pdf.txt.asciipagebreaks adapter: postprocpagebreaks
Error: copying adapter output to stdout

Caused by:
    0: adapting C:\Users\gstaples\temp\dummy.pdf.txt.asciipagebreaks via postprocpagebreaks failed
    1: subprocess: Command { std: "pdftotext" "-" "-", kill_on_drop: false }
    2: ExitStatus(ExitStatus(3221225781))
-------------------------------------------------------------------------------
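The two runs above report different exit statuses for the same `pdftotext` call: `99` in Git Bash and `3221225781` in Command Prompt. The large value is easier to read in hexadecimal. As a rough sketch (the function name `describe_exit_status` and the small lookup table are illustrative and not part of rga), it can be decoded like this:

```python
# Hypothetical helper, not part of rga: render a process exit status as a
# named NTSTATUS value when it matches one of a few well-known codes.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
}


def describe_exit_status(code: int) -> str:
    """Return a short human-readable description of an exit status."""
    code &= 0xFFFFFFFF  # normalize in case the value was reported as signed
    name = KNOWN_NTSTATUS.get(code)
    if name is not None:
        return f"0x{code:08X} ({name})"
    return f"{code} (ordinary exit code or unlisted status)"


print(describe_exit_status(3221225781))  # 0xC0000135 (STATUS_DLL_NOT_FOUND)
print(describe_exit_status(99))          # 99 (ordinary exit code or unlisted status)
```

`0xC0000135` is the Windows "DLL not found" status, which is consistent with the `freetype.dll` dialog described under "To Reproduce". Exit code `99` is what `pdftotext` documents as "other error"; here it accompanies the usage banner, which suggests the arguments `"-" "-"` were rejected.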

To Reproduce

Note: if you don't have choco (Chocolatey) installed, you can first install it by running this as an admin in PowerShell:

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

Source: https://chocolatey.org/install

Next:

  1. Open a Git Bash terminal as admin, and install via choco:
    choco install ripgrep-all
    # choose the `A` option for "yes to all"
  2. Download the attached dummy.pdf file.
  3. Run `rga my dummy.pdf` to search dummy.pdf for the word "my".

If on Windows 10 Pro, you'll also need to install vc_redist.x64.exe, as the instructions say here: https://github.com/phiresky/ripgrep-all?tab=readme-ov-file#windows

If on Windows 11 Pro, you won't.

If on Windows 10 Pro running in the Command Prompt, you may see this error pop up:

pdftotext.exe - System Error

The code execution cannot proceed because freetype.dll was not found. Reinstalling the program may fix this problem.

OK

If on Windows 10 Pro running in the Git Bash terminal, you will not see this error.

Attach example file:

dummy.pdf

Run command:

rga my dummy.pdf

Output

See above.

Screenshots

See above.

Operating System and Version

Tested in Windows 10 Pro and in Windows 11 Pro. Same result in both.

Output of rga --version

ripgrep-all 0.10.6

ElectricRCAircraftGuy commented 2 months ago

Reinstalling poppler in an admin terminal with `choco install --force poppler` did not fix the problem.

ElectricRCAircraftGuy commented 2 months ago

Temporary work-around: install rga with scoop instead of with choco!

Update: installing with scoop seems to fix it! I wonder if scoop installs a later version of poppler? Here is the version of poppler that scoop installed:

Installing 'poppler' (24.02.0-0) [64bit] from 'main' bucket
Release-24.02.0-0.zip (14.2 MB) [===============================================================] 100%
Checking hash of Release-24.02.0-0.zip ... ok.
Extracting Release-24.02.0-0.zip ... done.
Linking ~\scoop\apps\poppler\current => ~\scoop\apps\poppler\24.02.0-0
Creating shim for 'pdfattach'.
Creating shim for 'pdfdetach'.
Creating shim for 'pdffonts'.
Creating shim for 'pdfimages'.
Creating shim for 'pdfinfo'.
Creating shim for 'pdfseparate'.
Creating shim for 'pdftocairo'.
Creating shim for 'pdftohtml'.
Creating shim for 'pdftoppm'.
Creating shim for 'pdftops'.
Creating shim for 'pdftotext'.
Creating shim for 'pdfunite'.
'poppler' (24.02.0-0) was installed successfully!
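With both choco and scoop in play, more than one `pdftotext` may end up on the machine, and whichever directory comes first in `PATH` wins. The following is a minimal sketch (the helper `find_all_on_path` is hypothetical) that lists every match in search order, similar in spirit to `where pdftotext` in Command Prompt:

```python
import os


def find_all_on_path(name, path=None, exts=None):
    """List every file called `name` (plus optional extensions) on PATH, in order."""
    if path is None:
        path = os.environ.get("PATH", "")
    if exts is None:
        # On Windows, executables are resolved through PATHEXT (.EXE, .CMD, ...).
        if os.name == "nt":
            exts = [""] + os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
        else:
            exts = [""]
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for ext in exts:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                hits.append(candidate)
    return hits


for hit in find_all_on_path("pdftotext"):
    print(hit)
```

The first path printed is the one a bare `pdftotext` command would normally resolve to, so a stale copy listed ahead of the scoop shim would still be picked up.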

Instructions to install rga via scoop

  1. Install scoop: in a non-admin PowerShell, run:

    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
    Invoke-RestMethod -Uri https://get.scoop.sh | Invoke-Expression

    Source: https://scoop.sh/

  2. Install rga via scoop: in a non-admin PowerShell, run:

    # Install rga (ripgrep-all)
    scoop install rga
    
    # Install fzf too while we are at it
    scoop install fzf

    Source: https://github.com/phiresky/ripgrep-all?tab=readme-ov-file#scoop

Now it works!

Example run and output, as expected:

Git Bash terminal:

gstaples@my-pc MINGW64 ~/temp
$ rga my dummy.pdf
Page 1: Dummy PDF file

Screenshot so you can see where "my" was found: [screenshot]

PowerShell terminal:

PS C:\Users\gstaples\temp> rga my .\dummy.pdf
Page 1: Dummy PDF file

Command Prompt:

C:\Users\gstaples\temp>rga my dummy.pdf
Page 1: Dummy PDF file

I also quoted the above in my answer here: Stack Overflow: How to install ripgrep on Windows?

phiresky commented 2 months ago

I guess this is a problem with Chocolatey then: you had a version of pdftotext that is not from the poppler project.
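One way to check this diagnosis on a given machine is to look at the banner that `pdftotext -v` prints. The failing run above shows `pdftotext version 4.00` with only a Glyph & Cog copyright line, which is what Xpdf's build prints. The sketch below is a heuristic under that assumption; the function names are illustrative, and the classification relies on Poppler builds mentioning "Poppler" in their banner:

```python
import shutil
import subprocess


def classify_pdftotext_banner(banner: str) -> str:
    """Guess which project a pdftotext build comes from, based on its -v banner."""
    text = banner.lower()
    # Check for Poppler first: its banner also carries a Glyph & Cog copyright line.
    if "poppler" in text:
        return "poppler"
    if "glyph & cog" in text:
        return "xpdf"
    return "unknown"


def probe_pdftotext() -> str:
    """Find the pdftotext that PATH resolves to and classify it."""
    exe = shutil.which("pdftotext")
    if exe is None:
        return "pdftotext not found on PATH"
    result = subprocess.run(
        [exe, "-v"], capture_output=True, text=True, errors="replace"
    )
    # The version banner may be written to stderr or stdout, so combine both.
    banner = result.stderr + result.stdout
    return f"{exe}: {classify_pdftotext_banner(banner)}"
```

Calling `print(probe_pdftotext())` reports the resolved executable and the guess. A result of `xpdf` would match the maintainer's reading of this issue, while `unknown` can also mean the binary failed to start (for example because of a missing DLL).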