phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

Poppler pdftotext problem #243

Closed bruno299792 closed 1 month ago

bruno299792 commented 2 months ago

Hi,

I need your help.

I know rga uses poppler's pdftotext to read PDFs, but the whole of poppler is too large, and rga only uses a small part of it. Is it possible to install only pdftotext, or can I use xpdfreader's pdftotext binary instead?

lafrenierejm commented 1 month ago

Is it possible to install only pdftotext?

I haven't personally verified this, but since pdftotext has its own entry in poppler's CMake file I suspect you could build and use just that one utility.

That said, poppler and poppler-utils are often available as two separate packages with pdftotext included in the latter. If that is the case for your OS, you should be able to get away with only installing the -utils package and not the full poppler package.

can I use xpdfreader's pdftotext binary instead?

You should be able to, as long as the CLI of xpdfreader's pdftotext binary is compatible with the one offered by poppler. ripgrep-all just looks for the first entry in your PATH named "pdftotext"; if xpdfreader's pdftotext is found before poppler's (or if you don't have the binary from poppler on your PATH at all), then rga will use it.
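
To check which pdftotext would win that PATH lookup, something like the following Python sketch may help. It is not rga's own code: it only mimics a first-match PATH search with the standard library's `shutil.which`, and the helper names `first_on_path` and `make_fake_binary` are made up for this illustration. It assumes a POSIX system.

```python
import os
import shutil
import stat
import tempfile


def first_on_path(name, path=None):
    """Return the first executable called `name` in a PATH-style string.

    With path=None, the PATH of the current environment is searched.
    """
    return shutil.which(name, path=path)


def make_fake_binary(directory, name):
    """Create a tiny executable stand-in (hypothetical helper for the demo)."""
    target = os.path.join(directory, name)
    with open(target, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(target, os.stat(target).st_mode | stat.S_IXUSR)
    return target


if __name__ == "__main__":
    # Which pdftotext would be picked up from the real PATH (None if absent).
    print(first_on_path("pdftotext"))

    # Demo: two directories each hold a "pdftotext"; the earlier one wins.
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        make_fake_binary(first, "pdftotext")
        make_fake_binary(second, "pdftotext")
        demo_path = os.pathsep.join([first, second])
        print(first_on_path("pdftotext", demo_path).startswith(first))
```

Reordering the directories in PATH is therefore enough to switch between two installed pdftotext binaries, without uninstalling either one.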

phiresky commented 1 month ago

The xpdfreader binary is not compatible: it does not support stdin input.
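
As a rough illustration of what "stdin input" means here, the Python sketch below pipes PDF bytes into a pdftotext-style binary and reads the text back from stdout. It assumes the `-` placeholders that poppler's pdftotext accepts for the input and output file; the exact arguments rga passes are not shown in this thread, and the function name `pdf_bytes_to_text` is made up for the example.

```python
import subprocess


def pdf_bytes_to_text(pdf_bytes, binary="pdftotext"):
    """Pipe a PDF through `binary - -` and return the extracted text.

    Assumption: `binary` reads the PDF from stdin when the input file is
    given as "-" and writes text to stdout when the output file is "-".
    """
    result = subprocess.run(
        [binary, "-", "-"],
        input=pdf_bytes,          # PDF content goes in via stdin
        stdout=subprocess.PIPE,   # extracted text comes back via stdout
        stderr=subprocess.PIPE,
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.decode("utf-8", errors="replace")
```

A binary that cannot read the PDF from stdin would be expected to fail this call (for example by exiting with an error), which is one way to test a candidate replacement before putting it on the PATH.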

You can use whatever means you want to get the pdftotext binary, but by default it's dynamically linked to libpoppler.so on my system. I'm sure it's possible to compile it statically if you want, but it's probably not going to save much size.

Closing as off topic; this question isn't really related to rga.