phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

pdftotext grepping problem #232

Open 0xbadb0d00 opened 1 week ago

0xbadb0d00 commented 1 week ago

Describe the bug When I use the pdftotext adapter to convert PDFs, the extracted text is badly garbled compared to the original PDF. However, I noticed that passing the "-layout" option to pdftotext extracts the text correctly.

To Reproduce Use a PDF that contains a table.
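
The difference can also be checked outside rga by running pdftotext twice on the same file, once with "-layout" and once without. The following is only a minimal sketch: it assumes Poppler's pdftotext is on PATH, and the helper names `pdftotext_cmd` and `extract_text` and the file name `table.pdf` are made up for illustration. It is not how rga invokes pdftotext internally.

```python
import subprocess


def pdftotext_cmd(pdf_path, layout=False):
    """Build the pdftotext argument list (hypothetical helper).

    "-layout" asks pdftotext to keep the physical layout of the page,
    which tends to keep table columns aligned. A trailing "-" as the
    output file sends the text to stdout.
    """
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=False):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_cmd(pdf_path, layout),
        check=True,           # raise if pdftotext exits with an error
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout as text
    )
    return result.stdout
```

Calling `extract_text("table.pdf")` and `extract_text("table.pdf", layout=True)` on a PDF with a table and comparing the two strings should show whether the default mode is what scrambles the output.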

Output of rga --version ripgrep-all 0.10.6

Can I modify it directly (by adding the -layout parameter) as a temporary workaround? If so, how can I do it?

lafrenierejm commented 1 week ago

@0xbadb0d00 Any chance you can provide an example PDF for testing?