phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

Pandoc: couldn't unpack docx container: Did not find end of central directory signature #191

Closed (vvvvvx closed this issue 7 months ago)

vvvvvx commented 7 months ago

Describe the bug

文档/Documents/文件/2022年文件/公文/附件:xxxx纪要.docx: preprocessor command failed: '"/usr/bin/rga-preproc" "xxxx\xa6\x81.docx"':

-------------------------------------------------------------------------------
adapter: pandoc
couldn't unpack docx container: Did not find end of central directory signature
Error: subprocess failed: ExitStatus(unix_wait_status(16128))
-------------------------------------------------------------------------------

To Reproduce

Attach example file:

Run command: rga xxx --rga-adapters=pandoc,poppler

Output

文档/Documents/文件/2022年文件/公文/附件:xxxx纪要.docx: preprocessor command failed: '"/usr/bin/rga-preproc" "xxxx\xa6\x81.docx"': 
-------------------------------------------------------------------------------
adapter: pandoc
couldn't unpack docx container: Did not find end of central directory signature
Error: subprocess failed: ExitStatus(unix_wait_status(16128))
-------------------------------------------------------------------------------
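The `unix_wait_status(16128)` value in the log is a raw wait status, not an exit code. A minimal sketch of decoding it, assuming the conventional Linux encoding (exit code in the high byte, terminating signal in the low 7 bits); `decode_wait_status` is a hypothetical helper name, not part of rga or pandoc:

```python
def decode_wait_status(status):
    """Split a raw Unix wait status into (kind, value).

    Assumes the conventional Linux layout: the low 7 bits hold the
    terminating signal (0 if the process exited normally) and the next
    byte holds the exit code. Stopped/continued states are not handled.
    """
    signal_number = status & 0x7F
    if signal_number == 0:
        return ("exited", (status >> 8) & 0xFF)
    return ("signaled", signal_number)


# Value taken from the log line: ExitStatus(unix_wait_status(16128))
kind, value = decode_wait_status(16128)
print(kind, value)  # exited 63
```

So the subprocess exited on its own with code 63 rather than being killed by a signal. What that code means is defined by the failing program (here the pandoc adapter); this sketch does not try to interpret it.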

Screenshots

[Image: pandoc]

Operating System and Version

Linux X1-Artix 6.5.7-artix1-1 #1 SMP PREEMPT_DYNAMIC Sun, 15 Oct 2023 22:13:26 +0000 x86_64 GNU/Linux

Output of rga --version

ripgrep-all 0.9.6

lafrenierejm commented 7 months ago

@vvvvvx Have you tried the latest pre-release version, v1.0.0-alpha.5? If so, does it fix the issue? If not, could you attach the original file or (preferably) a minimal working example file?

lafrenierejm commented 7 months ago

@vvvvvx Please also provide the version of Pandoc that you are using.

lafrenierejm commented 7 months ago

https://github.com/jgm/pandoc/issues/2891#issuecomment-238095378 might be relevant here. It looks like this might be a known bug with old versions of Pandoc.

phiresky commented 7 months ago

I think I'm going to start closing issues that don't follow the issue template more liberally (in this case, the example file is missing), since I can't really do anything with an issue like this even if I wanted to.

In any case, as @lafrenierejm says, this seems very likely to be (or to have been) an upstream pandoc issue, potentially already fixed, since that error message is generated by pandoc.
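A .docx file is a zip archive, and "Did not find end of central directory signature" is the kind of error a zip reader reports when it cannot locate the archive's trailing index. Without the example file the cause here is unknown, but one quick check is whether the file is a readable zip container at all. The following is a rough sketch using only Python's standard `zipfile` module; `docx_container_problem` is a hypothetical helper name, and the `word/document.xml` check assumes the usual DOCX layout:

```python
import zipfile
from pathlib import Path


def docx_container_problem(path):
    """Return None if `path` looks like a readable DOCX container,
    otherwise a short description of what is wrong with it."""
    p = Path(path)
    if not p.is_file():
        return "not a regular file"
    # is_zipfile() looks for the zip end-of-central-directory record.
    if not zipfile.is_zipfile(str(p)):
        return "not a zip archive (no end-of-central-directory record found)"
    try:
        with zipfile.ZipFile(str(p)) as archive:
            # testzip() reads every member and returns the first corrupt one.
            corrupt_member = archive.testzip()
            if corrupt_member is not None:
                return f"corrupt member: {corrupt_member}"
            # Assumption: a normal DOCX keeps its main part here.
            if "word/document.xml" not in archive.namelist():
                return "zip archive without word/document.xml"
    except zipfile.BadZipFile as exc:
        return f"bad zip file: {exc}"
    return None
```

Calling `docx_container_problem("some-file.docx")` on the affected document would show whether the file itself is damaged or not zip-based. If it returns `None`, the container is readable and the problem is more likely on the pandoc side.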