phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

htm extension causes errors in pandoc adapter #230

Open vvvvvx opened 1 month ago

vvvvvx commented 1 month ago

Describe the bug

When I run "rga -o --rga-adapters=pandoc search-key-words", many failure messages like the following appear:

rg: Documents/Documents/Personal/飞信/Misc/Space/tab/forward.htm: preprocessor command failed: '"/usr/bin/rga-preproc" "Documents/Documents/Personal/\xe9\xa3\x9e\xe4\xbf\xa1/Misc/Space/tab/forward.htm"':

/home/user/文档/Documents/Documents/Personal/飞信/Misc/Space/tab/forward.htm adapter: pandoc
/home/user/文档/Documents/Documents/Personal/飞信/Misc/Space/tab/forward.htm.txt adapter: postprocprefix
Unknown input format htm
Error: copying adapter output to stdout

Caused by:
0: subprocess: Command { std: "pandoc" "--from=htm" "--to=plain" "--wrap=none" "--markdown-headings=atx", kill_on_drop: false }
1: ExitStatus(unix_wait_status(5376))
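Reading the log: the command line shows the file extension being passed to pandoc unchanged as the input format ("--from=htm"), and pandoc rejects it with "Unknown input format htm"; pandoc's HTML reader is named "html". The wait status 5376 corresponds to exit code 21 (5376 >> 8). Below is a minimal sketch of an extension-to-format mapping that would avoid the error. It is written in Python purely for illustration and is not rga's actual code; the table and all function names are hypothetical.

```python
from pathlib import Path
from typing import List

# Hypothetical lookup table: file extension -> pandoc reader name.
# Only the htm/html entries are relevant to this issue.
EXTENSION_TO_PANDOC_FORMAT = {
    "htm": "html",
    "html": "html",
}


def pandoc_input_format(path: str) -> str:
    """Return the value to pass to pandoc's --from for the given file path."""
    ext = Path(path).suffix.lstrip(".").lower()
    # Unmapped extensions fall through unchanged, which matches the
    # behaviour visible in the log (extension used directly as format).
    return EXTENSION_TO_PANDOC_FORMAT.get(ext, ext)


def pandoc_command(path: str) -> List[str]:
    """Rebuild the argument list from the log, with the mapped input format."""
    return [
        "pandoc",
        f"--from={pandoc_input_format(path)}",
        "--to=plain",
        "--wrap=none",
        "--markdown-headings=atx",
    ]


def exit_code_from_wait_status(status: int) -> int:
    """Extract the exit code from a Unix wait status of a normally exited process."""
    return (status >> 8) & 0xFF
```

With this mapping, "forward.htm" would be handed to pandoc as "--from=html" instead of "--from=htm".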

To Reproduce

Attach example file:

Run command: rga -o --rga-adapters=pandoc search-key-words

Output

rg: Documents/Documents/Personal/飞信/Misc/Space/tab/forward.htm: preprocessor command failed: '"/usr/bin/rga-preproc" "Documents/Documents/Personal/\xe9\xa3\x9e\xe4\xbf\xa1/Misc/Space/tab/forward.htm"':

/home/user/文档/Documents/Documents/Personal/飞信/Misc/Space/tab/forward.htm adapter: pandoc
/home/user/文档/Documents/Documents/Personal/飞信/Misc/Space/tab/forward.htm.txt adapter: postprocprefix
Unknown input format htm
Error: copying adapter output to stdout

Caused by:
0: subprocess: Command { std: "pandoc" "--from=htm" "--to=plain" "--wrap=none" "--markdown-headings=atx", kill_on_drop: false }
1: ExitStatus(unix_wait_status(5376))

Screenshots

If applicable, add screenshots to help explain your problem.

Operating System and Version

Artix Linux

Output of rga --version

ripgrep-all 0.10.6