quickwit-oss / tantivy-cli

MIT License
326 stars 59 forks source link

Search command println panics with broken pipes #54

Closed erichutchins closed 3 years ago

erichutchins commented 3 years ago

I commonly use head to preview search results but this causes a panic. I resorted to some silly bash tricks to filter out this error.

$ tantivy search -i wikipedia-index -q "barack obama" | head >/dev/null
thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', src/libstd/io/stdio.rs:955:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Would you be amenable to swapping println for writeln and ignoring broken pipes? Something like:


@@ -39,7 +41,16 @@ fn run_search(directory: &Path, query: &str) -> tantivy::Result<()> {
             let doc_id = scorer.doc();
             let doc = store_reader.get(doc_id)?;
             let named_doc = schema.to_named_doc(&doc);
-            println!("{}", serde_json::to_string(&named_doc).unwrap());
+            if let Err(e) = writeln!(
+                io::stdout(),
+                "{}",
+                serde_json::to_string(&named_doc).unwrap()
+            ) {
+                if e.kind() != ErrorKind::BrokenPipe {
+                    eprintln!("{}", e.to_string());
+                    process::exit(1)
+                }
+            }
             scorer.advance();
         }
     }
fulmicoton commented 3 years ago

That seems sensible. Could you send a PR?

fulmicoton commented 3 years ago

An alternative implementation would be to take the stdout lock outside of the loop once and for all. Can you also test whether it has an impact on performance (with LZ4 and the index in page cache)?

If it does not, let's stick to not taking the lock explicitly.

erichutchins commented 3 years ago

I made a small PR for this issue following the examples from here: https://rust-cli.github.io/book/tutorial/output.html#a-note-on-printing-performance

The best practice according to BurntSushi is that you would want line buffering when printing to a tty but block buffering when the output is a file. I don’t know if you want to add any dependencies, but ripgrep has an underlying grep-cli crate with a stdout function that addresses this: https://docs.rs/grep-cli/0.1.5/grep_cli/fn.stdout.html