manojkarthick / pqrs

Command line tool for inspecting Parquet files
Apache License 2.0

Directory support in `head` / pipe support #19

Status: Closed (Hoeze closed this issue 2 years ago)

Hoeze commented 3 years ago

Hi, we're very happily using pqrs now and found two small issues with it:

1) head does not support directories:

#> pqhead data.parquet 
Error: ParquetError(General("underlying IO error: Is a directory (os error 21)"))

2) It panics when used in a pipe:

#> pqcat data.parquet | head

###########################################################################################################################################################################################################
File: data.parquet/d66ac6554cc44c3cbfaa56b75fa446e4.parquet
###########################################################################################################################################################################################################

[...]
thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', library/std/src/io/stdio.rs:935:9

manojkarthick commented 3 years ago

Hi! Thanks for filing the issue.

  1. That is the expected behaviour for `pqrs head`, which works similarly to the `head` command in *nix and doesn't traverse directories. I am not sure whether head-ing directories should be supported: selecting a file at random makes the output non-deterministic, while sorting the files by some property (file name, last modified time, etc.) will take a long time if there are lots of files.

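  2. The second issue is caused by an upstream bug in how SIGPIPE is handled in Rust. Rust ignores SIGPIPE by default, so once the reader (here `head`) exits, the next write to stdout fails with "Broken pipe" and `println!` panics on that error. Ref: https://github.com/rust-lang/rust/issues/46016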
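
For anyone hitting the same panic in their own tool, here is a minimal sketch of one common workaround: write with `writeln!` instead of `println!` and treat `ErrorKind::BrokenPipe` as a normal end of output. This is not necessarily how pqrs handles it; `write_lines` and `ClosedPipe` are hypothetical names made up for this example, and `ClosedPipe` only simulates a reader that has already exited.

```rust
use std::io::{self, ErrorKind, Write};

/// Writes each line to `out` and returns how many lines were written.
/// Stops quietly (instead of panicking) once the reader has gone away.
fn write_lines<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<usize> {
    let mut written = 0;
    for line in lines {
        match writeln!(out, "{}", line) {
            Ok(()) => written += 1,
            // The downstream process (e.g. `| head`) closed the pipe: not an error.
            Err(e) if e.kind() == ErrorKind::BrokenPipe => return Ok(written),
            // Any other I/O error is still reported to the caller.
            Err(e) => return Err(e),
        }
    }
    Ok(written)
}

/// Hypothetical stand-in for a pipe whose reader has already exited:
/// every write fails with `BrokenPipe`.
struct ClosedPipe;

impl Write for ClosedPipe {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(ErrorKind::BrokenPipe, "reader went away"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    // Normal case: lines go to stdout through a locked handle.
    let stdout = io::stdout();
    let mut out = stdout.lock();
    write_lines(&mut out, &["row 1", "row 2", "row 3"])?;

    // Simulated closed pipe: the function returns Ok instead of panicking.
    let mut closed = ClosedPipe;
    let n = write_lines(&mut closed, &["row 1", "row 2"])?;
    eprintln!("lines written to the closed pipe: {}", n);
    Ok(())
}
```

With this pattern, something like `mytool file | head` exits cleanly once `head` stops reading. The trade-off is that every print call has to go through a fallible write rather than `println!`.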