shshemi / tabiew

A lightweight TUI application to view and query tabular data files, such as CSV, TSV, or parquet.
MIT License
541 stars, 14 forks

Compressed files support. #18

Closed: benoit-intrw closed this issue 2 weeks ago

benoit-intrw commented 3 weeks ago

Hello,

First, thank you for this nice tool, which looks promising!

I use compressed jsonl files and tried to open them with tw.

$ tw data.jsonl.gz
thread 'main' panicked at /[...]/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tabiew-0.7.0/src/main.rs:37:39:
cannot read compressed CSV file; compile with feature 'decompress' or 'decompress-fast'
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fish: Job 1, 'tw data/.jsonl.gz' terminated by signal SIGABRT (Abandon)

I tried, as the message suggests, building tw with a decompress feature added to the polars dependency. The file now opens, but format detection does not work.

$ ./target/release/tw data.jsonl.gz 

If I add the -f jsonl option, I get the following error:

$ ./target/release/tw -f jsonl data.jsonl.gz
thread 'main' panicked at src/main.rs:41:39:
stream did not contain valid UTF-8 at line 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fish: Job 1, './target/release/tw -f jsonl data.jsonl.gz' terminated by signal SIGABRT (Abandon)

This could be a nice feature addition.
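
A possible explanation for the UTF-8 panic above, assuming the JSONL reader was handed the still-compressed bytes: every gzip stream starts with the bytes 0x1f 0x8b, and 0x8b cannot begin a UTF-8 sequence, so decoding the file as text fails right away. The Rust sketch below illustrates this with the standard library only. `looks_gzipped` and `is_text` are hypothetical helper names, not part of Tabiew or Polars.

```rust
/// First two bytes of every gzip member (RFC 1952: ID1 = 0x1f, ID2 = 0x8b).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Hypothetical helper: true when the buffer starts with the gzip magic number.
fn looks_gzipped(bytes: &[u8]) -> bool {
    bytes.starts_with(&GZIP_MAGIC)
}

/// Hypothetical helper: true when the buffer decodes as valid UTF-8 text.
fn is_text(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

A check like `looks_gzipped` on the first bytes of a file is one way a tool could detect compression from content rather than from the file extension.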

shshemi commented 3 weeks ago

Hi, I'm glad you like Tabiew. This is a good feature, but it raises a couple of concerns for me. The most important is that Polars is not used for all of Tabiew's supported input formats. For instance, reading FWF files uses another crate, and I plan to add Avro in the future, if possible, which is also not available in Polars. In addition, Tabiew will support reading from stdin in the next version, which lets the user decompress the file with any other tool and pipe the result to Tabiew via the "|" operator.

benoit-intrw commented 2 weeks ago

Thank you for your answer, these are fair points.

Using zcat to pipe decompressed data into tw does indeed work.

$ zcat data.jsonl.gz | ./target/release/tw -f jsonl

Does tw consume data from the pipe as needed, like less, or does it read all the data up front?

shshemi commented 2 weeks ago

It reads stdin to the end.
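
To make "reads to the end" concrete, here is a minimal Rust sketch of that pattern using only the standard library. It is an illustration, not Tabiew's actual code, and `read_all` and `count_lines` are hypothetical names. The whole input is buffered before anything is processed, so memory use grows with the size of the piped data, unlike a pager such as less that reads on demand.

```rust
use std::io::{self, Read};

/// Hypothetical helper: drain a reader until EOF and return everything it produced.
fn read_all<R: Read>(mut input: R) -> io::Result<Vec<u8>> {
    let mut buffer = Vec::new();
    // `read_to_end` keeps reading until the reader reports end of input.
    input.read_to_end(&mut buffer)?;
    Ok(buffer)
}

/// Hypothetical helper: count non-empty newline-delimited records in the buffer.
fn count_lines(buffer: &[u8]) -> usize {
    buffer
        .split(|&b| b == b'\n')
        .filter(|line| !line.is_empty())
        .count()
}
```

In a real program the reader would be `io::stdin().lock()`. Any other `Read` implementation, such as a byte slice, works the same way, which keeps the function easy to test.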

benoit-intrw commented 2 weeks ago

Thank you for your answer.