Closed Liooo closed 1 year ago
That's just because `csvtk pretty` needs to load all the data. Try `head -n 1000 huge_file.csv | csvtk pretty | less`.
Thanks for the quick response.
> Just because csvtk pretty needs load all the data.
But it doesn't have to, does it?
> Try to use ...
I've been using `head`, but every time it makes me think it'd be much nicer if piping worked out of the box.
> But it doesn't have to, does it?
Oh, it's to determine the column width, right. It feels like this could be worked around by using a fixed column width and truncating longer text with an ellipsis, when an option is given or something.
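To make the fixed-width idea concrete, here is a minimal Python sketch. It is not csvtk's code; the names `fit` and `pretty_fixed` and the three-space column separator are assumptions made for illustration. Because every column gets the same preset width, each row can be printed as soon as it is read.

```python
import csv
import io


def fit(cell: str, width: int) -> str:
    """Pad or truncate a cell to exactly `width` characters (width >= 1)."""
    if len(cell) > width:
        # Keep the first width-1 characters and mark the cut with an ellipsis.
        return cell[: width - 1] + "…"
    return cell.ljust(width)


def pretty_fixed(rows, width=10):
    """Yield one formatted line per row without looking ahead in the input."""
    for row in rows:
        yield "   ".join(fit(cell, width) for cell in row).rstrip()


# Small in-memory stand-in for a large CSV stream (hypothetical data).
data = io.StringIO("id,comment\n1,short\n2,a much longer piece of text\n")
for line in pretty_fixed(csv.reader(data), width=10):
    print(line)
```

The trade-off is that short columns waste space and long cells lose content, which is why it would make sense only as an opt-in mode.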
Yes, it's on the to-do list. https://github.com/shenwei356/csvtk/issues/206
Oh, so `-W` is already there. Then this feature should be ready to be developed, am I correct?
I can't promise an ETA, but would you accept a PR if I made one? Say the signature is something like:
csvtk --pipe (or -P) # utilizes unix pipe buffer for large files, uses `-W 10` internally by default
csvtk --pipe -W 30 # when changing the width from default `-W 10`
> oh so -W is already there, then this feature should be ready to be developed, am I correct?
Not started yet.
#206 seems to be about text wrapping and not really related to pre-determining the column width, I assume.
They are related.
Here's my plan.
It can be applied to streaming data from the standard input pipe or any file.
If the widths of some columns exceed the pre-determined value, wrap the content to multiple lines.
I think this should be applied only when a `--wrap` option is specified; otherwise the text should be cut off at the `-W` length. Oftentimes, for readability, we don't want one CSV row to span multiple lines.
Hmm, that makes sense. But we need to read the file twice or hold the data in memory (the current way).
Implemented. The output is streaming now, so you can pipe it to other tools like `more` or `less`.
Please check here: https://github.com/shenwei356/csvtk/issues/206#issuecomment-1609358555
How to:
1. First -n/--buf-rows rows are read to check the minimum and maximum widths
of each column. You can also set the global thresholds -w/--min-width and
-W/--max-width.
1a. Cells longer than the maximum width will be wrapped (default) or
clipped (--clip).
Usually, the text is wrapped in space (-x/--wrap-delimiter). But if one
word is longer than the -W/--max-width, it will be force split.
1b. Texts are aligned left (default), center (-m/--align-center)
or right (-r/--align-right).
2. Remaining rows are read and immediately outputted, one by one, till the end.
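The steps above can be sketched in Python as follows. This is an illustrative approximation, not csvtk's actual implementation (csvtk is written in Go): the function name `stream_pretty`, its parameters, and the three-space separator are assumptions, and alignment options are left out. Only the first `buf_rows` rows are held in memory to pick column widths; every later row is formatted and emitted as soon as it arrives.

```python
import csv
import io
import itertools
import textwrap


def stream_pretty(rows, buf_rows=3, min_width=1, max_width=12, clip=False):
    """Yield formatted lines; only the first `buf_rows` rows are buffered."""
    rows = iter(rows)
    head = list(itertools.islice(rows, buf_rows))  # step 1: buffer a few rows
    if not head:
        return
    ncols = max(len(r) for r in head)
    widths = []
    for i in range(ncols):
        longest = max((len(r[i]) for r in head if i < len(r)), default=0)
        # Clamp each column width to the global thresholds.
        widths.append(max(min_width, min(longest, max_width)))

    def render(row):
        cells = []
        for i, w in enumerate(widths):
            text = row[i] if i < len(row) else ""
            if clip:
                parts = [text[:w]]  # step 1a: clip at the column width
            else:
                # step 1a: wrap at spaces; overlong words are force-split
                parts = textwrap.wrap(text, width=w) or [""]
            cells.append(parts)
        # A wrapped cell makes the row span several output lines.
        for line in itertools.zip_longest(*cells, fillvalue=""):
            yield "   ".join(p.ljust(w) for p, w in zip(line, widths)).rstrip()

    # step 2: emit the buffered rows, then stream the rest one by one.
    # Columns beyond those seen in the buffer are dropped in this sketch.
    for row in itertools.chain(head, rows):
        yield from render(row)


# Hypothetical in-memory data standing in for a large CSV stream.
data = io.StringIO("id,note\n1,hello world foo\n2,x\n")
for line in stream_pretty(csv.reader(data), buf_rows=2, max_width=6):
    print(line)
```

Because widths are fixed after the buffered rows, a later cell that is longer than anything seen in the buffer is wrapped or clipped rather than widening the column.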
@shenwei356 thanks so much 🚀
When I run `cat huge_file.csv | less`, it shows the first N results immediately, but when I run `cat huge_file.csv | csvtk pretty | less`, it takes a long time to get any output. Is this probably a Unix pipe buffer sizing thing?