wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.
https://csvkit.readthedocs.io
MIT License
5.99k stars, 607 forks

Feature Request: csvlook to wrap long columns #389

Status: closed (Tabea-K closed 8 years ago)

Tabea-K commented 9 years ago

I often have CSV files that I want to scan quickly and that contain one or two columns with very long strings (e.g. >2000 characters). The csvlook output then wraps over many lines.

I suggest adding an argument that lets the user specify the maximum number of characters to display per column. If, for example, I set this to 50, then in every row, any cell longer than 50 characters would be truncated.

At first, I thought that the option -z would do exactly that, but apparently it doesn't.
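A minimal sketch of the requested truncation in plain Python, not csvlook's actual code: every cell longer than a single maximum width is cut down to that width. The names truncate_row, truncated_rows, and max_width are hypothetical.

```python
import csv
import io


def truncate_row(row, max_width):
    """Return a copy of row with every cell cut to at most max_width characters."""
    return [cell[:max_width] for cell in row]


def truncated_rows(text, max_width):
    """Parse CSV text and yield each row with its cells truncated."""
    for row in csv.reader(io.StringIO(text)):
        yield truncate_row(row, max_width)


# One column holds a 2000-character string, as in the report above.
sample = "name,notes\nalpha," + "x" * 2000 + "\n"
for row in truncated_rows(sample, 50):
    print(row[0], len(row[1]))
```

Slicing never raises on short strings, so cells already under the limit pass through unchanged.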

mkauzlar commented 9 years ago

I vote for this too. I tried to use the -z option the same way before realising my mistake: it seems that -z only checks whether fields are longer than x characters.

If possible, I would suggest adding two options:

  1. Truncate the text in a column: if the text in a column exceeds, say, 10 characters, it is truncated at 10.

Original Output:

--------------------------+---------+---------+
column1                   | column2 | column3 |
--------------------------+---------+---------+
aaaaaaaaaaaaaaaaaaaaaaaaa | bbbbb   | cccc    |
--------------------------+---------+---------+

Output with column1 truncated at 10:

-----------+---------+---------+
column1    | column2 | column3 |
-----------+---------+---------+
aaaaaaaaaa | bbbbb   | cccc    |
-----------+---------+---------+
  2. Wrap the text in a column: if the text in a column exceeds, say, 10 characters, it is wrapped at a length of 10:

Original Output:

--------------------------+---------+---------+
column1                   | column2 | column3 |
--------------------------+---------+---------+
aaaaaaaaaaaaaaaaaaaaaaaaa | bbbbb   | cccc    |
--------------------------+---------+---------+

Output with column1 wrapped at 10:

-----------+---------+---------+
column1    | column2 | column3 |
-----------+---------+---------+
aaaaaaaaaa | bbbbb   | cccc    |
aaaaaaaaaa |         |         |
aaaaa      |         |         |
-----------+---------+---------+
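The wrapping variant can be sketched with Python's standard textwrap module. This is an illustration only, not the code from any commit or pull request in this thread, and wrap_row is a hypothetical helper. It splits one CSV row into several display lines and reproduces the 10/10/5 split of column1 in the wrapped example above.

```python
import textwrap


def wrap_row(row, width):
    """Split one CSV row into several display lines, each cell at most width chars.

    textwrap.wrap breaks long unbroken tokens (break_long_words defaults to
    True) and replaces embedded newlines with spaces. Shorter cells are padded
    with empty strings so every column has the same number of lines.
    """
    # textwrap.wrap returns [] for an empty cell, so keep one blank line instead.
    wrapped = [textwrap.wrap(cell, width) or [""] for cell in row]
    height = max(len(lines) for lines in wrapped)
    for lines in wrapped:
        lines.extend([""] * (height - len(lines)))
    # Transpose per-cell line lists into per-line lists of cells.
    return [list(line) for line in zip(*wrapped)]


for line in wrap_row(["a" * 25, "bbbbb", "cccc"], 10):
    print(" | ".join(cell.ljust(10) for cell in line))
```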

Regards

themiurgo commented 9 years ago

I needed a wrapping mode today, so I came up with this implementation. Let me know how to improve it so that it can be included.

https://github.com/themiurgo/csvkit/commit/aaf18e9355be4d93b919cc4fe68627c07bccb240

mkauzlar commented 9 years ago

@themiurgo: works as requested, thanks.

An improvement would be the ability to set the wrap length per column. We currently have a Perl script that does this:

cat test5.csv | perl col.pl -s +20,5,10,-5,-20

So if the CSV has 5 columns, each column can be sized with a number, and a leading + or - makes the text left- or right-aligned. Here is an example output:

EDIT: Unfortunately, there is no fixed-width formatting here, so the pasted result didn't look good.

With the above features, one can format the output table to be nice and readable without it being too wide for some screens.
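A rough Python sketch of the per-column spec described above. The Perl script col.pl is not shown in the thread, so the parsing rules here are assumptions: a leading + means left-aligned, a leading - means right-aligned, and an unsigned width is treated as left-aligned. parse_width_spec and format_row are hypothetical names, and format_row truncates rather than wraps to keep the example short.

```python
def parse_width_spec(spec):
    """Parse a spec such as "+20,5,10,-5,-20" into (width, align) pairs.

    Assumed convention: "+" = left, "-" = right, no sign = left.
    """
    columns = []
    for part in spec.split(","):
        part = part.strip()
        align = "right" if part.startswith("-") else "left"
        columns.append((int(part.lstrip("+-")), align))
    return columns


def format_row(row, columns):
    """Cut each cell to its column width, then pad it to the requested alignment.

    Cells beyond the number of columns in the spec are dropped by zip().
    """
    cells = []
    for cell, (width, align) in zip(row, columns):
        cell = cell[:width]
        cells.append(cell.ljust(width) if align == "left" else cell.rjust(width))
    return " | ".join(cells)


print(format_row(["Alice", "42"], parse_width_spec("+8,-4")))
```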

themiurgo commented 9 years ago

@mkauzlar That wouldn't be hard to implement in csvlook. However, specifying the character width for each column seems overkill for most situations and complicates the CLI. From what I understand, csvlook is not meant to be the ultimate CSV viewer, just one that is "good enough" for most use cases. Providing a single max width for all columns seems a good trade-off between ease of use and functionality. Let's hear @onyxfish's opinion on this; perhaps he can provide a good solution.

themiurgo commented 9 years ago

I'm not even sure that '-w / --wrap' is the right flag for this. Alternatively, csvlook could use the -z flag for this; right now it just raises an error when a field is over a certain width.
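The behaviour described for -z (rejecting an oversized field rather than shortening it) matches Python's standard csv module, where csv.field_size_limit() sets a maximum field length and the reader raises csv.Error when a field exceeds it. That csvkit's -z passes its value straight to this function is an assumption here, not something stated in the thread; read_with_limit is a hypothetical helper.

```python
import csv
import io


def read_with_limit(text, limit):
    """Parse CSV text with a temporary field size limit, then restore the old one."""
    previous = csv.field_size_limit(limit)  # sets the new limit, returns the old one
    try:
        return list(csv.reader(io.StringIO(text)))
    finally:
        csv.field_size_limit(previous)


sample = "column1,column2\n" + "a" * 25 + ",bbbbb\n"
print(read_with_limit(sample, 100))  # the 25-character field is within the limit

try:
    read_with_limit(sample, 10)
except csv.Error as exc:
    # The oversized field is rejected with an error; nothing is truncated.
    print("csv.Error:", exc)
```

The limit is global to the csv module, which is why the sketch restores the previous value in a finally block.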

themiurgo commented 9 years ago

Needed this again today. I'll send a PR.

synapticarbors commented 9 years ago

+1 for the ability to set a maximum width at which to truncate a column

pesterhazy commented 8 years ago

It would be very useful just to have a setting to set a max width for each column, so that a single wide value doesn't make the whole file unreadable.

themiurgo commented 8 years ago

Setting a single max value for all the columns seems a good trade-off between having a complicated CLI and being able to quickly view datasets with long fields.

By the way, I don't think this PR will be processed quickly, if at all. The project seems to have a backlog of PRs dating back a year.

Tabea-K commented 8 years ago

I created pull request #482 to fix this

themiurgo commented 8 years ago

@Tabea-K, there is already an older PR, #429, that fixes this by wrapping. However, the maintainer(s) seem to be busy at the moment.

jpmckinney commented 8 years ago

The implementation of this functionality is moving out of the csvkit repository into its new dependency agate (#515). This is the file that would need to be modified. I've opened an issue there referring to all issues and PRs related to this feature request.

jpmckinney commented 8 years ago

Re-opening, as this can be implemented now that the issue I opened on agate has been fixed.

themiurgo commented 8 years ago

The agate commits just introduced truncation, not wrapping.

onyxfish commented 8 years ago

@themiurgo I apologize for your pull request not getting due attention back when it was fresh. As you noticed, this project has been on ice for a while as the backend was being reimplemented as agate. That being the case, I would merge a wrapping pull request for agate (e.g. table.print_table(max_column_width=20, wrap_columns=True)), if you feel like reimplementing it there.

The one caveat I have is this seems like a fairly unusual case and I don't want it at the cost of making the implementation tremendously complicated. If there is a way to implement it that's straightforward then great.
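To illustrate the point that wrapping need not complicate the implementation much, here is a stdlib-only sketch of a table printer where one boolean switches between truncation and wrapping. The signature only echoes the print_table(max_column_width=20, wrap_columns=True) example in the comment above; wrap_columns is hypothetical, and this is not agate's code.

```python
import sys
import textwrap


def print_table(header, rows, max_column_width=20, wrap_columns=False, output=None):
    """Print header and rows as a text table, truncating or wrapping long cells.

    wrap_columns=False: each cell is cut to max_column_width (one line per row).
    wrap_columns=True: each cell is wrapped onto extra lines instead.
    Assumes every row has the same number of cells as the header.
    """
    if output is None:
        output = sys.stdout

    def split_cell(value):
        text = str(value)
        if wrap_columns:
            # textwrap.wrap returns [] for an empty string, so keep one blank line.
            return textwrap.wrap(text, max_column_width) or [""]
        return [text[:max_column_width]]

    # Every logical row becomes a list of per-cell line lists.
    table = [[split_cell(v) for v in row] for row in [header, *rows]]
    widths = [
        max(len(line) for row in table for line in row[i])
        for i in range(len(header))
    ]
    border = "-+-".join("-" * w for w in widths) + "-+"

    def emit(row):
        # A wrapped row is as tall as its longest cell; pad the others with blanks.
        height = max(len(lines) for lines in row)
        for i in range(height):
            cells = [
                (lines[i] if i < len(lines) else "").ljust(w)
                for lines, w in zip(row, widths)
            ]
            output.write(" | ".join(cells) + " |\n")

    output.write(border + "\n")
    emit(table[0])
    output.write(border + "\n")
    for row in table[1:]:
        emit(row)
    output.write(border + "\n")


print_table(
    ["column1", "column2", "column3"],
    [["a" * 25, "bbbbb", "cccc"]],
    max_column_width=10,
    wrap_columns=True,
)
```

The usage call prints column1 wrapped at 10 characters, as in mkauzlar's example; dropping wrap_columns=True gives the truncated variant from the same code path.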

jpmckinney commented 8 years ago

Ah, sorry, I had reopened this based on the issue title ("truncate") not the description ("wrapping"). Re-closing as before.

jpmckinney commented 8 years ago

I'll open a new issue just for truncation: https://github.com/onyxfish/csvkit/issues/561

jugglinmike commented 1 year ago

Hey folks, I'm having trouble understanding the resolution of this feature request. The referenced pull requests concern the "agate" dependency, but I can't find any patches that implement this functionality in csvkit. Neither the csvlook documentation nor csvkit's changelog mentions wrapping. Can csvkit format output so that individual cells are displayed across multiple lines?

jpmckinney commented 1 year ago

There's no wrapping, only truncation. To avoid confusion, open a new issue on agate.

jugglinmike commented 1 year ago

Got it. Here's the issue https://github.com/wireservice/agate/issues/773. Thanks for the quick response!