wofr06 / lesspipe

lesspipe - display more with less
GNU General Public License v2.0

Using "column" for output is not portable #137

Closed: harkabeeparolus closed this issue 9 months ago

harkabeeparolus commented 1 year ago

The latest version of lesspipe no longer works for CSV data on macOS or *BSD, only on recent versions of GNU/Linux. Since commit a9430efccdd408b33deff2af2f49e955918b521a, lesspipe has used the column command with the "-o" flag, which is not portable.

The bug is here:

Compatibility

Unfortunately, "column" is not covered by the POSIX standard, and therefore behaviour varies. BSD and thus macOS versions never had the "-o/--output-separator" flag to begin with.

This flag seems to be specific to the util-linux project, where it has been available since version 2.23 (from 2013).

Solutions

You could test for the availability of this flag by trying to run something like:

column -V 2>&1 | grep -w util-linux
# or
column --help 2>&1 | grep -w output-separator

You could also keep csvlook as the first option when available, as it was before a9430efccdd408b33deff2af2f49e955918b521a.
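The detection idea above could be wrapped in a small helper so that the caller falls back to a portable invocation when "-o" is unavailable. This is only a sketch: the function names supports_long_option and format_csv are made up for illustration and are not part of lesspipe, and it assumes the tool lists its long options in response to --help.

```shell
#!/bin/sh
# Hypothetical helper (not part of lesspipe): succeed if the given command
# mentions the given long option in its --help output.
supports_long_option() {
    # $1 = command name, $2 = long option name without leading dashes
    "$1" --help 2>&1 | grep -qw -- "$2"
}

# Hypothetical wrapper: align a comma-separated file, using -o only when
# the installed column advertises --output-separator.
format_csv() {
    if supports_long_option column output-separator; then
        column -t -s, -o ' | ' "$1"
    else
        # Portable subset: BSD and macOS column accept -t and -s.
        column -t -s, "$1"
    fi
}
```

Note that either branch still splits on every comma, so quoted fields containing commas are not handled.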

wofr06 commented 1 year ago

Fixed by no longer using the -o option. Corrected in release 2.10.

harkabeeparolus commented 9 months ago

I would like to state for the record that I'm not happy with this solution. Since no version of column actually understands CSV, it frequently misinterprets my CSV files that have quoted commas in some of the fields. The more you work with CSV files, the more you need an actual parser. ☹️

I was much happier with csvlook as a first option, if installed. If you believe that it's too slow, then qsv table would also work great if it is installed... qsv is written in Rust and super fast. Another pretty option is rich --csv - which certainly looks the best.

Might we discuss this? Or do you have any recommendations for how to override certain file types with one's own personal preferences, without re-inventing mailcap or mimetypes? 😊

wofr06 commented 9 months ago

I do agree that the column command is not the best solution, but it has the advantage that it tolerates badly formed CSV files. Both csvlook and pandoc stop on errors without a chance to continue, and pandoc also has problems with quoted fields. The proposed qsv table does not automatically determine the delimiter. The rich --csv command does indeed do the best job: it tolerates errors, looks nice, and handles alternate delimiters. The only problem with it is that it is not very common (yet), is missing from repositories at least on Linux, and is hard to find because of its name. I could change lesspipe.sh to try the CSV parsers rich, csvlook, column and pandoc in turn. I could also look into csvlook to see whether the hard errors could be avoided. Meanwhile, an option could be to use ~/.lessfilter to define your own CSV handling. Other opinions?
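Trying several CSV viewers in turn, or overriding the choice in ~/.lessfilter, might look roughly like the sketch below. It assumes the convention described in the lesspipe README: an executable ~/.lessfilter receives the file name as its first argument and exits with status 0 when it has handled the file, while a nonzero status leaves the work to lesspipe.sh. The function names and the order of tools are illustrative only.

```shell
#!/bin/sh
# Print the first command from the argument list that is installed.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Render one CSV file with whichever viewer is found first.
view_csv() {
    case "$(first_available rich csvlook column)" in
        rich)    rich --csv "$1" ;;
        csvlook) csvlook "$1" ;;
        column)  column -t -s, "$1" ;;
        *)       return 1 ;;    # nothing installed: leave it to lesspipe
    esac
}

# Body of a hypothetical ~/.lessfilter: handle *.csv, decline everything else.
lessfilter_main() {
    case "$1" in
        *.csv) view_csv "$1" ;;
        *)     return 1 ;;
    esac
}
# In a real ~/.lessfilter the last line would be:  lessfilter_main "$1"
```

Because a script's exit status is that of its last command, ending the file with the lessfilter_main call passes the "handled or not" decision straight back to lesspipe.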

harkabeeparolus commented 9 months ago

I did not consider column being more robust with malformed CSV input. That's a very good point.

I also didn't know about the lessfilter option. I apologize for not familiarizing myself more with this project. Since I'm pretty fluent in bash, that solves my personal problems at least. Thanks for the hint!

I'll think more about the robustness angle, and I'll also take a look at what csvkit could do.

wofr06 commented 9 months ago

I am experimenting with a self-written parser based on the Perl module Text::CSV. It can be made quite error tolerant. When I finish my tests I can present the code. The problem is similar, however: Text::CSV is not in core Perl, which makes it difficult for inexperienced people to install. The module is certainly similar to csvkit, which is the underlying package of csvlook.

harkabeeparolus commented 9 months ago

Indeed. I could probably make a similar program in Python using the standard library csv module. It has built-in sniffing to detect delimiters and quoting. I don't even think one would need to install third-party modules. And if it's not able to detect the CSV flavor, it could just print the file as plain text.

But how are you thinking about this code, in relation to lesspipe? As an extra tool for those who are interested?

wofr06 commented 9 months ago

Yes, both a modified csvlook and my script could be provided for people experiencing problems with one of the offered alternatives and, for the time being, be hooked in through a lessfilter. The Perl module handles quoting properly, but I had to do the guessing of delimiters myself, which seems to be quite robust.
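For readers who want a feel for what delimiter guessing involves, here is a deliberately naive shell sketch. It is not the heuristic used in the Perl script mentioned above: it only counts candidate characters in the first line, ignores quoting entirely, and the function name guess_delimiter is hypothetical.

```shell
#!/bin/sh
# Naive delimiter guess: count each candidate in the first line of stdin
# and print the most frequent one (ties go to the earlier candidate, so a
# line without any candidate yields a comma).
# Quoted fields are NOT handled; a real CSV parser is needed for those.
guess_delimiter() {
    awk 'NR == 1 {
        # gsub with "&" replaces each match by itself and returns the count.
        count[1] = gsub(/,/,   "&"); cand[1] = ","
        count[2] = gsub(/;/,   "&"); cand[2] = ";"
        count[3] = gsub(/\t/,  "&"); cand[3] = "\t"
        count[4] = gsub(/[|]/, "&"); cand[4] = "|"
        best = 1
        for (i = 2; i <= 4; i++)
            if (count[i] > count[best]) best = i
        printf "%s", cand[best]
        exit
    }'
}
```

The result could then be passed to a viewer, for example column -t -s "$(guess_delimiter < data.csv)" data.csv, where data.csv is a placeholder file name.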

harkabeeparolus commented 9 months ago

Here's a proof of concept in Python, using only the standard library, without any third-party modules: https://gist.github.com/harkabeeparolus/aae10da864b20df15d406e453caf00ba

wofr06 commented 9 months ago

Here is a perl version that survives somewhat more pathological csv files: https://gist.github.com/wofr06/5d2193343d9e2672a6309516c403730c

wofr06 commented 9 months ago

Using csvlook to view CSV files has been reintroduced. A new, more fault-tolerant utility, csvtable, has been created to overcome some deficiencies of csvlook and will be tried first if it is installed and working. It is written in Perl, requires the Perl module Text::CSV, and is available at https://github.com/wofr06/csvtable