Explore: tpretty implementation in awk?

slothai / tabtools

🔧 SQL for csv file in UNIX command line with awk.

http://tabtools.readthedocs.io/

Explore: tpretty implementation in awk? #13

Open pavlov99 opened 4 years ago

pavlov99 commented 4 years ago

At the moment, pretty table printing is implemented in Python. The program reads the whole input twice: once to calculate the column widths and a second time to actually print.
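The two-pass idea can be sketched roughly as follows. This is a minimal illustration, not the tabtools code: the function name `pretty_rows`, the tab-delimited input, and the `" | "` separator are all assumptions. Since a pipe cannot be rewound, the sketch buffers every row in memory between the pass that measures widths and the pass that prints.

```python
def pretty_rows(lines, delimiter="\t", separator=" | "):
    """Return the input lines with every column padded to a common width.

    Hypothetical helper for illustration only; not part of tabtools.
    """
    # Buffer all rows: stdin cannot be re-read, so the second pass
    # works on the in-memory copy.
    rows = [line.rstrip("\n").split(delimiter) for line in lines]

    # Pass 1: find the widest cell in each column.
    widths = []
    for row in rows:
        for i, cell in enumerate(row):
            if i == len(widths):
                widths.append(0)
            widths[i] = max(widths[i], len(cell))

    # Pass 2: pad each cell to its column width and join the cells.
    return [
        separator.join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in rows
    ]
```

To use it on a pipe, one could iterate over `pretty_rows(sys.stdin)` and print each returned line. The memory cost grows with the input size, which is one of the trade-offs a streaming awk or bash version also has to face.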

Is it possible to implement such functionality in bash with reasonable limitations (e.g. header manipulation)?

See https://superuser.com/questions/557256/reading-the-same-stdin-with-two-commands-in-bash

command1 source | tee >(command2) >(command3)

https://stackoverflow.com/questions/10218103/os-x-linux-pipe-into-two-processes

echo "Leeroy Jenkins" | tee >(md5sum > out1) >(sha1sum > out2) > out3

pavlov99 commented 4 years ago

This was implemented in commit https://github.com/slothai/tabtools/commit/678e72894b1fb7b0120cbb07e05d65a369bce884.

pavlov99 commented 4 years ago

Performance comparison: Python implementation vs awk implementation.

The comparison covers the head of a file (the common pretty-printing case), a file of about 350 lines, and a file of more than 1k lines.

Python version

cat file | time -f '%es' .env/bin/python -c 'from tabtools.scripts import *; ttpretty()'

Awk version

cat file | time -f '%es' ./bin/ttpretty
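Single runs in the 0.01s range are noisy, so repeating each measurement may give steadier numbers. The sketch below is one hedged way to do that and is not part of the original benchmark: `time_command` is a hypothetical helper, and unlike the `time -f '%es'` invocations above it discards the command's output and reports the best of several runs.

```python
import subprocess
import time


def time_command(argv, input_path, repeats=5):
    """Return the best wall-clock time (seconds) of argv reading input_path on stdin.

    Hypothetical helper for illustration only.
    """
    best = float("inf")
    for _ in range(repeats):
        # Reopen the file each run so the command always starts at offset 0.
        with open(input_path, "rb") as stdin:
            start = time.perf_counter()
            subprocess.run(argv, stdin=stdin, stdout=subprocess.DEVNULL, check=True)
            best = min(best, time.perf_counter() - start)
    return best
```

For instance, `time_command(["./bin/ttpretty"], "file")` would time the awk version on the same input used above.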

File	Python Code	Awk Code
7 columns, 10 rows (head)	0.04s	0.01s
7 columns, 338 rows	0.05s	0.02s
14 columns, 10k rows	0.40s	0.72s

pavlov99 commented 4 years ago

After rewriting the script, the awk version outperforms the Python version on the large file as well:

File	Python Code	Awk Code
7 columns, 10 rows (head)	0.04s	0.01s
7 columns, 338 rows	0.05s	0.02s
14 columns, 10k rows	0.42s	0.38s