slothai / tabtools

🔧 SQL for csv file in UNIX command line with awk.
http://tabtools.readthedocs.io/

Explore: tpretty implementation in awk? #13

Open pavlov99 opened 4 years ago

pavlov99 commented 4 years ago

At the moment, pretty table printing is implemented in Python. The program reads the whole input twice: once to calculate column widths and a second time to actually print.
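
To make the two passes concrete, here is a minimal sketch, not the actual tabtools code. It assumes tab-separated input, left-aligned cells, and a " | " column separator; the function name `pretty_lines` is hypothetical. Because a pipe cannot be rewound, the sketch buffers all rows in memory and runs both passes over that buffer.

```python
import io


def pretty_lines(stream, sep="\t"):
    """Return aligned text lines for delimiter-separated rows read from stream."""
    # Buffer every row: a pipe cannot be rewound, so both passes run over
    # this in-memory list instead of re-reading the input.
    rows = [line.rstrip("\n").split(sep) for line in stream]

    # Pass 1: find the widest cell in each column.
    widths = []
    for row in rows:
        for i, cell in enumerate(row):
            if i == len(widths):
                widths.append(len(cell))
            else:
                widths[i] = max(widths[i], len(cell))

    # Pass 2: pad each cell to its column width and join with " | "
    # (the separator is an assumption, not tabtools' real output format).
    return [
        " | ".join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in rows
    ]


sample = io.StringIO("name\tqty\napple\t3\nkiwi\t12\n")
print("\n".join(pretty_lines(sample)))
```

The trade-off is memory: the whole input has to fit in the buffer, which is acceptable for the common case of pretty-printing the head of a file.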

Is it possible to implement such functionality in bash with reasonable limitations (e.g. header manipulation)?

See https://superuser.com/questions/557256/reading-the-same-stdin-with-two-commands-in-bash

command1 source | tee >(command2) >(command3)

https://stackoverflow.com/questions/10218103/os-x-linux-pipe-into-two-processes

echo "Leeroy Jenkins" | tee >(md5sum > out1) >(sha1sum > out2) > out3
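
For comparison, the same fan-out idea can be sketched in Python: read the stream once and hand every chunk to several consumers. This is only an illustration of the pattern behind the `tee` one-liner, with a hypothetical helper name `fan_out`; it is not part of tabtools.

```python
import hashlib
import io


def fan_out(stream, consumers, chunk_size=65536):
    """Read a binary stream once and pass every chunk to each consumer."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for consume in consumers:
            consume(chunk)


md5 = hashlib.md5()
sha1 = hashlib.sha1()
# `echo` appends a newline, so the sample input includes one as well.
fan_out(io.BytesIO(b"Leeroy Jenkins\n"), [md5.update, sha1.update])
print(md5.hexdigest(), sha1.hexdigest())
```

Note that this works for consumers that can run side by side on the same chunks. Pretty printing is different: the second pass needs the final column widths, so it cannot start until the first pass has seen the whole input.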

pavlov99 commented 4 years ago

This was implemented in commit https://github.com/slothai/tabtools/commit/678e72894b1fb7b0120cbb07e05d65a369bce884.

pavlov99 commented 4 years ago

Performance comparison: Python implementation vs awk implementation.

Compare three inputs: the head of a file (the common pretty-printing case), a file of about 350 lines, and a file of more than 1k lines.

Python version

cat file | time -f '%es' .env/bin/python -c 'from tabtools.scripts import *; ttpretty()'

Awk version

cat file | time -f '%es' ./bin/ttpretty

| File | Python code | Awk code |
|---|---|---|
| 7 columns, 10 rows (head) | 0.04s | 0.01s |
| 7 columns, 338 rows | 0.05s | 0.02s |
| 14 columns, 10k rows | 0.40s | 0.72s |
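
Timings around 0.01s are close to process start-up noise, so a single run can be misleading. A minimal sketch of a repeatable measurement follows, assuming a best-of-N wall-clock timing is good enough. The helper names `time_command` and `best_of` are hypothetical, and the command timed here is a stand-in that copies stdin to stdout, not `ttpretty` itself.

```python
import subprocess
import sys
import time


def time_command(argv, input_bytes):
    """Run argv with input_bytes on stdin and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, input=input_bytes, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def best_of(argv, input_bytes, repeats=5):
    """Return the fastest of several runs to reduce scheduling noise."""
    return min(time_command(argv, input_bytes) for _ in range(repeats))


# Stand-in command (an assumption): copies stdin to stdout.
# Replace it with the real formatter command to compare implementations.
echo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
data = ("a\tb\tc\n" * 1000).encode()
print(f"{best_of(echo_cmd, data, repeats=3):.2f}s")
```

Like `time`, this measurement includes interpreter start-up, which is part of what a user waits for on the command line.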

pavlov99 commented 4 years ago

After the script was rewritten, the awk version outperforms the Python version on all three inputs:

| File | Python code | Awk code |
|---|---|---|
| 7 columns, 10 rows (head) | 0.04s | 0.01s |
| 7 columns, 338 rows | 0.05s | 0.02s |
| 14 columns, 10k rows | 0.42s | 0.38s |