csvsql: specify --query multiple times

badbunnyyy commented 2 years ago

Thank you for csvkit. I find it very useful.

I propose an enhancement to csvsql.py. It would be nice to be able to specify --query multiple times.

Please see the attached diff for my modifications. They work well for me and don't break anything.

Being able to specify multiple queries would make it possible to, e.g., preprocess the data with SQL commands from a first file, then run an ad-hoc query from the command line, followed by SQL commands from a second file that output the data in a specific way.
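To illustrate the kind of chain described above, here is a minimal sketch in plain Python. It does not use csvkit itself: the function `run_queries`, the table name `data`, and the sample CSV are hypothetical, and the standard-library `sqlite3` module stands in for whatever database csvsql creates internally. It loads the CSV once, runs each query in order against the same in-memory database, and returns the rows of the last query that produced a result.

```python
import csv
import io
import sqlite3


def run_queries(csv_text, queries, table="data"):
    """Load CSV text into one in-memory SQLite table and run queries in order.

    Hypothetical helper for illustration only; returns the header and rows
    of the last query that produced a result set.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]

    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)

    result = []
    for query in queries:
        cursor = conn.execute(query)
        # Statements such as DELETE or UPDATE have no result set.
        if cursor.description is not None:
            names = tuple(col[0] for col in cursor.description)
            result = [names] + cursor.fetchall()
    conn.close()
    return result


sample = "name,amount\nalice,10\nbob,25\ncarol,40\n"
chain = [
    "DELETE FROM data WHERE CAST(amount AS INTEGER) < 20",  # preprocessing step
    "UPDATE data SET name = upper(name)",                    # ad-hoc step
    "SELECT name, amount FROM data ORDER BY name",           # output step
]
output = run_queries(sample, chain)
```

Because all three statements share one database, the data is parsed and loaded only once, which is the efficiency argument made further down against invoking csvsql several times in a pipeline.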

It would also be useful for dynamically building processing chains from a script and many other scenarios.

One alternative would be to have a wrapper script gather the SQL commands from various sources and feed them into csvsql, either as a very long command line or as a temporary file. Both would be cumbersome and prone to breaking.

Another alternative would be to invoke csvsql multiple times in a pipeline. This would also be less convenient on the command line and less efficient, as each step would have to output its data and the next step would have to read it and recreate the database.

Please note that --query can already be specified multiple times. However, only the last query is processed, and all previous queries are silently discarded. This is very unintuitive.
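The "last one wins" behaviour is what Python's `argparse` does by default when an option is repeated. The sketch below assumes that csvsql declares `--query` through `argparse` with the default `store` action; the helper `parse_queries` and the program name `csvsql-sketch` are hypothetical, and this is not necessarily how the attached diff solves it. It only contrasts the default action with `append`, which collects every occurrence in order.

```python
import argparse


def parse_queries(argv, action):
    """Parse a repeated --query option using the given argparse action."""
    parser = argparse.ArgumentParser(prog="csvsql-sketch")  # hypothetical name
    parser.add_argument("--query", dest="query", action=action)
    return parser.parse_args(argv).query


argv = [
    "--query", "DELETE FROM t WHERE x IS NULL",
    "--query", "SELECT * FROM t",
]

# Default action: each occurrence overwrites the previous one, without warning.
last_only = parse_queries(argv, "store")

# "append": every occurrence is kept, in command-line order.
all_queries = parse_queries(argv, "append")
```

With `store`, the first query is dropped silently, which matches the behaviour reported here; with `append`, the program receives a list it can execute one statement after another.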

csvsql.py.diff.txt

Thank you for considering my proposal.

jpmckinney commented 2 years ago

@badbunnyyy Thank you for the patch! Can you create a pull request on GitHub with the changes?

jpmckinney commented 2 years ago

Closed by #1166

wireservice / csvkit

csvsql: specify --query multiple times #1160