I propose an enhancement to csvsql.py. It would be nice to be able to specify
--query multiple times.
Please see the attached diff for my modifications. They work well for me and, as far as I can tell, don't break any existing behavior.
Being able to specify multiple queries would make it possible, e.g., to preprocess the data with SQL
commands from a first file, then run an ad-hoc query from the command line, followed by SQL commands
from a second file to output the data in a specific way.
It would also be useful for dynamically building processing chains from a script, among many other
scenarios.
One alternative would be to have a wrapper script gather the SQL commands from various sources
and feed them into csvsql either as a very long command line or as a temporary file. Both approaches
would be cumbersome and prone to breaking.
Another alternative would be to invoke csvsql multiple times in a pipeline. This would be less
convenient on the command line and also less efficient, as each step would need to output the data,
and the next step would then have to read it back in and recreate the database.
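To illustrate why a single invocation is more efficient than a pipeline, here is a minimal sketch using Python's standard sqlite3 module: all queries run in order against one in-memory database, so nothing has to be written out and re-imported between steps. The helper run_queries, the table t and the example queries are hypothetical and do not reproduce csvsql's actual code.

```python
import sqlite3


def run_queries(conn, queries):
    """Hypothetical helper: execute each query in order on one connection
    and return the rows produced by the last query."""
    cursor = conn.cursor()
    rows = []
    for query in queries:
        cursor.execute(query)
        rows = cursor.fetchall()  # empty list for statements like DELETE/UPDATE
    conn.commit()
    return rows


# Stand-in for the table csvsql would build from the input CSV.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", None), ("c", 3)])

queries = [
    "DELETE FROM t WHERE value IS NULL",        # preprocessing, e.g. from a first file
    "UPDATE t SET value = value * 10",          # ad-hoc step from the command line
    "SELECT name, value FROM t ORDER BY name",  # output step, e.g. from a second file
]
result = run_queries(conn, queries)
print(result)  # [('a', 10), ('c', 30)]
```

Each step sees the changes made by the previous ones because they share the same connection, which is exactly what a pipeline of separate csvsql calls cannot do without re-serializing the data.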
Please note that --query can already be specified multiple times. However, only the last query
is processed; all previous queries are silently discarded. This is very unintuitive.
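This "last one wins" behavior is what Python's argparse does with its default "store" action, so assuming csvsql declares --query that way, switching to action="append" would collect every occurrence in order. A minimal standalone sketch (not csvsql's real parser):

```python
import argparse

# Default "store" action: each occurrence overwrites the previous value,
# so only the last --query survives.
store_parser = argparse.ArgumentParser()
store_parser.add_argument("--query")

# "append" action: every occurrence is collected into a list, in order.
append_parser = argparse.ArgumentParser()
append_parser.add_argument("--query", action="append")

argv = ["--query", "DELETE FROM t WHERE x IS NULL", "--query", "SELECT * FROM t"]

stored = store_parser.parse_args(argv).query
appended = append_parser.parse_args(argv).query

print(stored)    # SELECT * FROM t
print(appended)  # ['DELETE FROM t WHERE x IS NULL', 'SELECT * FROM t']
```

With "append", the value is None when --query is not given at all, so code consuming it would need to handle that case.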
Thank you for csvkit. I find it very useful.
csvsql.py.diff.txt
Thank you for considering my proposal.