multiprocessio / dsq

Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
Other
3.71k stars 152 forks source link

sqlite3 one-liner(ish) for benchmarking #70

Closed simonw closed 2 years ago

simonw commented 2 years ago

Follows #69 - the README at https://github.com/multiprocessio/dsq/tree/b7af2679038f8b09802c88f2f088bcc78fa8872c#benchmark says sqlite3 can't do one-liners, but it sort-of-can - this works for running the taxi query:

sqlite3 :memory: <<EOS
.mode csv
.import taxi.csv taxi
SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count
EOS
simonw commented 2 years ago

On my MacBook Pro (2.6Ghz 32MB 2019) it takes 12.4s using time, I've not tried with a proper benchmark.

eatonphil commented 2 years ago

Yeah I just am not considering that a reasonable one-liner. Open to other wording but the popularity of all these tools kind of demonstrates how that is not what everyone is settling for.

simonw commented 2 years ago

This works too:

sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'

eatonphil commented 2 years ago

That's a good one! And wow it's 4s on my machine. Adding to the list.

eatonphil commented 2 years ago

Closed in https://github.com/multiprocessio/dsq/commit/8f68cde53017981de15f5e130e3a5575d5f135b4

simonw commented 2 years ago

Turned this into a TIL: https://til.simonwillison.net/sqlite/one-line-csv-operations - I just found out you can output in different formats too, e.g.:

sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' -cmd '.mode markdown' \
    'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'

Which outputs markdown you can embed in an issue like this:

passenger_count COUNT(*) AVG(total_amount)
128020 32.2371511482553
0 42228 17.0214016766151
1 1533197 17.6418833067999
2 286461 18.0975870711456
3 72852 17.9153958710923
4 25510 18.452774990196
5 50291 17.2709248175672
6 32623 17.6002964166367
7 2 87.17
8 2 95.705
9 1 113.6
jakiestfu commented 2 years ago

Really slick, @simonw

ryenus commented 2 years ago

Meanwhile I have this cq bash function for interactive use:


# Query a csv file with sqlite
# Usage: cq <csv_file> [<table>]
cq() {
  local csv_file="${1:?Usage: cq <csv_file> [<table>]}" table="${2:-T}"
  sqlite3 -interactive -cmd '.headers on' -cmd '.prompt "\nsqlite> "' \
    -cmd '.mode csv' -cmd ".import '$csv_file' '$table'" -cmd ".mode column"
}