jwills / buenavista

A Postgres Proxy Server in Python
Apache License 2.0
232 stars 24 forks source link

add docs showing usage of container and psql shell using duckdb's sqlite and parquet scanners #10

Closed mskyttner closed 1 year ago

mskyttner commented 1 year ago

Adds usage example with containerized psql shell connecting to buenavista (container from ghcr.io registry) and shows making use of duckdb's sqlite and parquet scanners. Two small example data files are therefore added, 'iris.parquet' and 'sakila.db'.

jwills commented 1 year ago

@mskyttner hey, thanks for sending this my way! The iris.parquet file I don't mind, but I feel a little guilty about the 5MB DuckDB file b/c I'm old-school like that (and b/c the duckdb file format keeps changing, so we're going to have to keep updating that file so it'll work)-- can we shrink that down at all? Or find some other strategy for pulling it in here?

mskyttner commented 1 year ago

That file sakiladb is an sqlite3 database, less prone to changing fileformat currently since sqlite3 has a more stable file format at least until duckdb finalizes its own. The sakiladb file is available in the duckdb sqlite extension repo and elsewhere.

Could be pulled with a script instead of included as a binary blob in this repo of course, would you prefer that?

Another sqlite example database is "chinook.db", which seems smaller at 864K, yet providing a fairly rich schema beyond a single table.

In addition to reading data from sqlite and parquet, it would be nice to give a quick, small example of querying remote CSV data from http(s)... but in duckdb the autodetection of column types is still slightly quirky ... and I find it too tedious to specify the datatypes manually for each column ;).

I'm adding a download script for example data and a switch from sakiladb to the smaller chinook.db.

jwills commented 1 year ago

Thank you @mskyttner !