wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.
https://csvkit.readthedocs.io
MIT License
5.98k stars 608 forks source link

Test and document support for DuckDB #1226

Closed zachsa closed 8 months ago

zachsa commented 9 months ago

I like using csvsql to run adhoc SQL queries on a CSV - hearing about DuckDB and looking into it, it seems that would be beneficial in my case. Is it possible to specify DuckDB as the SQL engine, and if not, would it align with this project to support DuckDB as an optional engine?

jpmckinney commented 9 months ago

It should work out of the box, but occasionally new engines have some bugs.

To try, run pip install duckdb-engine and set your connection string to start with duckdb://.

zachsa commented 8 months ago

Thank you - testing this with a file test.csv:

name,age
a,98
b,23
c,78

This works when using the default connection (sqlite):

csvsql  --query "select * from test;" test.csv

However, this does not work using a duckdb connetion string:

csvsql --db duckdb:///test --query "select * from test;" test.csv

I get an error

ProgrammingError: (duckdb.duckdb.ParserException) Parser Error: syntax error at or near "SAVEPOINT"
LINE 1: SAVEPOINT sa_savepoint_1
        ^
[SQL: SAVEPOINT sa_savepoint_1]
(Background on this error at: https://sqlalche.me/e/20/f405)

@jpmckinney please let me know if this is an error or if I'm using the CLI incorrectly.

(this also errors: csvsql --db duckdb:///test --query "SELECT * FROM test" test.csv)

jpmckinney commented 8 months ago

I released a new version of agate-sql that should fix that error. pip install --upgrade agate-sql

zachsa commented 8 months ago

Sorry for continuing on this...

cat data-20231108-20231208.csv | csvsql --db duckdb:///:memory: --query "SELECT * from stdin"

That only selects headers - I thought I would get the content as well. Should that be the case?

jpmckinney commented 8 months ago

It's normal. You probably want --insert.