mithrandie / csvq

SQL-like query language for csv
https://mithrandie.github.io/csvq
MIT License

Generate SQL output #46

Closed derekmahar closed 3 years ago

derekmahar commented 3 years ago

Please add SQL as an output format to serve as input to SQLite and possibly other relational database engines like PostgreSQL and MySQL. The output for a given CSV file would begin with a CREATE TABLE statement followed by one or more INSERT INTO statements to recreate the contents of the CSV file as a database table in SQLite.

mithrandie commented 3 years ago

Those DBMSs already have features to import CSV. Why do you need it?

derekmahar commented 3 years ago

Yes, you are correct. However, I am considering the case where you might generate CSV content in a pipeline that could terminate with sqlite3. Unfortunately, we cannot implement such a pipeline because sqlite3 accepts only SQL statements on standard input and csvq does not generate SQL output. As a consequence, we must first save the CSV content to a temporary file or FIFO and run a separate SQLite script to import the CSV file. It would be more convenient to output the CSV content as SQL INSERT INTO statements and pipe these directly into sqlite3, bypassing the creation of a temporary file or FIFO and the creation and execution of the sqlite3 import script.

mithrandie commented 3 years ago

I don't know why you don't want to use temporary files or fifo, but anyway, if you're using a Unix or Unix-like OS, can't you specify /dev/stdin as the import file for sqlite?

derekmahar commented 3 years ago

I think that won't work because sqlite3 accepts only SQL scripts (SQL statements and its own shell commands) on standard input. How then could sqlite3 accept both the SQL script and CSV content on standard input?

mithrandie commented 3 years ago

At least in my environment, macOS 11 with sqlite3 version 3.32.3, I can run it like this.

[Screenshot: terminal session from 2021-03-13 demonstrating the command]
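The screenshot itself isn't preserved here, but based on the surrounding discussion the command was presumably along these lines (a sketch, not a transcript; the table name test and the sample data are assumptions):

```shell
# Pass the .import command as an argument, so sqlite3's standard input
# remains free to carry the CSV data, read back via /dev/stdin.
db=$(mktemp)                        # throwaway database file
printf 'col1,col2\na,1\nb,2\n' |
  sqlite3 -csv "$db" ".import /dev/stdin test"
sqlite3 -csv -header "$db" "SELECT * FROM test"
rm "$db"
```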

derekmahar commented 3 years ago

Yes, you are right. I just did the same thing. I hadn't realized that, when provided a script as an argument, sqlite3 would not process its standard input as a script and could instead read other input data from it. It's also interesting that in the command-line script, the .quit command need not follow .import, which is necessary when you supply the script on standard input.

derekmahar commented 3 years ago

Do psql (PostgreSQL) and mysql (MySQL) also have this behaviour?

derekmahar commented 3 years ago

How would one implement this pipeline in Windows?

derekmahar commented 3 years ago

I don't know why you don't want to use temporary files or fifo

Using temporary files is more verbose and cumbersome than simply chaining together all commands in a pipeline.

mithrandie commented 3 years ago

I can't test mysql or psql right away, but you can probably execute mysql or psql commands directly as you can with sqlite. And you probably can't do the same thing on Windows.

In any case, using temporary files is not a complicated task, and if you need to do it over and over again, you can just write a simple script.

derekmahar commented 3 years ago

And you probably can't do the same thing on Windows.

Yes, Windows does not seem to have a special file name for standard input equivalent to /dev/stdin or - in Unix.

In any case, using temporary files is not a complicated task, and if you need to do it over and over again, you can just write a simple script.

Yes. For example, I just wrote a bash script called csv_to_sqlite:

#!/bin/bash
database_file_name=$1
table_name=$2
temp_input_file_name=<(cat)
sqlite3 "$database_file_name" ".import --csv $temp_input_file_name $table_name"

Note that <(cat) is process substitution: bash runs cat, which reads from the standard input, and substitutes a file name connected to cat's output (typically /dev/fd/N, or a temporary named pipe on systems without /dev/fd).
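As a quick illustration of process substitution (not from the thread): each `<(cmd)` expands to a readable path wired to `cmd`'s output, which is why a command expecting a file name can consume it.

```shell
#!/bin/bash
# Each <(...) expands to a path such as /dev/fd/63 that reads the
# enclosed command's output, so diff can compare two pipelines
# as if they were files.
diff <(printf 'a\nb\n') <(printf 'a\nc\n')
```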

Test:

$ printf "col1,col2\na,1\nb,2\n" | ./csv_to_sqlite test.db test
$ sqlite3 -csv -header test.db "SELECT * FROM test"
col1,col2
a,1
b,2

mithrandie commented 3 years ago

If we implemented features for specific external applications, we would need to keep up with their updates, which would require a lot of work. Since there are widely used formats such as CSV, it is better to think about how to pass data along in each individual situation.

derekmahar commented 3 years ago

Yes, I agree, though in the case of generating SQL output to import a CSV, only the CREATE TABLE statement might differ between database implementations. The INSERT INTO statements would likely be standard.

derekmahar commented 3 years ago

You could just generate the INSERT INTO statements.
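Something along those lines could be sketched in a few lines of awk (hypothetical, not a proposed implementation; it assumes a header row, values with no embedded commas or quotes, and a placeholder table name mytable):

```shell
# Naive CSV-to-INSERT converter: the header row supplies column names,
# each subsequent row becomes one single-quoted INSERT statement.
printf 'col1,col2\na,1\nb,2\n' | awk -F, '
  NR == 1 { header = $0; next }
  {
    printf "INSERT INTO mytable (%s) VALUES (", header
    for (i = 1; i <= NF; i++)
      printf "%s'\''%s'\''", (i > 1 ? ", " : ""), $i
    print ");"
  }
'
```

Real CSV quoting and SQL escaping are exactly the hard parts the maintainer points out below, which is why even this sketch only covers the trivial case.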

mithrandie commented 3 years ago

The quoting of table and field names may use backquotes or double quotes. Depending on the amount of data, bulk inserts may be required for practical use, but some databases do not support them. It is not easy to cover everything that varies from database to database and from user environment to user environment.

In the first place, if CREATE TABLE statements are not exported, you will still end up having to work on the database separately.

derekmahar commented 3 years ago

Okay, I concede that generating SQL statements is complicated by differing database SQL conventions and that it makes sense to instead use an import tool specific to the database. The only case I can think of where a generic tool that generates standard SQL statements would be useful would be where a database didn't have a tool to import a CSV file. SQLite, PostgreSQL, and MySQL can all import CSV files into tables, though.

derekmahar commented 3 years ago

Note that <(cat) is process substitution: bash runs cat, which reads from the standard input, and substitutes a file name connected to cat's output (typically /dev/fd/N, or a temporary named pipe on systems without /dev/fd).

Unfortunately, while <(cat) works in bash on FreeBSD, it does not work in Linux. I'm still seeking a solution that doesn't require directly reading /dev/stdin.

derekmahar commented 3 years ago

The following version works in Linux and FreeBSD:

#!/bin/bash

function csv_to_sqlite() {
  local database_file_name="$1"
  local table_name="$2"
  local temp_input_fifo=$(mktemp -u)
  mkfifo "$temp_input_fifo"
  sqlite3 -csv "$database_file_name" ".import $temp_input_fifo $table_name" &
  cat > "$temp_input_fifo"
  wait  # let the background import finish before removing the FIFO
  rm "$temp_input_fifo"
}

database_file_name=$1
table_name=$2

csv_to_sqlite "$database_file_name" "$table_name"

derekmahar commented 3 years ago

I found a simpler solution that still uses sqlite3's .import, but that doesn't read /dev/stdin or a temporary named pipe. Instead, it uses .import with its pipe syntax to invoke cat -, which reads directly from standard input.

#!/bin/bash

function csv_to_sqlite() {
  local database_file_name="$1"
  local table_name="$2"
  sqlite3 -csv "$database_file_name" ".import '|cat -' $table_name"
}

database_file_name=$1
table_name=$2

csv_to_sqlite "$database_file_name" "$table_name"