racket / db

Other
20 stars 17 forks source link

Feature request: add multiple statements to a database #15

Open nxg opened 4 years ago

nxg commented 4 years ago

The query-exec function will execute only a single SQL statement. There seems no way that I can see of adding multiple statements at once. Specifically, I would hope that something like the example below would work.

The use-cases for this would be reloading a dump of data statements, or for loading DDL statements in the context of creating a fresh DB for unit tests.

Having query-exec accept multiple statements would be nice. An alternative API is the prepare function in Chicken, which has the signature

database sql-string -> sql-statement sql-string

which consumes one SQL statement from the given string, and returns the prepared statement and the rest of the string (I think it would be nicer if the second argument was a port). If the underlying API(s) is firmly constrained to accept only a single statement at a time, then a very simple SQL parser, which need only do just enough to break out the statements within a port, would be a suitable alternative.

The fact that the error below is multiple statements given indicates that the library can detect the presence of multiple statements. It would be attractive if it could do more with this than throw an error.

The only workaround I can see is to use some other means (sed works) to break up a .sql file into a list of strings. But that's not pretty.

$ cat test.sql
-- leading comment
create table testtable(  -- comment
  number integer
  );

insert into testtable values(1);
insert into testtable values(2);
$ cat test.rkt
#lang racket/base

(require db
         racket/port)

(let ((db (sqlite3-connect #:database "test.sqlite"))
      (stmts (call-with-input-file "test.sql" port->string)))
  (query-exec db stmts)
  (for/list ((n (in-query db "select * from testtable")))
    n))
$ touch test.sqlite
$ racket test.rkt
query-exec: multiple statements given
  value: "-- leading comment\ncreate table testtable(  -- comment\n  number integer\n  );\n\ninsert into testtable values(1);\ninsert into testtable values(2);\n"
  context...:
   /Data/LocalApplications/racket/7.4/collects/db/private/sqlite3/connection.rkt:225:4: prepare1* method in connection%
   /Data/LocalApplications/racket/7.4/collects/db/private/sqlite3/connection.rkt:154:4: check-statement method in connection%
   /Data/LocalApplications/racket/7.4/collects/db/private/sqlite3/connection.rkt:81:4: query1 method in connection%
   /Data/LocalApplications/racket/7.4/collects/db/private/generic/../../../racket/private/more-scheme.rkt:261:28
   /Data/LocalApplications/racket/7.4/collects/db/private/generic/functions.rkt:148:0: query-exec
   "/checkouts/itm/luggage/sql/test.rkt": [running body]
   temp37_0
   for-loop
   run-module-instance!125
   perform-require!78
$ 
nxg commented 4 years ago

Looking at .../sqlite3/ffi.rkt, I see that sqlite3_prepare_v2 deliberately returns tail? rather than tail, so I'm not suggesting you're not aware of the issue. Also, it would be relatively straightforward to create, in .../sqlite3/connection.rkt a prepare1*/tail, say, with a string -> statement string signature.

However, looking at Sect 53.2.2.1 of the Postgres manual, I see that it deals with this in a different way, by allowing queries to contain multiple statements. And the MySQL analogue appears to require only a single statement, with no way of breaking apart an unparsed tail. So I do appreciate that the issue here is probably one of finding a model that works with all of the various back-ends.

A crude pre-parse of an SQL multi-statement string is not particularly pretty, but would seem to deal with all of the cases.

rmculpepper commented 4 years ago

As you said, the problem is in making it work on multiple systems.

To add to the complexity you've already mentioned: the PostrgreSQL binding doesn't use the Simple query protocol; it only uses the Extended protocol, which doesn't support multiple statements. But the Extended protocol has a nice binary format for values, whereas the Simple protocol uses a textual format that would require a different (and complicated) set of parsers. So supporting both would be impractically complicated.

Parsing the SQL (for 3+ dialects) in Racket would be a large amount of work to get right, and I don't want to add something only-mostly-kind-of-right.

For the testing use case, since you have control over the SQL, you can pick a simple approximation to parsing (eg, just split on semicolons, if you never have semicolons within string constants, for example) that works locally. For loading bulk data, this library probably isn't a good fit, and you should probably use the database system's specialized command. (For example, PostgreSQL has COPY IN which is better for bulk data loading than many INSERTs, but this library doesn't support COPY IN.)

It would be possible to add a public method only for sqlite3 connections that prepares the first statement from a string and returns the tail (or maybe returns a stream of statements, to avoid repeated allocation).

nxg commented 4 years ago

Righto – it seems clear that there's no way of getting the underlying SQL system to do the multi-statement parsing in a portable way.

I take your point about bulk data – I think the unit-test use-case is the more realistic one (though that could include a few INSERT statements).

Thinking instead about a pre-parse, I believe this wouldn't have to be complicated, and need only be the ‘simple approximation to parsing’ that you mention. Hence, I've knocked together, and attached, a module which does such a pre-parse of a port, detecting only comments, semicolons, strings and ‘other SQL text’, and evaluating to a list of single-statement strings. Because it knows almost nothing about SQL syntax, it should be supremely portable.

nxg commented 4 years ago

(...and of course I forgot a test case: extract-sql.rkt)

srcreigh commented 2 years ago

thanks @nxg , I am using your module with some helper functions to run a directory of SQL migration files on a 'memory SQLite db. I'm planning to use it to test some db code

Here's some of my helper functions

(require db sfri/26 "extract-sql.rkt")

(define (query-exec* conn . stmts)
  (for ([stmt stmts])
    (query-exec conn stmt)))

(define (query-exec-file conn path)
  (apply query-exec* conn
         (call-with-input-file path sql0-port->statements)))

(define (query-exec-all-files conn dir-path)
  (define filepaths
    (sort (map (cut build-path dir-path <>)
               (directory-list dir-path))
          path<?))
  (for ([filepath filepaths])
    (query-exec-file conn filepath)))
winny- commented 1 year ago

Is there an easy way to load multiple statements from a .sql file into a database? I'm curious about bulk loading from Racket without parsing all the SQL. Just stuff it down the pipe? I think PostgresSQL supports multiple statements stuffed directly over the wire, if I understand the docs correctly.

srcreigh commented 1 year ago

@winny- maybe try writing a shell one liner to execute your sql, then call that from Racket using subprocess

https://docs.racket-lang.org/reference/subprocess.html

winny- commented 1 year ago

@winny- maybe try writing a shell one liner to execute your sql, then call that from Racket using subprocess

Definitely could do the trick. For now here's my approach:

(define/contract (sql->statements sql)
  (-> string? (listof string?))
  (regexp-split #px"\\s*;\\s*" sql))

I spotted this in the postgres docs. Not sure if this means it's technically possible to eschew parsing SQL while supporting multiple statements:

55.2.2.1. Multiple Statements in a Simple Query

When a simple Query message contains more than one SQL statement (separated by semicolons), those statements are executed as a single transaction, unless explicit transaction control commands are included to force a different behavior. For example, if the message contains

https://www.postgresql.org/docs/current/protocol-flow.html#id-1.10.6.7.4