Closed bendemott closed 1 year ago
Allow users to pass a method/single-method class which can do whatever pre-processing they want on queries (via the Connection constructor). The benefit is then people are free to do what they want and they can also opt into this behaviour instead of this being default behaviour.
Hey @hashhar
The semicolon character is a statement terminator. It is a part of the ANSI SQL-92 standard
I don't think adding a custom hook makes sense, this is part of the sql standard, and any database that claims to implement that standard should support statements terminated with a semicolon.
The custom hook solution doesn't work (easily) for sqlalchemy where you initialize the connection via URI. You're also introducing additional custom configuration for something that should be standard behavior.
Imagine having 1,000 users of jupyter notebooks using Trino and having to train each one to write a custom lambda that properly removes trailing semicolons.
If you don't feel this belongs in the driver I can understand that - Should this be implemented at the engine level - Happy to open a PR there?
From the SQL 92 standard:
X3H2-92-154/DBL CBR-002
20.1 <direct SQL statement>
<direct SQL statement> ::=
<directly executable statement> <semicolon>
Thanks for linking to spec - that provides strong case to support it on the server. The other benefit being that we won't silently start swallowing any additional queries present after the semicolon.
I don't believe this should live in the driver. But if it had to I'd prefer something more generic than just for this one use-case to avoid growing multiple such special handlings over the driver.
cc: @martint
The other benefit being that we won't silently start swallowing any additional queries present after the semicolon.
This implementation is not at risk of that, it will only trim the semicolon when its at the end of the line, and followed by whitespace. The whitespace is only stripped at the RIGHT side of the string.
There is no "clever" logic in the code attempting to perform string parsing, it simply is... Does the statement end with a semicolon and optional whitespace - ok then, remove the semicolon at the very end of the line.
The current state of things is the engine throws up errors when a semicolon is added to the end of a statement. In the new state, even if someone added multiple statements separated by semicolons, the engine would still throw up an error as multiple statements aren't allowed.
I was very careful to ensure it's essentially impossible to "clobber" the query in any way, in addition I even preserve the whitespace at the end of the line, so a user wouldn't be surprised by trailing whitespace being removed from their query.
I'll wait on on comment from Martin.
We have extended our base lexer to support the semicolon terminator. This lexer is used in trino-cli
.
The options to move forward seem to be:
Implement functionality to inject a query preprocessor similar to trino-cli. Note that the pre-processor approach can work with both DB-API and SQLAlchemy. For DB-API we have control of the connection constructor and for SQLAlchemy create_engine
accepts a kwarg creator
which is callable which is used to get the DB-API connection.
Change the server to allow queries terminated with ;
. This seems to be cleanest approach and will benefit all clients instead of the chosen few. However it is also more work and might require more consideration than doing the change in a client.
cc: @martint. PTAL and share your opinion about doing this on the server for all clients to benefit.
It makes sense to support semicolons as statement terminator on the server side -- per the SQL spec, it's supposed to be that way. Note that this doesn't mean that the server would support multiple statements in a single invocation.
However it is also more work and might require more consideration than doing the change in a client.
@hashhar, what do you envision as "more work"? It's likely a single-line change to the SQL grammar in Trino, plus any parser tests we may want to add. We might need to look at the Trino CLI, though, which currently does special handling of semicolons on the client side.
Needs more tests
- two queries separated by semicolons
- a prepared statement for a query including a semicolon
Just wanted to highlight that adding support for semicolons on the server side doesn't mean that it will be possible to run multiple statements in one invocation -- that's a completely separate topic with far-reaching implications.
I'll be opening a PR in Trino to add support for terminating semicolons in the dialect, closing this PR.
What is the status? pandas to_sql fails on both normal connection and sqlalchemy (with different errors)
Naive implementation of removal of optional trailing semicolon. This allows sql statements passed to
cursor.execute()
to contain an optional trailing semicolon.SELECT 1; -- select statement
will be missed.Description
Pandas and several other tools that generate SQL statements and use this driver in a generic way will append semicolon's to the end of their statements. Optional SQL statements are allowed at the end of many DBAPI drivers for other databases.
Example:
In Pandas the following statement will fail with Trino because it adds a trailing semicolon:
To work-around this, the following code is required:
Non-technical explanation
Allows sql statements to contain an optional trailing semicolon.
Release notes
( ) This is not user-visible or docs only and no release notes are required. (x) Release notes are required, please propose a release note for me. ( ) Release notes are required, with the following suggested text: