Closed cpcloud closed 8 months ago
Wow, that was so fast. Thank you so much for prioritziing this!
I think I am slowly getting it to work, I've had to modify the code a little bit to get it to accept the parameters I needed to. These are some very rough changes I made just to brute-force get the parameters through, I'm sure there's a cleaner way to do this.
https://github.com/inigohidalgo/ibis/commit/1805dda7624cb3860331ce7c61570be4d8b2761d
conn_ib = ibis.mssql.connect(
host=None,
port=None,
user=None,
query={"odbc_connect": f"Driver=ODBC+Driver+17+for+SQL+Server;SERVER={server};DATABASE={database}"},
**connect_args
)
where connect_args
are those I defined here #7306
It seems that the odbc_connect
kwarg I first deleted in the script I posted in the discussion is actually making all the strings passthrough to pyodbc directly sqlalchemy docs
EDIT: just to confirm, using the code block I posted above using the version of the code in my branch, I already successfully queried a simple table! 🙌
@inigohidalgo Nice!
I incorporated most of your changes, they look reasonable to me, thank you for that!
I'm not a huge fan of passing
f"SERVER={server};DATABASE={database}"
as a string in the query field when ibis' API already offers those arguments as part of the connect
functions. Those arguments get passed to self._build_alchemy_url
but the output of that call returns the wrong format for pyodbc connections.
@inigohidalgo What does your ideal call to ibis.connect
(or ibis.mssql.connect
) look like? Maybe there's a way we can improve it.
I'm going to have a look around because from what I see I should be able to get the default _build_alchemy_url
to work with the host and database parameters: https://stackoverflow.com/a/46101255/9807171
EDIT: although my host looks like "XXX-XXX-XXX.database.windows.net" instead of just a single "host", not sure if that makes a difference.
Yeah I'm not able to connect without passing "SERVER={server};DATABASE={database}"
inside the raw string.
In my ideal call, the host and database parameters would be added to that query={"odbc_connect": f"Driver=ODBC+Driver+17+for+SQL+Server;SERVER={server};DATABASE={database}"}
but that seems like a very specific implementation just for my usecase, and it's adding maintenance burden to ibis instead of being handled by sqlalchemy. I'm trying to see if there's a URL builder in SQLAlchemy which handles this but I'm not seeing it anywhere.
I'm not sure we can special case odbc_connect
usage since it seems there are many ways to connect with odbc. In CI for example, we're using a combination of odbc driver parameters and user/host/password etc.
Absolutely, it wouldn't make any sense to have such a specific implementation, especially when it seems it SHOULD work from what I'm reading online.
According to this SQLAlchemy should be converting the connection string to the "SERVER={server};DATABASE={database}" but it isn't working.
Also apparently SQLAlchemy adds a parameter Trusted_Connection=Yes which I'm supposed to remove, but I'm having issues understanding where I'm supposed to remove it
@event.listens_for(engine, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams):
# remove the "Trusted_Connection" parameter that SQLAlchemy adds
cargs[0] = cargs[0].replace(";Trusted_Connection=Yes", "")
# create token credential
raw_token = azure_credentials.get_token(TOKEN_URL).token.encode("utf-16-le")
token_struct = struct.pack(f"<I{len(raw_token)}s", len(raw_token), raw_token)
# apply it to keyword arguments
cparams["attrs_before"] = {SQL_COPT_SS_ACCESS_TOKEN: token_struct}
At worst since all our connections are going to be done through my kedro wrapper, I can handle this on my side, but that would make that wrapper implementation less of a general Ibis wrapper and more of an Ibis+Azure one.
I think to attach that event listener you'd do this:
con = ibis.mssql.connect(...)
@event.listens_for(con.con, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams):
...
That did the trick!
import struct
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
from azure import identity
import ibis
SQL_COPT_SS_ACCESS_TOKEN = 1256 # Connection option for access tokens, as defined in msodbcsql.h
TOKEN_URL = "https://database.windows.net/" # The token URL for any Azure SQL database
connection_string = "mssql://XXX.database.windows.net/XXX?driver=ODBC+Driver+17+for+SQL+Server"
conn = ibis.connect(connection_string)
azure_credentials = identity.DefaultAzureCredential()
@event.listens_for(conn.con, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams):
# remove the "Trusted_Connection" parameter that SQLAlchemy adds
cargs[0] = cargs[0].replace(";Trusted_Connection=Yes", "")
# create token credential
raw_token = azure_credentials.get_token(TOKEN_URL).token.encode("utf-16-le")
token_struct = struct.pack(f"<I{len(raw_token)}s", len(raw_token), raw_token)
# apply it to keyword arguments
cparams["attrs_before"] = {SQL_COPT_SS_ACCESS_TOKEN: token_struct}
conn.table(name="INPUT_VARIABLES_CONN_PT", schema="IGD").to_pandas()
One issue I have run into is that I'm having to monkeypatch query["driver"] = "ODBC Driver 17 for SQL Server"
inside the mssql.Backend.do_connect
because it is being lost from the connection_string
at some point.
Just a heads up I'm gonna be quite busy for the next 24-48h so I'll be a little bit less responsive.
One issue I have run into is that I'm having to monkeypatch query["driver"] = "ODBC Driver 17 for SQL Server"
I think you should be able to write
conn = ibis.connect(
"mssql://XXX.database.windows.net/XXX",
query=dict(driver="ODBC+Driver+17+for+SQL+Server"),
)
Any **kwargs
you pass to connect
should be forwarded on to do_connect
.
Can you give that a try?
Given the complexity of using tokens, I think it makes sense to add a token_provider
parameter to the MSSQL backend's do_connect
so that you can define your token provider callable without having to manage sqlalchemy events or depend on creating an ibis backend first.
That would make the connection code look like this:
import struct
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
from azure import identity
import ibis
SQL_COPT_SS_ACCESS_TOKEN = 1256 # Connection option for access tokens, as defined in msodbcsql.h
TOKEN_URL = "https://database.windows.net/" # The token URL for any Azure SQL database
azure_credentials = identity.DefaultAzureCredential()
def provide_token(dialect, conn_rec, cargs, cparams):
# remove the "Trusted_Connection" parameter that SQLAlchemy adds
cargs[0] = cargs[0].replace(";Trusted_Connection=Yes", "")
# create token credential
raw_token = azure_credentials.get_token(TOKEN_URL).token.encode("utf-16-le")
token_struct = struct.pack(f"<I{len(raw_token)}s", len(raw_token), raw_token)
# apply it to keyword arguments
cparams["attrs_before"] = {SQL_COPT_SS_ACCESS_TOKEN: token_struct}
connection_string = "mssql://XXX.database.windows.net/XXX?driver=ODBC+Driver+17+for+SQL+Server"
conn = ibis.connect(connection_string, token_provider=provide_token)
conn.table(name="INPUT_VARIABLES_CONN_PT", schema="IGD").to_pandas()
Indeed that worked!
What is the more commonly-recommended API, ibis.connect(resource)
or ibis.backend.connect(host...)
? It seems like as things stand I should be ready to start working using the ibis.connect
API.
Start with ibis.connect
. We try to make sure that it has parity with ibis.${backend}.connect
, and it doesn't require you to remember specific argument names (user
vs username
, etc).
If there's something you can't do with ibis.connect
that can do with ibis.${backend}.connect
then it's probably a bug 😄
Regarding token_provider=provide_token
, would that be an implementation on Ibis' side?
Regarding
token_provider=provide_token
, would that be an implementation on Ibis' side?
Yep.
It's basically this code at the end of ibis.backends.mssql.Backend.do_connect
:
if token_provider is not None:
sa.event.listens_for(engine, "do_connect")(token_provider)
The main benefit is that you can put your token providing code wherever you want. It doesn't need to be right next to the connection.
Cool! Is that something you'd be okay with adding into Ibis? I assume there'll be more ppl who have the same issue but when I was searching the issues I couldn't find anyone with the same question as me.
Cool! Is that something you'd be okay with adding into Ibis?
Personally, yes, but I'll put up a separate PR for it where we can discuss whether folks think it's a good idea or not.
My hunch is that once people are able to use Azure services with ibis then it'll be an issue :)
One thing to note is that this is likely not going to be released until 8.0 because it's a pretty big change for MS SQL users (unclear how many of those we have).
It seems like many of the would-be MS SQL users aren't currently using ibis because they require odbc support, so for them this is going from zero to one, but for any existing users they'll have to migrate over to PyODBC.
this is likely not going to be released until 8.0 because it's a pretty big change for MS SQL users
Not a problem for us, it'll take a while for us to migrate our pipelines to Ibis so in the meantime I'm happy to just build using either :8.x.x
or your mssql-pyodbc
branch if it's yet to be merged.
Sweet, yeah, assuming this gets merged, it'll be done in the 8.x.x
branch which we'll rebase on top of master
every so often. That'll run through CI like everything else.
One final kink I see is the difference between Ibis' driver
kwarg which defaults to mssql+pyodbc
and then the driver I pass in the query parameter. Do you think it makes sense to somehow merge those in some way, or at least distinguish between these? Because I think pyodbc usually requires you to specify the driver version, the naming might be a little bit confusing.
Perhaps we can remove the query
parameter?
And provide a pyodbc_driver
parameter which gets added to the query inside do_connect
?
Is the current driver
parameter not enough for that? For example, why wouldn't we just stuff driver
into a query parameter dict
inside do_connect
?
I think the driver
you're using atm informs the connection string mssql+pyodbc
which tells SQLAlchemy which dialect to use, whereas the pyodbc driver gets added to the end of the connection string and goes through to pyodbc.
I think the driver you're using atm informs the connection string mssql+pyodbc which tells SQLAlchemy which dialect to use, whereas the pyodbc driver gets added to the end of the connection string and goes through to pyodbc.
Ah, no, but that's how it used to be before this PR!
In this PR, query
and driver
are a bit redundant: driver
is being stuffed into query
.
A point of confusion is that there's the driver
argument to URL.create
and then the driver
query parameter which has nothing to do with the first 😂
Hm, I might need to tinker with this a bit to work out how url-based and non-url-based connection works.
But at the moment, using ibis.connect
, driver is everything before the ://
whereas I am currently providing the ODBC driver using the query
kwarg
https://github.com/ibis-project/ibis/pull/7317#issuecomment-1755038358
Would you keep the query argument in ibis.connect
but then ibis.mssql.connect
the driver arg would get added into the query dict?
But at the moment, using ibis.connect, driver is everything before the :// whereas I am currently providing the ODBC driver using the query kwarg
Those are two different driver
s, unrelated to each other.
The first (mssql://
) is SQLAlchemy's notion of a driver. I'm currently hardcoding that into _build_alchemy_url
as mssql+pyodbc
. This driver is a fixed value.
The second is the PyODBC notion of driver, which is the thing being put into query
. The driver
argument to do_connect
corresponds to the PyODBC notion of driver.
Ok, that makes sense. I'm not sure if it might lead to a bit of confusion down the line, but for a start, keeping the query
in ibis.connect
but using the driver
in do_connect
to inform the PyODBC driver through the query could work.
I think to make query
and driver
less confusing will require some deeper work, across backends, to differentiate query parameters from connect_args
.
This PR moves the MSSQL backend to use ODBC as the underlying driver. This is necessary to support connecting to Azure services. The main user-facing API change is that you now need to set up ODBC to use the MSSQL backend.
Closes #6640. Closes #6012.
xref https://github.com/ibis-project/ibis/discussions/7306