sfu-db / connector-x

Fastest library to load data from DB to DataFrames in Rust and Python
https://sfu-db.github.io/connector-x
MIT License
1.85k stars 145 forks source link

Support Azure Active Directory authentication #308

Open PhoenixGray000 opened 2 years ago

PhoenixGray000 commented 2 years ago

What language are you using?

Python 3.9.12

What version are you using?

ConnectorX 0.3.0

What database are you using?

Azure Database (MSSQL)

What dataframe are you using?

Pandas

Can you describe your bug?

I am attempting to query data from an Azure Database using cx.read_sql, but I am facing a "timed out in bb8" runtime error. I have explored the options discussed here. Enabling TCP/IP as suggested in the aforementioned comment thread is not an option due to HIPPA compliance regulations; however, I have confirmed using PyODBC, sqlcmd, and SQLAlchemy that I can indeed establish a remote connection to the server. We use Azure Active Directory as the password authentication method.

Most of our data tables contain nearly a billion rows, so finding a fix for this issue would be a wonderful help! My team would love to be early adopters of this method.

What are the steps to reproduce the behavior?

If possible, please include a minimal simple example including:

Database setup if the error only happens on specific data or data type

We are a HIPPA compliant corporate environment and cannot share specific data information.

Example query / code

This is part of a larger remote connection wrapper I have developed, so please excuse unresolved references I may have missed. It works just fine for PyODBC and SQLAlchemy. This snippet is a class method designed for testing purposes.

def test_connector_x(cnxn):
        import connectorx as cx
        import pandas as pd
        import os
        os.environ['RUST_LOG']="connectorx=debug,connectorx_python=debug"
        query = """
                    select top 100000 *
                    from ztmp.PatientMerge
                    order by MemberID, Period
                """
        datatable = pd.DataFrame()
        datatable = cx.read_sql(cnxn, query)

def main():
        from urllib.parse import quote_plus as qp
        username = username@portal
        password = paSs!word
        server = server.name
        database_name = DatabaseName
        mssql_url = 'mssql://{}:{}'.format(qp(username), qp(password))+\
                            '@'+server+\
                            ':1433/'+database_name+\
                            'trusted_connection=true'
         df = test_connector_x(mssql_url)

What is the error?

BB8 Runtime Error
wangxiaoying commented 2 years ago

Hi @PhoenixGray000 , I'm not sure whether your database enabled encryption. If it is, can you try to add encrypt=true as a parameter in the url and see whether it works? Also trusted_connection=true enables windows authentication, if you want to use username and password you might want to remove them from the url.

PhoenixGray000 commented 2 years ago

Hi @wangxiaoying ,

Good catch! We do have encryption enabled for our database; however, adding encrypt=true to the url results in the following Tiberius and BB8 Runtime errors.

BB8 - Tiberius Runtime Error (Redacted)

The gray box redacts the interpreted server name. Interestingly, the "server" seems to actually be the back half of my username, which is in an email-style "user@name.com" format. (So here, the error reads: Cannot open server "name.com" requested by the login.) Noting that all url-encoded special characters have been popped-out for clarity, printing the mssql access url I am currently using yields the following string:

mssql://{user%40name}:{password%21}@{server}:1433/{database}?encrypt=true&trusted_connection=false

Note that this error occurs regardless of whether trusted_connection=false or is absent in the url. Any insights?

wangxiaoying commented 1 year ago

Hi @PhoenixGray000 , I reread your first post and it seems like you are using active directory as the authentication method, which I think may be the same thing with windows authentication and requires trusted_connection=true. However, we only tested this feature Windows PC and were never really able to test it on Azure, can you try mssql://{user%40name}:{password%21}@{server}:1433/{database}?encrypt=true&trusted_connection=true?

PhoenixGray000 commented 1 year ago

Hi @wangxiaoying , my understanding is that the Azure SQL Database does not permit Windows Authentication. This seems to be corroborated by the new error message I'm getting from the updated connection string you proposed:

image

wangxiaoying commented 1 year ago

Hi @PhoenixGray000 , IC, I thought the "We use Azure Active Directory as the password authentication method." you mentioned is the same with windows authentication. Thank you for helping us testing this scenario.

From your error, it seems like we cannot support Azure Active Directory authentication currently. Is it possible if you can enable username & password authentication? In that case, only encrypt=true is needed and connectorx should be able to support.

PhoenixGray000 commented 1 year ago

Hi @wangxiaoying , unfortunately, AAD authentication is the only accepted configuration for our database at this time. I appreciate your help and suggestions. connectorx looks like a very appealing tool for our purposes, so I look forward to testing it out once AAD becomes a fully supported mode of authentication!

mgsnuno commented 1 year ago

to connect to Azure SQL with an admin account this is what worked (connectorX version 0.3.1):

f"mssql://{urllib.parse.quote_plus(user)}:{urllib.parse.quote_plus(password)}@server.database.windows.net:1433/{db}?encrypt=true"
empowerVictor commented 8 months ago

I'd highly appreciate if we could have some progress on this. I can try to help if the developers give me a hint on what should be implemented on connectorX