sfu-db / connector-x

Fastest library to load data from DB to DataFrames in Rust and Python
https://sfu-db.github.io/connector-x
MIT License

Passing <ServerName\InstanceName> as the host value in the MSSQL connection string doesn't work #546

Closed: manuelcmachado closed this issue 1 year ago

manuelcmachado commented 1 year ago

What language are you using?

Python.

What version are you using?

0.3.2

What database are you using?

MSSQL

What dataframe are you using?

Polars

Can you describe your bug?

My MSSQL Server database uses ServerName\InstanceName as its host. When I pass that value as the host in mssql://host:port/db?trusted_connection=true, both the Polars read_database() and read_database_uri() methods raise "RuntimeError: parse error: invalid domain character". If the SQL Server is set up with a server name only (no instance name), it works fine.
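A named-instance host is a server name and an instance name joined by a backslash. The sketch below uses a hypothetical helper, `split_named_instance` (not part of Polars or connector-x), to show how such a host breaks down, and shows that Python's lenient `urllib.parse.urlsplit` keeps the backslash inside the URI authority. The "invalid domain character" message appears to come from the stricter URL parser on the Rust side of connector-x, which rejects that character in a host; this is an inference from the error text, not something confirmed in this thread.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def split_named_instance(host: str) -> tuple[str, str | None]:
    """Split 'ServerName\\InstanceName' into (server, instance).

    Hypothetical helper for illustration; returns (host, None) when
    the host has no instance part.
    """
    server, sep, instance = host.partition("\\")
    return (server, instance) if sep else (server, None)


# Example URI with a placeholder server and instance name (assumed values)
uri = r"mssql://MYSERVER\SQLEXPRESS:1433/AdventureWorksDW2022?trusted_connection=true"

# urlsplit does not validate the host, so the backslash survives in the netloc
netloc = urlsplit(uri).netloc
print(netloc)  # MYSERVER\SQLEXPRESS:1433

# Drop the ":port" suffix, then separate server and instance
print(split_named_instance(netloc.rsplit(":", 1)[0]))  # ('MYSERVER', 'SQLEXPRESS')
```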

What are the steps to reproduce the behavior?

If possible, please include a minimal simple example including:

Database setup if the error only happens on specific data or data type

Table schema and example data

Example query / code
import time

import polars as pl
# Polars calls connectorx (which also needs pyarrow) under the hood; both must be
# installed, but neither has to be imported here.

rdb_type = 'mssql'
server_name = r'<servername>\<instancename>'  # raw string so the backslash is kept literally
port = 1433  # default SQL Server port
database_name = 'AdventureWorksDW2022'

uri = f"{rdb_type}://{server_name}:{port}/{database_name}?trusted_connection=true"
query = """
        SELECT ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut, UnitsBalance
        FROM AdventureWorksDW2022.dbo.FactProductInventory;
        """
start_time = time.perf_counter()
df = pl.read_database_uri(query, uri)  # by default Polars uses connectorx as its connection engine
execution_time = time.perf_counter() - start_time

print(f'Reading data from the FactProductInventory table in the {database_name} database, in MSSQL Server, takes {execution_time} seconds')

What is the error?

Show the error result here.

RuntimeError                              Traceback (most recent call last)
Cell In[8], line 7
      2 query = """
      3         SELECT ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut, UnitsBalance
      4         FROM AdventureWorksDW2022.dbo.FactProductInventory;
      5         """
      6 start_time = time.time()
----> 7 df = pl.read_database_uri(query, uri)# by default Polars uses connectorx as its connection engine
      8 execution_time = (time.time() - start_time)
     10 print(f'Reading data from the FactProductInventory table in the {database_name} database, in MSSQL Server, takes {execution_time} seconds')

File ~\AppData\Roaming\Python\Python310\site-packages\polars\io\database.py:450, in read_database_uri(query, uri, partition_on, partition_range, partition_num, protocol, engine, schema_overrides)
    447     engine = "connectorx"
    449 if engine == "connectorx":
--> 450     return _read_sql_connectorx(
    451         query,
    452         connection_uri=uri,
    453         partition_on=partition_on,
    454         partition_range=partition_range,
    455         partition_num=partition_num,
    456         protocol=protocol,
    457         schema_overrides=schema_overrides,
    458     )
    459 elif engine == "adbc":
    460     if not isinstance(query, str):

File ~\AppData\Roaming\Python\Python310\site-packages\polars\io\database.py:486, in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
    480 except ModuleNotFoundError:
    481     raise ModuleNotFoundError(
    482         "connectorx is not installed"
    483         "\n\nPlease run pip install connectorx>=0.3.2."
    484     ) from None
--> 486 tbl = cx.read_sql(
    487     conn=connection_uri,
    488     query=query,
    489     return_type="arrow2",
    490     partition_on=partition_on,
    491     partition_range=partition_range,
    492     partition_num=partition_num,
    493     protocol=protocol,
    494 )
    495 return from_arrow(tbl, schema_overrides=schema_overrides)

File ~\miniconda3\lib\site-packages\connectorx\__init__.py:297, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    294     except ModuleNotFoundError:
    295         raise ValueError("You need to install pyarrow first")
--> 297     result = _read_sql(
    298         conn,
    299         "arrow2" if return_type in {"arrow2", "polars", "polars2"} else "arrow",
    300         queries=queries,
    301         protocol=protocol,
    302         partition_query=partition_query,
    303     )
    304     df = reconstruct_arrow(result)
    305     if return_type in {"polars", "polars2"}:

RuntimeError: parse error: invalid domain character

manuelcmachado commented 1 year ago

Closing this issue. The solution posted in https://github.com/sfu-db/connector-x/issues/140#issuecomment-1302205592 solved my problem.
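The linked comment is not reproduced in this thread. A workaround often suggested for this error, assumed here rather than quoted from that comment, is to percent-encode the backslash in the host as `%5C` so the URI parses, relying on connector-x to decode the host and pass the instance name on to the MSSQL driver. The sketch below uses a hypothetical `build_mssql_uri` helper; verify the behavior against the linked comment and your connector-x version.

```python
from __future__ import annotations

from urllib.parse import quote


def build_mssql_uri(server: str, instance: str | None, port: int, database: str) -> str:
    """Build an mssql:// URI for Windows (trusted) authentication.

    Hypothetical helper: the backslash of a named instance is
    percent-encoded (assumption: connector-x decodes it again).
    """
    host = server if instance is None else f"{server}\\{instance}"
    # safe="" makes quote() encode every reserved character, so "\" becomes "%5C"
    encoded_host = quote(host, safe="")
    return f"mssql://{encoded_host}:{port}/{database}?trusted_connection=true"


uri = build_mssql_uri("MYSERVER", "SQLEXPRESS", 1433, "AdventureWorksDW2022")
print(uri)  # mssql://MYSERVER%5CSQLEXPRESS:1433/AdventureWorksDW2022?trusted_connection=true
```

The resulting string would then be passed as before, e.g. `pl.read_database_uri(query, uri)`. Named instances often listen on a port other than 1433, so it is worth checking which port the instance actually uses.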