sfu-db / connector-x

Fastest library to load data from DB to DataFrames in Rust and Python
https://sfu-db.github.io/connector-x
MIT License
2.02k stars 163 forks source link

CTE queries not working using pandas backend #622

Open med2604 opened 7 months ago

med2604 commented 7 months ago

What language are you using?

Python

What version are you using?

Python 3.11.7

What database are you using?

MSSQL

What dataframe are you using?

pandas

Can you describe your bug?

I have a query that utilises CTE in order to create the resulting table which is what I am looking to retrieve from the sql database. Using a pandas backend, the query does not work even at its simplest form of utilising only one CTE. Using a polars backend, when working with one cte it works but fails when more than one cte is mentioned.

What are the steps to reproduce the behavior?

Create a query with a cte.

Database setup if the error only happens on specific data or data type

Table schema and example data

Example query / code
query = (
with
main_table as(select  [UPRN]
      ,[price_estimate]
      ,[confidence]
      ,[Delta]
      ,[Price]
      ,[DateOfTransfer]
      ,[PropertyType]
      ,[postcode]
      ,left(postcode,len(postcode)-3) as pcd_district
      ,[pc_area]
      ,[fmv_price_estimate]
      ,[fmv_offset_granularity]
      ,[fmv_price_estimate_pct_offset]
  FROM [testing].[dbo].[fmv_test]
where category != 'b'
),
Median_price as(
select pcd_district,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [Price]) OVER (partition by pcd_district) AS Median_price
FROM main_table
group by pcd_district, price
),
Median_estimate as(
select pcd_district,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [price_estimate]) OVER (partition by pcd_district) AS Median_price_estimate
FROM main_table
group by pcd_district, [price_estimate]
),
Median_fmv as (
select pcd_district,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [fmv_price_estimate]) OVER (partition by pcd_district) AS Median_fmv
FROM main_table
group by pcd_district, [fmv_price_estimate]
),
Median_fmv_offset as(
select pcd_district,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [fmv_price_estimate_pct_offset]) OVER (partition by pcd_district) AS Median_fmv_offset
FROM main_table
group by pcd_district, [fmv_price_estimate_pct_offset]
)

select distinct replace(main.pcd_district,' ','') as pcd_district, Median_price, Median_price_estimate, Median_fmv, Median_fmv_offset from main_table main
join Median_estimate on main.pcd_district = Median_estimate.pcd_district
join Median_fmv on main.pcd_district = Median_fmv.pcd_district
join Median_fmv_offset on main.pcd_district = Median_fmv_offset.pcd_district
join Median_price on main.pcd_district = median_price.pcd_district

Python code is as follows:
conn= f"mssql://{username}:{password}@{server}/{database}?driver=SQL+Server"
results = cx.read_sql(conn, query,)

What is the error?

This is the error that I am getting back: Error connecting to the database or executing the query: Token error: 'Incorrect syntax near the keyword 'with'.' on server X executing on line 2 (code: 156, state: 1, class: 15) An exception has occurred, use %tb to see the full traceback. RuntimeError: Token error: 'Incorrect syntax near the keyword 'with'.' on server X executing on line 2 (code: 156, state: 1, class: 15)

During handling of the above exception, another exception occurred:

SystemExit: 1

[2024-05-01T12:54:48Z ERROR tiberius::tds::stream::token] Incorrect syntax near the keyword 'with'. code=156