Unable to run parametrized long queries with Impala sql

fstarna commented 1 month ago

Environment

Python: 3.11
pyodbc: 5.1.0
OS: red hat linux
DB: cloudera
driver: odbc

Issue

Hi everyone,

I'm trying to converting my sql queries using unnamed parameters so as to prevent sql injection issues. I'm working on a linux red hat machine with odbc driver, which is connected to an impala sql server on the cloudera system. I'm really struggling in order to make the queries work, because I'm not able to run them, unless in specifi conditions. Here's is a simple code to demonstrate the behavior.

import pyodbc

# connect
crsr = pyodbc.connect('DSN=impala1', autocommit=True).cursor()

# short query
query = """select *
        FROM lab_antifrode.v_padln_t_dett_incassi_dl d 
        where cast(D.cod_fsc as string) <> ''  
        AND d.prg_rig < ?
    AND d.dat_cre_flu >= ?
    AND d.TIP_FLU = 'F24';"""
params = (1000000000, '2023-01-01')

res = crsr.execute(query, params).fetchall() #<-- this works
print('DONE!!!') 

# long query
query = """select *
        FROM lab_antifrode.v_padln_t_dett_incassi_dl d 
        LEFT OUTER JOIN 
        (select cli, fsc,nom_rag_soc, dat_ini_val , dat_ccz_inl,Dat_csz_efv,dta_cessata,sta ,prg_var, COD_CAU_CSZ_DL,rn 
                        from 
                        ( SELECT c2.cod_cli as cli, trim(cast(c2.cod_fsc as string)) as fsc,trim(cast(C2.nom_rag_soc as string)) AS nom_rag_soc, c2.dat_ini_val , dat_ccz_inl,Dat_csz_efv, 
                        (case when c2.Dat_csz_efv= '1900-01-01 00:00:00' then '2999-01-01' ELSE c2.Dat_csz_efv END) AS dta_cessata,sta 
                        ,prg_var,COD_CAU_CSZ_DL 
                        , ROW_NUMBER () OVER (PARTITION BY cod_fsc 
                        ORDER BY (case when c2.Dat_csz_efv='1900-01-01 00:00:00' then '2999-01-01' ELSE c2.Dat_csz_efv END) desc) AS rn 
                        from lab_antifrode.v_padln_t_cliente c2 
                        WHERE  
                        c2.sta='C' 
                        AND c2.dat_ini_val = (select max(c1.dat_ini_val) from lab_antifrode.v_padln_t_cliente c1 where c1.sta='C' and c2.cod_CLI=c1.cod_CLI 
                                        AND c2.cod_fsc=c1.cod_fsc) 
                        AND PRG_VAR = (SELECT max(PRG_VAR) FROM lab_antifrode.v_padln_t_cliente xc WHERE c2.cod_cli = xc.cod_cli AND sta ='C' 
                        AND XC.DAT_INI_VAL =  C2.DAT_INI_VAL) 
                        ) A 
                        WHERE rn=1 
                        ) k 
    ON trim(cast(d.COD_fsc as string))=k.fsc
        WHERE d.prg_rig < ?
    AND d.dat_cre_flu >= ?
    AND d.TIP_FLU = 'F24';"""

params = (1000000000, '2023-01-01')
res = crsr.execute(query, params).fetchall() #<-- this does not work
print('DONE!!!')

If you read carefully you can see that the query are identical in the last part, but the short one works well, while the second one, in which there are a series of subqueries, doesn't. I obviously tried the queries without parameters and both work well.

This is the error I get

pyodbc.Error: ('HY000', '[HY000] [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] : ParseException: Syntax error in line 22:\n        WHERE d.prg_rig < ?\n                          ^\nEncountered: Unexpected character\nExpected: CASE, CAST, DATE, DEFAULT, EXISTS, FALSE, IF, INTERVAL, LEFT, NOT, NULL, REPLACE, RIGHT, TRUNCATE, TRUE, IDENTIFIER\n\nCAUSED BY: Exception: Syntax error\n (110) (SQLPrepare)')

I have tried to search on the internet if someone had similar issues related to subqueries, looked at the documentation of impala and it is written that only certain ddl command (like describe or truncate) are not supported. Do you have any suggestion for this?

Thanks, Francesco

v-chojas commented 1 month ago

What's the error you get? This is likely to be a limitation/bug of your ODBC driver.

fstarna commented 1 month ago

Sorry, I forgot to include the error, I just updated the issue.

gordthompson commented 1 month ago

You could try running your test using pypyodbc or turbodbc instead of pyodbc. If you get the same error then it is not a pyodbc issue per se.

mkleehammer / pyodbc

Unable to run parametrized long queries with Impala sql #1381

Environment

Issue