galadash closed this issue 1 year ago
Thoughts:
1) There is a serious memory bug in 4.0.38 that is being corrected right now. Please retest with 4.0.39 when it is ready or downgrade to 4.0.35.
2) Have you executed these two SQL statements without pyodbc? This does not look like something pyodbc would affect.
3) If you really do think it is pyodbc, please try executing query2 before query1 to see if the output changes.
I'm betting on (2) right now, though.
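A minimal sketch of the order-swap check from point (3), assuming the two statements are the query1 and query2 strings used elsewhere in this thread. The helper name run_in_order and the sqlite3 stand-in connection are assumptions made for illustration so that the snippet runs anywhere; against the real system you would pass a pyodbc connection instead (for example one opened with pyodbc.connect("DSN=Hadoop LDAP", autocommit=True)) and skip the stand-in setup.

```python
import sqlite3

QUERY1 = (
    "SELECT col1, col2 AS col2_orig, "
    "(CASE WHEN col2 IS NOT NULL THEN col2 ELSE 0 END) AS col2_1, "
    "(CASE WHEN col2 IS NULL THEN 0 ELSE col2 END) AS col2_2 "
    "FROM sandbox.jk_test"
)
QUERY2 = f"WITH wrapped_table AS ({QUERY1}) SELECT * FROM wrapped_table"


def run_in_order(conn, queries):
    """Execute (name, sql) pairs in the given order and return {name: rows}."""
    results = {}
    cur = conn.cursor()
    for name, sql in queries:
        cur.execute(sql)
        results[name] = [tuple(row) for row in cur.fetchall()]
    cur.close()
    return results


def make_stand_in_connection():
    """Assumption: an in-memory SQLite database stands in for the Impala DSN."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the name sandbox.jk_test resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS sandbox")
    conn.execute("CREATE TABLE sandbox.jk_test (col1 int, col2 int)")
    conn.execute("INSERT INTO sandbox.jk_test (col1, col2) VALUES (1, 1), (2, NULL)")
    return conn


conn = make_stand_in_connection()
forward = run_in_order(conn, [("query1", QUERY1), ("query2", QUERY2)])
reverse = run_in_order(conn, [("query2", QUERY2), ("query1", QUERY1)])

# If execution order influenced the results, these two dicts would differ.
order_matters = forward != reverse
print("order changes the output:", order_matters)
```

If the output differs between the two orderings on the real connection, that would point towards state carried over between executions; if it does not, the driver or server remains the more likely suspect.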
You can test the ODBC driver outside of pyodbc using the following VBScript. (Run cscript issue_1197.vbs from a command prompt.)
' issue_1197.vbs
Set conn = CreateObject("ADODB.Connection")
conn.Open "DSN=Hadoop LDAP"

Sub ExecuteCommand(sql)
    If Left(sql, 5) = "WITH " Or Left(sql, 7) = "SELECT " Then
        Set rst = CreateObject("ADODB.Recordset")
        rst.Open sql, conn
        While Not rst.EOF
            For i = 0 To rst.Fields.Count - 1
                val = rst(i)
                If IsNull(val) Then
                    val = "None"
                End If
                WScript.StdOut.Write val & vbTab
            Next
            WScript.Echo
            rst.MoveNext
        Wend
        rst.Close
        Set rst = Nothing
    Else
        Set cmd = CreateObject("ADODB.Command")
        cmd.ActiveConnection = conn
        cmd.CommandText = sql
        cmd.Execute
        Set cmd = Nothing
    End If
End Sub

ExecuteCommand "CREATE TABLE sandbox.jk_test (col1 int, col2 int)"
ExecuteCommand("INSERT INTO sandbox.jk_test (col1, col2) " & _
    "VALUES " & _
    "(1,1), " & _
    "(2,NULL)")

query1 = "SELECT " & _
    "col1, col2 as col2_orig, " & _
    "(CASE WHEN col2 IS NOT NULL THEN col2 " & _
    " ELSE 0 END) AS col2_1, " & _
    "(CASE WHEN col2 IS NULL THEN 0 " & _
    " ELSE col2 END) AS col2_2 " & _
    "FROM sandbox.jk_test"
query2 = "WITH wrapped_table AS (" & query1 & ") SELECT * FROM wrapped_table"

ExecuteCommand(query1)
WScript.Echo ""
ExecuteCommand(query2)

ExecuteCommand("DROP TABLE sandbox.jk_test")
conn.Close
Thanks @mkleehammer for your thoughts. I didn't consider it could be something related to the driver. I also wouldn't have had any idea of how to try it, so thanks @gordthompson for your proposal, I will give it a try next workday and come back to you!! Hopefully I can clarify things by next week.
So it does indeed seem to be an issue with the driver, and not with pyodbc:
> cscript issue_1197.vbs
Microsoft (R) Windows Script Host Version 5.812
Copyright (C) Microsoft Corporation. All rights reserved.
1 1 1 1
2 None None 0
1 1 1 1
2 None 0 0
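To pinpoint where the two result sets diverge, a small comparison helper can be used. The sketch below copies the rows from the cscript output above into Python tuples (with None standing for SQL NULL); the helper name diff_result_sets is hypothetical and the column names are taken from the aliases in query1.

```python
COLUMNS = ("col1", "col2_orig", "col2_1", "col2_2")

# Rows as printed by the cscript run ("None" in the output is SQL NULL).
rows_query1 = [(1, 1, 1, 1), (2, None, None, 0)]
rows_query2 = [(1, 1, 1, 1), (2, None, 0, 0)]


def diff_result_sets(left, right, columns):
    """Return (row_index, column, left_value, right_value) for each differing cell."""
    if len(left) != len(right):
        raise ValueError("result sets have different row counts")
    mismatches = []
    for row_index, (left_row, right_row) in enumerate(zip(left, right)):
        for name, left_val, right_val in zip(columns, left_row, right_row):
            if left_val != right_val:
                mismatches.append((row_index, name, left_val, right_val))
    return mismatches


for mismatch in diff_result_sets(rows_query1, rows_query2, COLUMNS):
    print(mismatch)  # prints (1, 'col2_1', None, 0)
```

The only differing cell is col2_1 in the second row: the plain query returns NULL where the CASE expression should have produced 0, while the CTE-wrapped query returns the expected 0.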
I have updated to the latest ODBC driver available from Cloudera (2.06.17.1026), but that did not help. Furthermore, I've downloaded a trial version of CDATA's Impala driver, which shows the expected behaviour (the same results for both queries). I'm therefore pretty confident that this issue is with Cloudera's driver.
If anyone has experience with submitting issues to Cloudera, I'm all ears! I expect I'll have to post it on their community first...
Sorry for opening an issue here, but a million thanks for the help!
Just an update from my side in case anyone ever encounters this issue.
Selecting "Use Native Query" in the Advanced settings of the Cloudera ODBC Driver for Impala results in the expected outcome. I expect I am "losing" some query optimization the driver does in the background, but since many of the queries I use are built from CTEs, it seems they are not affected anyway.
Just another update in case anyone else bumps into this issue: it has been solved in "Cloudera ODBC Driver for Impala" version 2.7.0.
Environment
Issue
Simply put, IS NOT NULL within a CASE WHEN statement does not generate the expected behaviour. Review this simple MWE:
Which results in the following output. I would expect both outputs to be the same, but they clearly are not.
Things I have tried:
It's very strange behaviour, and using such a construction in WHERE filters works fine. Also, when the clause is used to actually do computations, it miraculously works. E.g. executing the following snippet:
generates:
In my case I will have to rewrite the SQL-generation process to convert these kinds of checks into a COALESCE() expression, which would be fine. Still, it's strange behaviour that I think should be fixed, and it might have unintended consequences for other executions. Unfortunately my C++ knowledge is limited, so I have not been able to dig deeper into the code and find the solution myself. Hopefully someone can help me! 😄
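A minimal sketch of the COALESCE() rewrite mentioned above. The helper null_safe is a hypothetical name for what a SQL generator might emit, and an in-memory SQLite table stands in for the Impala table, so this only illustrates that the two expressions are equivalent in standard SQL; it has not been checked against the Cloudera driver.

```python
import sqlite3


def null_safe(column, default=0):
    """Hypothetical generator helper: replace NULL in `column` with `default`."""
    return f"COALESCE({column}, {default})"


# Assumption: SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jk_test (col1 int, col2 int)")
conn.execute("INSERT INTO jk_test (col1, col2) VALUES (1, 1), (2, NULL)")

# The original CASE WHEN ... IS NOT NULL form.
case_rows = conn.execute(
    "SELECT col1, (CASE WHEN col2 IS NOT NULL THEN col2 ELSE 0 END) AS col2_1 "
    "FROM jk_test ORDER BY col1"
).fetchall()

# The same check expressed through COALESCE().
coalesce_rows = conn.execute(
    f"SELECT col1, {null_safe('col2')} AS col2_1 FROM jk_test ORDER BY col1"
).fetchall()

print(case_rows)      # prints [(1, 1), (2, 0)]
print(coalesce_rows)  # prints [(1, 1), (2, 0)]
```

Both forms should return 0 for the row where col2 is NULL, which is the result the CASE WHEN version failed to produce through the affected driver.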