mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution
2.92k stars 561 forks source link

On PyCharm pyodbc, select large column item of delta table to databricks spark shows [Unexpected response from server during a HTTP connection: Bad Status: Unrecognized response with no error message header. Status code: 502. (14) (SQLDriverConnect)]] #1072

Closed this-is-YJ-Lin closed 1 year ago

this-is-YJ-Lin commented 2 years ago

Please first make sure you have looked at:

Environment

To diagnose, we usually need to know the following, including version numbers. On Windows, be sure to specify 32-bit Python or 64-bit:

Issue

This is Azure Databricks support engineer, we have a Customer who faced below issue:

The user are using pyodbc with Pycharm to connect to Azure databricks service to select a delta table, got this issue :

%sql
SELECT tool_id, chamber, itime, **param_value** FROM prod_apc.mf_l8a_b_all_apc_at_continuousdata  
WHERE iday ='2022-06-19'    
AND (itime > '2022-06-19 20:58:57' and itime <= '2022-06-19 21:58:00')   
AND tool_id in ('CFACL101','CFACL201','CFACL301','CFACL401','CFACL501','CFACL601','CFCGL101','CFODF201','CFODF202','CFODF203','CFODF2041','CFODF2042','CFODF2043','CFODF2044','CFODF2045','CFODF2046','CFODF206','CFODF207','CFODF208','CFODF210','CFODF2103','CFODF212','CFODF2122','CFODF2123','CFODF2124','CFODF213','CFODF214','CFODF215','CFODF216','CFODF217','CFODF218','CFODF301','CFODF302','CFODF303','CFODF3041','CFODF3042','CFODF3043','CFODF3044','CFODF3045','CFODF3046','CFODF306','CFODF307','CFODF308','CFODF310','CFODF3103','CFODF312','CFODF3121','CFODF3122','CFODF3123','CFODF3124','CFODF313','CFODF314','CFODF315','CFODF316','CFODF317','CFODF318','CFPFL101','CFPIC101','CFPIC102','CFPIC103','CFPIC104','CFPIC105','CFPIC106','CFPIC107','CFPIC108','CFPIC109','CFPIC110','CFPIC201','CFPIC202','CFPIC203','CFPIC204','CFPIC205','CFPIC206','CFPIC207','CFPIC208','CFPIC209','CFPIC210','')    
AND chamber in ('SOURCE_CURRENT','J_5F','ELECTRICITY','J_3F','')    
AND (param_value like '%CURRENT%' or param_value like '%AVG%')

we got this error messages Error: ('HY000', '[HY000] [Simba][ThriftExtension] (14) Unexpected response from server during a HTTP connection: Bad Status: Unrecognized response with no error message header. Status code: 502. (14) (SQLDriverConnect); [HY000] [Simba][ThriftExtension] (14) Unexpected response from server during a HTTP connection: Bad Status: Unrecognized response with no error message header. Status code: 502. (14)')

1) Only got the error when run this query on Pycharm , but not happen if we run query on Databricks spark notebook. 2) If we delete param_value from select list, query on Pycharm is no problem. This item is a string column on delta table and saves large json file data

I suspect if there is any limitation when running the query on Pycharm? Could you please help check it.?

gordthompson commented 2 years ago

Do you get the same error in PyCharm if you execute this query?

SELECT TOP 1 param_value FROM prod_apc.mf_l8a_b_all_apc_at_continuousdata  
WHERE iday ='2022-06-19'    
AND (itime > '2022-06-19 20:58:57' and itime <= '2022-06-19 21:58:00')   
AND tool_id in ('CFACL101','CFACL201','CFACL301','CFACL401','CFACL501','CFACL601','CFCGL101','CFODF201','CFODF202','CFODF203','CFODF2041','CFODF2042','CFODF2043','CFODF2044','CFODF2045','CFODF2046','CFODF206','CFODF207','CFODF208','CFODF210','CFODF2103','CFODF212','CFODF2122','CFODF2123','CFODF2124','CFODF213','CFODF214','CFODF215','CFODF216','CFODF217','CFODF218','CFODF301','CFODF302','CFODF303','CFODF3041','CFODF3042','CFODF3043','CFODF3044','CFODF3045','CFODF3046','CFODF306','CFODF307','CFODF308','CFODF310','CFODF3103','CFODF312','CFODF3121','CFODF3122','CFODF3123','CFODF3124','CFODF313','CFODF314','CFODF315','CFODF316','CFODF317','CFODF318','CFPFL101','CFPIC101','CFPIC102','CFPIC103','CFPIC104','CFPIC105','CFPIC106','CFPIC107','CFPIC108','CFPIC109','CFPIC110','CFPIC201','CFPIC202','CFPIC203','CFPIC204','CFPIC205','CFPIC206','CFPIC207','CFPIC208','CFPIC209','CFPIC210','')    
AND chamber in ('SOURCE_CURRENT','J_5F','ELECTRICITY','J_3F','')    
AND (param_value like '%CURRENT%' or param_value like '%AVG%')

If so, then please provide an ODBC trace log.

v-chojas commented 2 years ago

That error is coming from the ODBC driver. As Gord says, an ODBC trace will show exactly what's happening, but I suspect the root cause is a bug/limitation in the ODBC driver, not pyODBC.

this-is-YJ-Lin commented 2 years ago

Thanks for your information and quick comment! Our customer is trying now, and I am waiting for the response, I would let you guys know the result as soon as possible. Thanks again!

this-is-YJ-Lin commented 2 years ago

Do you get the same error in PyCharm if you execute this query?

SELECT TOP 1 param_value FROM prod_apc.mf_l8a_b_all_apc_at_continuousdata  
WHERE iday ='2022-06-19'    
AND (itime > '2022-06-19 20:58:57' and itime <= '2022-06-19 21:58:00')   
AND tool_id in ('CFACL101','CFACL201','CFACL301','CFACL401','CFACL501','CFACL601','CFCGL101','CFODF201','CFODF202','CFODF203','CFODF2041','CFODF2042','CFODF2043','CFODF2044','CFODF2045','CFODF2046','CFODF206','CFODF207','CFODF208','CFODF210','CFODF2103','CFODF212','CFODF2122','CFODF2123','CFODF2124','CFODF213','CFODF214','CFODF215','CFODF216','CFODF217','CFODF218','CFODF301','CFODF302','CFODF303','CFODF3041','CFODF3042','CFODF3043','CFODF3044','CFODF3045','CFODF3046','CFODF306','CFODF307','CFODF308','CFODF310','CFODF3103','CFODF312','CFODF3121','CFODF3122','CFODF3123','CFODF3124','CFODF313','CFODF314','CFODF315','CFODF316','CFODF317','CFODF318','CFPFL101','CFPIC101','CFPIC102','CFPIC103','CFPIC104','CFPIC105','CFPIC106','CFPIC107','CFPIC108','CFPIC109','CFPIC110','CFPIC201','CFPIC202','CFPIC203','CFPIC204','CFPIC205','CFPIC206','CFPIC207','CFPIC208','CFPIC209','CFPIC210','')    
AND chamber in ('SOURCE_CURRENT','J_5F','ELECTRICITY','J_3F','')    
AND (param_value like '%CURRENT%' or param_value like '%AVG%')

If so, then please provide an ODBC trace log.


Hi all, Thanks for your suggestion and your time! It's great to tell you that the previous column (mf_l8a_b_all_apc_at_continuousdata) is works now, but another column (apc.mf_l8b_b_all_apc_at_rawdata) faces the similar issue So the query will be:

SELECT top 1 tool_id, chamber, starttime, param_value FROM prod_apc.mf_l8b_b_all_apc_at_rawdata  
WHERE startday = '2022-05-16'
AND starttime >= '2022-05-16 01:00:00' 
AND endtime <= '2022-05-16 13:00:00'
AND tool_id IN ('FKBDEV20','FKBDEV30')   
AND chamber IN ('WASHING_CH')
AND (param_value like '%WASH%')

We got the log of this query from our customer, it found out there are some 502 error responses when If other programs are also using databricks cluster to fetch data at the time of execution:

main            1b60-5138   EXIT  SQLExecDirectW  with return code -1 (SQL_ERROR) 
        HSTMT               0x00000288BC94C0E0 
        WCHAR *             0x00000288BF775070 [     321] "SELECT top 1 tool_id, chamber, starttime, param_value FROM prod_apc.mf_l8b_b_all_apc_at_rawdata  WHERE startday = '2022-05-16'    AND starttime >= '2022-05-16 01:00:00'    AND endtime <= '2022-05-16 13:00:00'    AND tool_id IN ('FKBDEV20','FKBDEV30')      AND chamber IN ('WASHING_CH')    AND (param_value like '%WASH%') "
        SDWORD                   321
        DIAG [08S01] [Simba][Hardy] (115) Connection failed with error: Bad Status: Unrecognized response with no error message header. Status code: 502 (115) 
SQLWCHAR *          0x00000288BF5BB0C0 [     169] "[Simba][ThriftExtension] (14) Unexpected response from server during a HTTP connection: Bad Status: Unrecognized response with no error message header. Status code: 502."

Conversely, if no other program is using the databricks cluster to fetch data at the moment, There will be no 502 error when executing this SQL, but the data cannot be displayed because the returned string is too big

main            611c-4410   EXIT  SQLGetData  with return code 1 (SQL_SUCCESS_WITH_INFO)
        HSTMT               0x000001AC6B860990
        UWORD                        4 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x000001AC68A9D020 <Invalid string length!> [-4]
        SQLLEN                  4096
        SQLLEN *            0x0000006569F6D140 (-4)
        DIAG [01004] [Simba][ODBC] (10160) Output string data right truncation: string data is too big for the output data buffer and has been truncated. (10160) 

The above is the summary of it, could you help us to mitigate the issue? If you need more information, please feel free to let us know. Awaiting your response, thanks!

Thanks a lot!

this-is-YJ-Lin commented 2 years ago

Hi all, Good day! The error messages were presented above, could you please check it? Thanks again for your help, I am looking forward to your reply!

Baiys1234 commented 2 years ago

@gordthompson @v-chojas I have shared odbc log, could you please confirm it as soon as possible.

Thanks very much

this-is-YJ-Lin commented 2 years ago

Hi all @Baiys1234 @v-chojas @gordthompson , For helping us to clarify the full scenario, I would like to provide the full log of this issue. There are two logs respectively.

  1. We got the log of this query from our customer, it found out there are some 502 error responses when If other programs are also using databricks cluster to fetch data at the time of execution: file name - select_top1_502_odbc
  2. Conversely, if no other program is using the databricks cluster to fetch data at the moment, There will be no 502 error when executing this SQL, but the data cannot be displayed because the returned string is too big: file name - select_top1_too_big_odbc Noted that the log files I have updated to the one drive then set it as private file and assigned you guys only can read it, sorry for Baiys1234, I couldn't find your mail, but it seems like you have log too.

Thanks a lot!