r-dbi / odbc

Connect to ODBC databases (using the DBI interface)
https://odbc.r-dbi.org/
Other
389 stars 107 forks source link

dbReadTable/dbGetQuery returns Invalid Descriptor Index error on SQL server 2016 with NAV tables #112

Closed simsonas closed 6 years ago

simsonas commented 7 years ago

I found similar issue with Invalid Descriptor Index error, and the issue was with column type varchar(max), but I cant find this type of column in my case (see attached picture), using select * statement I cant load any Navision table.

odbc_error

Example:

library(odbc)
library(DBI)

con <- DBI::dbConnect(odbc::odbc(),
                      Driver   = "ODBC Driver 13 for SQL Server",
                      Server   = "xx",
                      Database = "xx",
                      UID      = "xx",
                      PWD      = "xx*",
                      Port     = 1433)

Bin <- dbReadTable(con, 'DM8$Bin Type')
# OR
Bin <- dbGetQuery(con, "SELECT * FROM [DM8$Bin Type]")

Result:

Error in result_fetch(res@ptr, n, ...) : 
  nanodbc.cpp:3102: 07009: [Microsoft][ODBC Driver 13 for SQL Server]Invalid Descriptor Index 

However, selecting certain columns, query returns data.frame. RODBC is working without an issue, I really want switch to ODBC for additional features.

Session Info ```r R version 3.4.0 (2017-04-21) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200) Matrix products: default attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] DBI_0.6-1 odbc_1.0.1 RevoUtilsMath_10.0.0 loaded via a namespace (and not attached): [1] compiler_3.4.0 RevoUtils_10.0.4 tools_3.4.0 tibble_1.3.3 Rcpp_0.12.11 blob_1.1.0 rlang_0.1.1 ```
jimhester commented 7 years ago

I am not sure what is going on, did you try returning only a subset of the columns in that table? Does the order of the columns matter?

simsonas commented 7 years ago

Order of column does matter, I put timestamp field to the end of select statement and it successfully loads table:

Bin <- dbGetQuery(con, "SELECT [Code], [Receive], [Ship], [Put Away], [Pick], [timestamp] FROM [DM8$Bin Type]")

I suspect timestamp field to be the problem here, as this field is present in every table and I cant load any table using select * statement. Timestamp here has nothing to do with actual date, it is more of rowversion and it looks like this: "0x00000000000007D3" more information here

jimhester commented 6 years ago

Unfortunately there doesn't seem to be a way for Applications to fix this, it is really a Driver bug / feature. The only workaround I am aware of is to put these columns at the end of the query.

r2evans commented 6 years ago

I can see this as nothing other than deeply saddening for the state of database access in R. Reliance on a "standard" (ODBC) that hand-cuffs SQL so that the query-er must possess insider knowledge of the table's column constructs so that ordering the fields must be done according to a strict set of rules ...

ODBC does not appear to give any indication that there will be a problem. If you look at dbColumInfo(res) for the query about to be fetched, two (or more) fields can both say "-9", but one must always be after the other. No indication that this condition exists until failure happens, and in the presence of 3 or more columns, no hint is provided as to the true source column of the problem.

@jimhester, is there hope for order-agnostic queries against ODBC data sources? I can make a query against a table in (say) postgres that works, but in sqlserver it fails due solely to the order of columns. This (to my understanding of SQL) defeats the "simple" of SQL as well as the assumption of "standard". I can (and do) translate some things between DBMS's, such as top versus limit, but this is not verb or adjective SQL tokens ...

jimhester commented 6 years ago

Certain drivers fix this, e.g. the RStudio professional drivers and those from Simba handle this situation transparently to the odbc package.

I don't know of a way to do this automatically in the odbc package, but perhaps it could be done either here or in nanodbc.

r2evans commented 6 years ago

@jimhester, I've re-written this comment about seventeen times, and each time I delete it because it feels like I'm just complaining (and you are not the problem). RStudio's fix is both encouraging (that a fix/workaround exists) and frustrating (linux only, and my reliable-data-access is effectively behind a ransom-wall).

Though this really should be fixed in ODBC itself, since MS has done nothing over the last decade the bug has been clearly documented, it needs to be done in upper levels. Nanodbc disagrees that it is theirs to fix.

I'm interested in finding a way to fix this somewhere, inviting nanodbc to use similar techniques to address the problem there. Unfortunately, I'm not familiar enough with nanodbc low-level functions or the ODBC protocol to find enough info to do this.

Specifically, I can find no way to know that this will be a problem. For instance:

library(DBI)
con <- dbConnect(odbc::odbc(), ...)
dbExecute(con, "create table tbl1 ( id BIGINT PRIMARY KEY NOT NULL, s1 varchar(32), s2 varchar(max) )")
dbExecute(con, "insert into tbl1 values (1,'a','a'),(2,'b','b')")
ret <- dbGetQuery(con, "select s1,s2 from tbl1") # works
dbGetQuery(con, "select s2,s1 from tbl1") # fails
# Error in result_fetch(res@ptr, n, ...) : 
#   nanodbc/nanodbc.cpp:2836: 07009: [Microsoft][ODBC Driver 13 for SQL Server]Invalid Descriptor Index 
res <- dbSendStatement(con, "select s2,s1 from tbl1")
ci <- dbColumnInfo(res)
ci
#   name type
# 1   s2   12
# 2   s1   12
dbClearResult(res)

I don't know, given that ci, which is a *(max) variable that needs to be moved to the end. So the first problem is: how to determine if we need to fix a column? Are there additional meta-columns available similar to dbColumnInfo's name/type?

Once that's been identified, what can we do with that information? Is there a way to rearrange the columns post-dbSendStatement to accommodate this constraint? Do we have to go back to the query and rearrange the order of requested columns? (This avenue seems a bit complicated with SQL-parsing, dealing with column as newname renaming, dealing with *, etc.)

I'm willing to do a bit of foot-work with this, but (1) it might need to be a new issue? (2) Can you provide some pointers on how to "play" around the column-metadata, possible reordering, or something else?

r2evans commented 6 years ago

I'm doing some testing, and I thought I'd try modifying odbc_result::result_to_dataframe to go through the for (size_t col = 0; col < types.size(); ++col) twice. Since "getting long data" suggests they need to be the last data retrieved (and retrieved in order), I thought I could simply unroll the loop a little and grab the contents based on a "0" column size (suggesting "max"?):

  while (!complete_) {
  ...
    for (size_t col = 0; col < types.size(); ++col) {
      if (r.column_size(col) == 0) continue;
      ...
    }
    for (size_t col = 0; col < types.size(); ++col) {
      if (r.column_size(col) != 0) continue;
      ...
    }
  }

but it's still failing. With debugging enabled, I've confirmed that on each row, it calls the "not-large" columns first and then all "large" columns in-order, but on the first row it still errs:

nanodbc/nanodbc.cpp:2820 SQLGetData(handle, column + 1, col.ctype_, buffer, buffer_size, &ValueLenOrInd)
<-- rc: SQL_ERROR | 
nanodbc/nanodbc.cpp:334 SQLGetDiagRec(handle_type, handle, (SQLSMALLINT)i, sql_state, &native_error, 0, 0, &total_bytes)
<-- rc: SQL_SUCCESS | 
nanodbc/nanodbc.cpp:352 SQLGetDiagRec(handle_type, handle, (SQLSMALLINT)i, sql_state, &native_error, sql_message.data(), (SQLSMALLINT)sql_message.size(), &total_bytes)
<-- rc: SQL_SUCCESS | 
Error in result_fetch(res@ptr, n, ...) : 
  nanodbc/nanodbc.cpp:2837: 07009: [Microsoft][ODBC Driver 13 for SQL Server]Invalid Descriptor Index 

I suspect that we need to be including SQLFetchScroll in the "blob" (large-column) (based on the MS link above). Thoughts? Experience with this?

r2evans commented 6 years ago

Interesting ...

For the record, I get this error with drivers "ODBC Driver 13 for SQL Server" and "SQL Server Native Client 11.0", but not with "SQL Server".

                            name APILevel ConnectFunctions CPTimeout DriverODBCVer FileUsage SQLLevel UsageCount
1                     SQL Server        2              YYY        60         03.50         0        1          1
20 ODBC Driver 13 for SQL Server        2              YYY        60         03.80         0        1          1
28 SQL Server Native Client 11.0        2              YYY        60         03.80         0        1          1

So the driver that I'm inferring is older (because of the older DriverODBCVer) works just fine, but the newer ones (installed with SSMS and updated recently by Microsoft) are broken.

retorquere commented 4 years ago

Where do I get the older SQL Server driver? I'm trying to convince my SQL-only team mates to join the RMarkdown revolution, but this is not helping.

r2evans commented 4 years ago

@retorquere, you might consider using the FreeTDS driver. I've successfully tested it on Windows and linux. The pros/cons so far:

Pros:

Cons:

Update (Sep 2020): since I wrote this, I've tried FreeTDS on-and-off. Recently, using emacs-27.1, R-4.0.2, in Emacs/ESS on win10, it (somehow) eventually freezes emacs after several minutes, and it is affecting the whole system. If I had to guess, it's a memory leak that bogs everything else to a crawl. When I use Microsoft's ODBC driver, I get no instability.

retorquere commented 4 years ago

@r2evans thanks. The pool package is not much of an issue for us, and since we're using it for fairly simple RMarkdown documents the performance hit is likely tolerable, but most of my colleagues are on Windows, and we're connecting (a.o.) to Azure databases; I can't imagine Azure would allow non-secured connections.

retorquere commented 4 years ago

(but to test: how would I set up a FreeTDS connection? I'm on macOS myself)

r2evans commented 4 years ago

I don't know how to manage odbc drivers on macos, sorry. I was only able to find FreeTDS' appveyor jobs, which is how I found the windows binaries for installation, I wonder if there's an analog for macos. Once you have that, though, you use it just like you would another driver, using the driver="FreeTDS" instead of driver="ODBC Driver 13 for SQL Server" (or whichever version you're using).

Frankly, if pool weren't a problem, I would be completely switched over. Since much of my work suffers hard when a connection hiccups (e.g., RStudio Connect server), I'd rather not implement my own re-connect logic ... so I'm currently stuck with MS's odbc driver. (Which means that I'm already working around the "long data fields" fiasco ...)

r2evans commented 4 years ago

@retorquere, I would be very interested if somebody figures out how to get the freetds.dll registered driver in windows to recognize SSL libraries for the encrypted connection. Yes, I'd imagine that azure databases do (or at last should) require encrypted connections. In a fix, though, do you have the ability to ssh-tunnel in and connect to it as locally as it'll permit? That mitigates some of the encryption concern. (I don't know if azure instances allow ssh-tunneling.)

retorquere commented 4 years ago

I haven't seen anything that would indicate we can do this.

nickv9 commented 4 years ago

Since much of my work suffers hard when a connection hiccups (e.g., RStudio Connect server), I'd rather not implement my own re-connect logic ... so I'm currently stuck with MS's odbc driver. (Which means that I'm already working around the "long data fields" fiasco ...)

@r2evans I thought this problem does NOT occur when using RStudio's commercial drivers?See Jim Hester's comment

Specifically, this statement:

The only driver for SQL Server I know of which does is the ones supplied by Simba which are bundled with the Pro versions of RStudio products.

r2evans commented 4 years ago

First, the comment that suggests FreeTDS is going to have problems with "long data" because it doesn't implement his SQLGetData logic does not pan out: using FreeTDS, I can query the columns in any order, in ways that will cause the MS driver to fail. I don't know that FreeTDS implements both of those functions (that Jim references), but something it is doing is less-wrong than MS's driver. (There are other problems with FreeTDS, though ... namely the pool problem I mentioned above ...)

As to using RStudio pro drivers ... I had been working for way too long before those drivers were available to me (before I purchased RSC), that switching was effectively not the immediate priority. And the fact that those drivers are not for Windows means that I'm out of luck: RStudio pro drivers do not support Windows. Joy.

nickv9 commented 4 years ago

To give you some context behind my questions, I am writing about the problem and solutions in this WIP article: Debugging the Invalid Descriptor Index Error.

In regards to this statement...

RStudio pro drivers do not support Windows. Joy.

The Requirements for RStudio Professional Drivers page does say they support Windows. If this isn't a viable solution for Windows users, I'd like to update my post. I'm curious, have you attempted to use the SQL driver and found it didn't work on Windows?

r2evans commented 4 years ago

I thought it used to say "windows not supported", but effectively the same for me:

The drivers are installed using RStudio Desktop Pro

I have RSC, not RStudio Desktop Pro. I tried downloading the (1.2GB!) tarfile anyway, but no DLLs.