zozlak / RODBCext

14 stars 10 forks source link

SQL Server reports column size of 0 for nvarchar #4

Closed r2evans closed 8 years ago

r2evans commented 8 years ago

I can't verify this bug with other databases, but I'm dealing with SQL Server 2014 via R-3.2.5 and RODBC-0.2.5 (haven't updated this one machine since you fixed my last issue, thanks!).

When a schema is created using nvarchar, I get this (truncated/simplified):

> sqlColumns(adsSQLGetConnection(), "SegmentGroups")[,4:7]                                             |
    COLUMN_NAME DATA_TYPE        TYPE_NAME COLUMN_SIZE
 1           Id       -11 uniqueidentifier          36
 2      TrackId       -11 uniqueidentifier          36
 3         Name        -9         nvarchar           0
 4        Owner        -9         nvarchar           0
 5 CalcPriority         4              int          10
 6        Notes        -9         nvarchar           0

In CopyParameters, you have this code:

        if(strlen(cData) > column->ColSize){
          warning(_("character data '%s' truncated to %d bytes in parameter %d"),
            cData, column->ColSize, col + 1);

I believe that's the culprit. The end result is that my use of sqlExecute is coming up with lots of "truncated" warnings, I'm inferring because of the column size of 0.

Is there a different/better way to deal with this? If no easy bug fix is feasible/likely, can you suggest a workaround?

zozlak commented 8 years ago

Unfortunately it is a complicated issue:

Summing up - we know when the problem comes from and in general we know how it can be solved but I do not know when I will find time to do that :(

r2evans commented 8 years ago

So this is only a problem for INSERTs and UPDATEs but not for SELECTs, right?

It's not a show-stopper for me for the moment, though I'd really like to be able to do this in the future. However, now I think I understand from where the problem originates. I'm using the ODBC Driver 11 for SQL Server, so I'll look at the docs to see if it can map column types for writes.

Gotta love "standards", thanks for the fast reply.

r2evans commented 8 years ago

Didn't mean to close it, if you want to keep it open feel free.

zozlak commented 8 years ago

The problem affects all types of queries as buffers passing data between ODBC driver and R must be allocated both when passing data from R to ODBC (INSERT / UPDATE) and from ODBC to R (SELECT).

The only difference is that the for SELECT queries RODBC package functions are doing that (RODBCext is only an extension to RODBC meaning if a given feature, here fetching data, was already implemented in RODBC, then RODBC function is called). Looking into RODBC source, data are processed as follows:

To be honest the problem is not in the ODBC standard but in my (and RODC developer) laziness. The ODBC standards defines the SQL_LONGVARCHAR data type but we were to lazy to correctly implement its support as it requires more work then dealing with other data types and it was not needed for databases we worked with.

r2evans commented 8 years ago

Thanks, that sheds some light on it. I don't belittle the last point: you work on "that which is important to you" (or was when you first wrote it). This is closely tied to "that which affects systems to which I have easy access", such as when some servers' ODBC clients behave differently than others' clients.

I wish I could be of some help, thanks for re-opening it and staying (at least somewhat) interested.

zozlak commented 8 years ago

I will deliver you a patched version of the package for testing (as I am using linux setting up Ms SQL Server instance sounds like quite a lot of trouble to me :) ).

What is your system platform? (Windows 64-bit?)

r2evans commented 8 years ago

I've only been accessing the MS SQL Server from a linux box (ubuntu-14.04.3, R-3.2.5); though I'm sure I can set up a ssh tunnel to get to the server, I haven't tested it yet. But I have some incentive, so I can test that side of things for you as well.

zozlak commented 8 years ago

If you are using linux, it will be even easier:

Try INSERT or UPDATE query on a nvarchar column.

r2evans commented 8 years ago

I am using ODBC Driver 11 for SQL Server, so my current list of ODBC-related packages are:

ii  libodbc1:amd64                       2.2.14p2-5ubuntu5                   amd64        ODBC library for Unix
ii  odbcinst                             2.2.14p2-5ubuntu5                   amd64        Helper program for accessing odbc ini files
ii  odbcinst1debian2:amd64               2.2.14p2-5ubuntu5                   amd64        Support library for accessing odbc ini files
ii  tdsodbc:amd64                        0.91-5                              amd64        ODBC driver for connecting to MS SQL and Sybase SQL servers

I don't recall why, but I don't think unixodbc was working, ergo the switch. It could very well have been an older version, which means I'm kinda stuck until I can upgrade the VM it's working within (which ain't gonna happen for a bit, bummer to say).

zozlak commented 8 years ago

Here we are talking about package needed to compile the RODBCext package (which is something different then packages needed to run ODBC drivers) and as far as I know the ODBC C headers we need are provided by unixodbc-dev. So no matter which packages your ODBC driver depends on you will need unixodbc-dev to compile the RODBCext.

r2evans commented 8 years ago

Gotcha, installed and your 0.2.7 compiled without a problem.

Running a simple query (select count(*) from tablename) is failing, though. On debugging, though, I start in sqlExecute, and the channel looks valid and odbcValidChannel(channel) is true but the query fails:

 debug: stopifnot(odbcValidChannel(channel), is.vector(fetch), is.logical(fetch),
     length(fetch) == 1, all(!is.na(fetch)), is.vector(errors),
     is.logical(errors), length(errors) == 1, all(!is.na(errors)),
     is.vector(rows_at_time), is.numeric(rows_at_time), length(rows_at_time) ==
         1, all(!is.na(rows_at_time)), is.vector(force_loop),
     is.logical(force_loop), length(force_loop) == 1, all(!is.na(force_loop)))
 Browse[2]> odbcValidChannel(channel)
 [1] TRUE
 Browse[2]> channel
 RODBC Connection 3
 Details:
   case=nochange
   DRIVER=ODBC Driver 11 for SQL Server
   SERVER=11.22.33.44
   UID=myuser
   PWD=******
   WSID=myhost
   DATABASE=dbhostname
# ...snip...
 Browse[2]>
 debug: stat <- sqlPrepare(channel, query, errors)
 Browse[2]>
 debugging in: sqlExecute(conn, query, dots, fetch = TRUE, stringsAsFactors = FALSE)
 debug: {
     stopifnot(odbcValidChannel(channel), is.vector(fetch), is.logical(fetch),
         length(fetch) == 1, all(!is.na(fetch)), is.vector(errors),
         is.logical(errors), length(errors) == 1, all(!is.na(errors)),
         is.vector(rows_at_time), is.numeric(rows_at_time), length(rows_at_time) ==
             1, all(!is.na(rows_at_time)), is.vector(force_loop),
         is.logical(force_loop), length(force_loop) == 1, all(!is.na(force_loop)))
     if (!odbcValidChannel(channel)) {
         stop("first argument is not an open RODBC channel")
# ...snip...
 Browse[2]>
 debug: stopifnot(odbcValidChannel(channel), is.vector(fetch), is.logical(fetch),
     length(fetch) == 1, all(!is.na(fetch)), is.vector(errors),
     is.logical(errors), length(errors) == 1, all(!is.na(errors)),
     is.vector(rows_at_time), is.numeric(rows_at_time), length(rows_at_time) ==
         1, all(!is.na(rows_at_time)), is.vector(force_loop),
     is.logical(force_loop), length(force_loop) == 1, all(!is.na(force_loop)))
 Browse[2]> channel
 RODBC Connection 3
 Details:
   case=nochange
   DRIVER=ODBC Driver 11 for SQL Server
   SERVER=11.22.33.44
   UID=myuser
   PWD=******
   WSID=myhost
   DATABASE=dbhostname
 Browse[2]> odbcValidChannel(channel)
 [1] FALSE

The call fails due to this last condition.

After I exit the debugger, the connection is valid again (no changes on my part).

zozlak commented 8 years ago

And what is exactly the error message?

As the only change was made in the code passing data to the server it is strange that query without any parameters passed to the server failed.

r2evans commented 8 years ago
> channel
 using option
 using: Driver=ODBC Driver 11 for SQL Server;Server=172.16.204.11;Database=Staging.Ford.RaceRS;Uid=bevans;Pwd=@ctiveDecisionSupport2015
 Auto-connect to remote database:
 RODBC Connection 1
 Details:
   case=nochange
   DRIVER=ODBC Driver 11 for SQL Server
   SERVER=11.22.33.44
   UID=myuser
   PWD=******
   WSID=myhost
   DATABASE=dbhostname
 > packageVersion("RODBCext") # unixodbc, unixodbc-dev install
 [1] ‘0.2.6’
> sqlExecute(channel, "select count(*) from sessions", fetch = TRUE)

 1 688
 > q("no")

 Process R finished at Fri Jul 15 11:49:33 2016
 Warning: closing unused RODBC handle 1
 ^L

 R version 3.2.5 (2016-04-14) -- "Very, Very Secure Dishes"
### ...snip...
 > channel
 using option
 using: Driver=ODBC Driver 11 for SQL Server;Server=172.16.204.11;Database=Staging.Ford.RaceRS;Uid=bevans;Pwd=@ctiveDecisionSupport2015
 Auto-connect to remote database:
 RODBC Connection 1
 Details:
   case=nochange
   DRIVER=ODBC Driver 11 for SQL Server
   SERVER=11.22.33.44
   UID=myuser
   PWD=******
   WSID=myhost
   DATABASE=dbhostname
 > packageVersion("RODBCext") # unixodbc, unixodbc-dev install
 [1] ‘0.2.7’
> sqlExecute(channel, "select count(*) from sessions", fetch = TRUE)
 Error in sqlPrepare(channel, query, errors) :
   [RODBCext] Error: SQLAllocStmt failed

(In my package, I use a wrapper function that calls sqlExecute, ergo the previous error messages, sorry.)

If you changed nothing that would impact the channel appearing good for sqlPrepare but not afterward, then perhaps it has something to do with the compilation. It did compile cleanly (sorry for the verbosity, provided solely for assurance):

 > devtools::install_github('zozlak/RODBCext')
 Downloading GitHub repo zozlak/RODBCext@master
 from URL https://api.github.com/repos/zozlak/RODBCext/zipball/master
 Installing RODBCext
 '/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet  \
   CMD INSTALL '/tmp/RtmpDvKFVV/devtools85c92e370f1/zozlak-RODBCext-ce79d1c'  \
   --library='/home/r2/R/x86_64-pc-linux-gnu-library/3.2' --install-tests

 * installing *source* package ‘RODBCext’ ...
 checking for gcc... gcc -std=gnu99
 checking for C compiler default output file name... a.out
 checking whether the C compiler works... yes
 checking whether we are cross compiling... no
 checking for suffix of executables...
 checking for suffix of object files... o
 checking whether we are using the GNU C compiler... yes
 checking whether gcc -std=gnu99 accepts -g... yes
 checking for gcc -std=gnu99 option to accept ANSI C... none needed
 checking how to run the C preprocessor... gcc -std=gnu99 -E
 checking for egrep... grep -E
 checking for ANSI C header files... yes
 checking for sys/types.h... yes
 checking for sys/stat.h... yes
 checking for stdlib.h... yes
 checking for string.h... yes
 checking for memory.h... yes
 checking for strings.h... yes
 checking for inttypes.h... yes
 checking for stdint.h... yes
 checking for unistd.h... yes
 checking sql.h usability... yes
 checking sql.h presence... yes
 checking for sql.h... yes
 checking sqlext.h usability... yes
 checking sqlext.h presence... yes
 checking for sqlext.h... yes
 checking for library containing SQLTables... -lodbc
 checking for SQLLEN... yes
 checking for SQLULEN... yes
 checking for long... yes
 checking size of long... 8
 configure: creating ./config.status
 config.status: creating src/Makevars
 config.status: creating src/config.h
 ** libs
 gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I.     -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -c RODBC.c -o RODBC.o
 gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I.     -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -c RODBCext.c -o RODBCext.o
 gcc -std=gnu99 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o RODBCext.so RODBC.o RODBCext.o -lodbc -L/usr/lib/R/lib -lR
 installing to /home/r2/R/x86_64-pc-linux-gnu-library/3.2/RODBCext/libs
 ** R
 ** preparing package for lazy loading
 ** help
 *** installing help indices
 ** building package indices
 ** installing vignettes
 ** testing if installed package can be loaded
 * DONE (RODBCext)

Any thoughts on how to troubleshoot this?

zozlak commented 8 years ago

This is strange as this error means that query parsing failed, which is done before parameters binding (and of course also before copying data values) and was not affected by the change in code.

I will try to set up a Ms SQL Server instance on some Windows computer and experiment with it. I hope I will find time for that during the weekend. (as I guess there are no chances I can access the instance of Ms SQL Server you are using, are they?)

r2evans commented 8 years ago

Unfortunately no.

It most likely has something to do this:

## from RODBCext-0.2.6
$ ldd /usr/local/lib/R/site-library/RODBCext/libs/RODBCext.so | grep libodbc
        libodbc.so.2 => /usr/lib/x86_64-linux-gnu/libodbc.so.2 (0x00007efd521c0000)
## from RODBCext-0.2.7
$ ldd R/tmp/RODBCext/libs/RODBCext.so | grep libodbc
        libodbc.so.1 => /usr/lib/x86_64-linux-gnu/libodbc.so.1 (0x00007f1365dd1000)

I'm looking through notes I kept when I built the VM (so glad I kept them!), and I see that I needed a newer version of unixODBC than ubuntu-14.04.3 provides, so I manually installed it. I used unixODBC-2.3.something, whereas ubuntu-14.04 is giving me 2.2.14.

What I find curious now is that I was able to install RODBCext against libodbc.so.2 the first time but it wouldn't work recently (perhaps my install on linux used a binary package?).

zozlak commented 8 years ago

It took some time but I have finally set up a Ms SQL Server instance and was able to test my code against it.

I believe the last commit on GitHub solves the problem for ingesting data into the database.

Unfortunately the problem still exists for fetching data from long varchar columns as this is implemented in the RODBC package (for fetching data RODBCext::sqlExecute() simply calls RODBC::sqlGetResults()) and this part has to be solved by the RODBC package maintainer.

Please try to compile the package, execute some INSERT and/or UPDATE queries and check using some other tool if data were inserted properly. Please do not try SELECT queries as they will simply cause R to explode (see the previous paragraph).

r2evans commented 8 years ago

I'll try to compile it, but I have yet to resolve the libodbc.so.2 problem (due to ubuntu-14.04 and unixODBC-2.2.14). Thanks for the fix!

zozlak commented 8 years ago

You can try even with libodbc.so.1. When I am thinking about the problem with the SELECT query you have reported I am pretty sure it was caused by the wrong implementation of the data fetch in the RODBC package (as for fetching data from the db RODBCext is completely relying on RODBC) and this problem should not have any impact on running INSERT/UPDATE queries.

zozlak commented 8 years ago

With the current commit I get no segfaults for all types of queries.

Fetching Ms SQL Server varchar(max) columns have the same limits as in RODBC (they will be truncated to 255 characters) but at least it seems to work stable in my testing environment.

Passing data to the server should be limited only by the available memory size (I tested it successfully on a string containing 10^7 characters).

r2evans commented 8 years ago

I'm having a problem with the compilation, not due to this patch (I think). It depends on libiodbc, which I cannot install alongside tdsodbc. The only way I got it to compile was to remove tdsodbc, install libiodbc2 (and dev), and build it then (and it built without error). Unfortunately, the iodbc framework isn't recognizing my SQL Server setup, so it does not connect to the database. (I'm beginning to wonder how I installed it the first time.)

zozlak commented 8 years ago

According to test performed by three persons (#6) RODBCext 0.2.7 (on its way to CRAN):