zozlak / RODBCext

14 stars 10 forks source link

Error when using vectorized form of sqlExecute(). #1

Closed nomoneyx closed 9 years ago

nomoneyx commented 9 years ago

I am trying to use the vectorized form of sqlExecute(). I am using Rstudio version 0.99.441, and the latest version of R 3.2.1 along with SQL server 2008.

#reads in csv file 
write_file <- fread(file_paths, stringsAsFactors = FALSE, integer64 = 'numeric')  

# SQL query
sp_query <- "EXEC [dbo].[usp_StudentMerge]   @ImportID = ?,
                                             @stateId = ?, 
                                             @studentNumber = ?,
                                             @lastName = ?,
                                             @firstName = ?,
                                             @middleName = ?,
                                             @suffix = ?,
                                             @alias = ?,
                                             @gender = ?,
                                             @birthdate = ?,
                                             @raceEthnicity = ?,
                                             @hispanicEthnicity = ?,
                                             @ImportName = ?"

#removes space from query string
sp_query <- gsub("[[:space:]]+", " ", sp_query)

#opens connection to data base
con2 <- odbcConnect("database", uid="user", pwd="password")

#vectorized form
sqlExecute(con2, sp_query, write_file)

#close connection to data base
odbcClose(con2)

I am getting this error.

Error in sqlExecute(con2, sp_query, write_file) : 
    24000 0 [Microsoft][ODBC SQL Server Driver]Invalid cursor state
[RODBCext] Error: SQLExecute failed

If I run sqlExecute() inside a loop it works as expected.

#reads in csv file 
write_file <- fread(file_paths, stringsAsFactors = FALSE, integer64 = 'numeric')  

sp_query <- "EXEC [dbo].[usp_StudentMerge]   @ImportID = ?,
                                             @stateId = ?, 
                                             @studentNumber = ?,
                                             @lastName = ?,
                                             @firstName = ?,
                                             @middleName = ?,
                                             @suffix = ?,
                                             @alias = ?,
                                             @gender = ?,
                                             @birthdate = ?,
                                             @raceEthnicity = ?,
                                             @hispanicEthnicity = ?,
                                             @ImportName = ?"

#removes space from query string
sp_query <- gsub("[[:space:]]+", " ", sp_query)

#opens connection to data base
con2 <- odbcConnect("database", uid="user", pwd="password")

#loops through all rows of the data frame and runs the stored procedure
for(i in 1:nrow(write_file)) {

    sqlExecute(con2, sp_query, write_file[i, ])
} 

odbcClose(con2)

I am not sure what is causing the error.

nomoneyx commented 9 years ago

I used debug to find where the error is coming from.

if (fetch == FALSE | nrow(data) < 1) {
        stat <- .Call("RODBCExecute", attr(channel, "handle_ptr"), 
            data, as.integer(rows_at_time))
        if (stat == -1L) {
            if (errors) {
                stop(paste0(RODBC::odbcGetErrMsg(channel), collapse = "\n"))
            }
            else {
                return(stat)
            }
        }

.Call sets stat = -1 and this produces the error. I tried to use debug(.Call) but this is not working.

zozlak commented 9 years ago

At the beginning a few questions:


And as you tried to do debugging, a short description of internals.

The query execution process is like this:

  1. prepare the query with sqlPrepare() (the result is called a prepared statement)
  2. if(fetch == FALSE | nrow(data) < 1), then run RODBCExecute() with all data at once otherwise run RODBCExecute() for each row of data where RODBCExecute() is a C routine which: a) binds data columns to prepared statement parameters b) executes the prepared statement for each row of data

As you do not provide a "fetch" parameter value for sqlExecute() and its default value is FALSE, then the only difference between sqlExecute(con2, sp_query, write_file) and for(i in 1:nrow(write_file)) { sqlExecute(con2, sp_query, write_file[i, ]) } is the number of loops in https://github.com/zozlak/RODBCext/blob/master/src/RODBCext.c#L313 and we know that running this loop only once does not rise an error (as sqlExecute(con2, sp_query, write_file[i, ]) works as expected).

According to this the most probable reason seems to be that an execution of the Ms SQL Server's EXEC query somehow invalidates a prepared statement and it can not be executed once more.

nomoneyx commented 9 years ago

When I use odbcGetErrMsg(), after I get the error from sqlExecute(), I get the following result.

    Error in inherits(channel, "RODBC") : 
         argument "channel" is missing, with no default

I have not tried a INSERT or UPDATE query.

Below is an example that reproduces the error on my system. Also below is the data, in csv format, I used to recreate the error.

test <- fread('test.csv', stringsAsFactors = FALSE)

sp_query <- "EXEC [dbo].[table]  @ImportID = ?,
                                 @Number = ?, 
                                 @Name = ?, 
                                 @Type = ?, 
                                 @Comments = ?, 
                                 @DualEnrollment =?, 
                                 @CEP = ?, 
                                 @DistrictGuidId = ?,
                                 @ImportName = ?"

sp_query <- gsub("[[:space:]]+", " ", sp_query)

con <- odbcConnect("RDeepWell", uid="bwheeler", pwd="deepwellis#1")

sqlExecute(con, sp_query, test)

odbcGetErrMsg()

odbcClose(con2)

ImportId,number,name,type,comments,dualEnrollment,CEP,DistrictGuidID,ImportName 37,1,character,char,comments text,0,0,A0143469-8714-S741-80E9-7VH7530TG143,ImportName Text 34,2,character,char,comments text,0,0,A0143469-8714-S741-80E9-7VH7530TG143,ImportName Text 1,3,character,char,comments text,0,0,A0143469-8714-S741-80E9-7VH7530TG143,ImportName Text 36,4,character,char,comments text,0,0,A0143469-8714-S741-80E9-7VH7530TG143,ImportName Text

zozlak commented 9 years ago

The problem is that an error is not connected with your data (we know it, because running sqlExecute() in loop works as expected) nor your R code but with the behaviour of:

So I cannot replicate your error without access to a Ms SQL Server database, hopefully with the same stored procedure you run (or another stored procedure which generates the same error). The next problem is that Ms SQL Server runs only on Windows, but my laptop as well as all servers I am administrating run Linux, so I can not reproduce your error by myself.

Because of that I will need your help with debugging either by running some tests for me on your database or by providing me an access to a Ms SQL Server database (might be a completely new, empty database).

What do you prefer?

nomoneyx commented 9 years ago

I can try to run some tests. Where should I start?

zozlak commented 9 years ago

Great! Thank you for your help.

At first I will ask you to try out a vectorized form of an ordinary UPDATE or INSERT query. The exact query does not matter (just make sure you have access rights to the tables you will by trying to modify - the only reason I am reminding you of that is that stored procedures are widely used as a method of privilege separation and this could mean you have no access rights to the tables themselves).

If such queries will run without problems, then reason of failure would be that an EXEC query (in general or on your stored procedure) invalidates a query plan and can not be run once more without planning it again (technically speaking without a separate call to sqlPrepare() ODBC call before each sqlExecute() ODBC call). If this is a case, then EXEC queries can not be vectorized and we can do nothing about that but run sqlExecute() in a loop.

If such queries will not run, the Ms SQL Server ODBC driver behaves differently then MySQL and Postgresql ODBC drivers which I should investigate in detail and update the RODBCext package.

nomoneyx commented 9 years ago

I should be thanking you for creating this useful package.

I tried to run UPDATE and INSERT queries. I tried both queries in the loop form and the vectorized form, but received the following error for each query.

Error in sqlExecute(con2, sp_query, write_file) : 
  22018 0 [Microsoft][ODBC SQL Server Driver]Invalid character value for cast specification
[RODBCext] Error: SQLExecute failed

odbcGetErrMsg()

Error in inherits(channel, "RODBC") : 
  argument "channel" is missing, with no default

I had access rights for the tables in all of these queries.

zozlak commented 9 years ago

It looks strange and rather suggests some error in your UPDATE / INSERT queries.

I am obtaining a virtual machine with Windows, so in a short time I should be able to debug this problem by myself. It could take me some time, as in a few hours time I am going on the useR! conference and will be quite busy there but I should provide you a feedback before the end of week.

zozlak commented 9 years ago

The results of debugging are as follows:

Lets look at the sample code:

library(RODBCext)
P = odbcConnect(PutYourConncetionDataHere)

# create a sample test table
sqlExecute(P, "CREATE TABLE mytest (id int, t varchar(255), d date)")
data = data.frame(
  1:5,
  letters[1:5],
  c(rep('2015-04-01', 4), NA)
)

# insert some data (in a vectorized form)
sqlExecute(P, "INSERT INTO mytest VALUES (?, ?, ?)", data)

# check if data are really there
sqlExecute(P, "SELECT * FROM mytest", NULL, TRUE)

# prepare very simple stored procedure
procCode = "
  CREATE PROCEDURE myprocedure (@id int, @t varchar(255), @d date) AS
  BEGIN
    INSERT INTO mytest VALUES (@id, @t, @d)
    RETURN(0)
  END
"
sqlExecute(P, procCode)

# try to insert data in a vectorized form using stored procedure
sqlExecute(P, "EXEC myprocedure ?, ?, ?", data)

# Check what we have got now
# As you can see a first record of data was inserted (it is duplicated now)
# and the vectorization failed on the second one
# This is an evidence that the problem is that the planned query was invalided 
# after the first execution
# The finding is that EXEC queries can not be "vectorized" by the Ms SQL Server
sqlExecute(P, "SELECT * FROM mytest", NULL, TRUE)

What could be done in the future is to remember the planned query string and force sqlExecute() to check a prepared query state before each execution of the query. If sqlExecute() finds out the a prepared query is invalidated, then it could try to prepare it again using the original query string. Unfortunately this will require quite a lot modifications of the RODBCext package so I can not promise any release date...

zozlak commented 9 years ago

I introduced a simple workaround. It is not very elegant, but should make your code a little cleaner. I have added a force_loop parameter to the sqlExecute() which forces it to plan the query again for each row of the data.

This should work for you (in the RODBCext 0.2.5 which is on its way to CRAN):

sp_query <- "EXEC [dbo].[usp_StudentMerge]   @ImportID = ?,
                                             @stateId = ?, 
                                             @studentNumber = ?,
                                             @lastName = ?,
                                             @firstName = ?,
                                             @middleName = ?,
                                             @suffix = ?,
                                             @alias = ?,
                                             @gender = ?,
                                             @birthdate = ?,
                                             @raceEthnicity = ?,
                                             @hispanicEthnicity = ?,
                                             @ImportName = ?"

sqlExecute(con2, sp_query, write_file, force_loop = TRUE)
nomoneyx commented 9 years ago

Thanks for all your help.