r-dbi / RMySQL

Legacy DBI interface for MySQL
http://cran.r-project.org/package=RMySQL
208 stars 111 forks source link

Execution halted with large queries #40

Open noktilux opened 9 years ago

noktilux commented 9 years ago

Queries that run without any issues in version "0.9-3' now do not work in "0.10.1".

In one of my tests, I removed columns one by one and things worked properly when I got down to 203 columns. I thought perhaps there was a problem with the last one I removed, but that is not the case. If I remove another one instead, the same issue occurs. This is the message:

Error in validObject(.Object) : 
  invalid class “MySQLResult” object: invalid object for slot "Id" in class "MySQLResult": got class "list", should be or extend class "integer"
Calls: dbSendQuery ... .local -> new -> initialize -> initialize -> validObject
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

In another situation (involving a JOIN of 2 tables), if I list out all the columns (270 in total), the query works, but if I write it as SELECT table1., table2. FROM ... , then it fails. The message is:

Error in is(object, Cl) : 
  INTEGER() can only be applied to a 'integer', not a 'char'
Calls: fetch -> .valueClassTest -> is -> is -> .Call
Execution halted

To repeat, the queries are all valid and work perfectly in MySQL (issuing direct commands) and in the older version of this library (RMySQL 0.9-3) which I have installed on another machine.

I have placed a test schema for you here: https://heartsys.org/share/foo.tar.gz

Simply run "SELECT * FROM foo.tbl" to reproduce issue #1.

P.S. This is the version of R I am running: R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet" Copyright (C) 2014 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit)

hadley commented 9 years ago

Could you please try with the dev version?

noktilux commented 9 years ago

hadley. can you please provide me with a link to the dev version? i am only seeing for windows binaries (here : http://cran.r-project.org/web/packages/RMySQL/index.html) and i run GNU-linux.

hadley commented 9 years ago

https://github.com/rstats-db/RMySQL ...

noktilux commented 9 years ago

thanks. i will give it a go in the next couple of days.

MJFitzpatrick86 commented 9 years ago

Was a solution to this every figured out? (I am trying to solve a similar problem with the same error message.)

krlmlr commented 9 years ago

@noktilux: Any updates here?

@MJFitzpatrick86: Could you please post a reproducible example?

realgithup commented 9 years ago

I am facing the same problem. Trying to fetch data for thousand companies in a loop. After 30-40 companies it generates the above error. Even dev version has this issue. I get the following error

Error in validObject(.Object) : invalid class "MySQLResult" object: invalid object for slot "Id" in class "MySQLResult": got class "character", should be or extend class "integer"

I had to use version 0.9.3. But that version sometimes generates "caught segfault" error.

hadley commented 9 years ago

@realgithup we can't help with a reproducible example

makoshark commented 8 years ago

This is a real bug although it does seem to be one of those nasty "sometimes" errors. I got this error running an R script and then re-ran the same script on the same machine, absolutely no changes, no restarting MySQL, and it worked just fine.

cturbelin commented 8 years ago

Hi, This is a very strange situation. I have exactly the same error and it's very hard to reproduce. If I run the exact same query outside my script, it runs without any error and I can't get why. So I'm not able to create a test case.

I dont know if this can help : if I add the line "dbColumnInfo(con, table) just before the call to the dbGetQuery call, the query is executed without any error (with table=the table used in dbGetQuery).

SessionInfo(): R version 3.2.1 (2015-06-18) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.3 (Santiago)

locale: [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] methods stats graphics grDevices utils datasets base

other attached packages: [1] RColorBrewer_1.1-2 ggplot2_2.0.0 RMySQL_0.10.8 DBI_0.3.1 [5] Cairo_1.5-9

Best.

krlmlr commented 8 years ago

@cturbelin: Thanks. The R code and a dump of the database schema could help narrow down this issue.

vst commented 8 years ago

@cturbelin: Confirmed that your dbColumnInfo hack worked in my case, too.

I have been seeing this error on Mac OSX once in a while when I was running repetitive big queries.

R> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] devtools_1.10.0

loaded via a namespace (and not attached):
[1] tools_3.2.3   memoise_1.0.0 digest_0.6.9

But, when I moved to a Docker container running Debian 8.3, one of those repetitive big queries never worked. Same database server...

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

When I called the dbColumnInfo once before the mentioned series of queries (I mean only once), it managed to get through without any problems.

PS: The R routine is basically iterating over a stock symbols vector of length ~8000, and issuing an SQL query to retrieve their OHLC series from the database. Simple query, simple DB structure.

vst commented 8 years ago

OK, I have experienced another problem which looks like related to this.

As mentioned in the my previous comment, we have been calling the DBI::dbColumnInfo before attempting to read from the database table. Snippet from the R routine:

[...]

## Blind hack:
print(DBI::dbColumnInfo(xdbi::getConnection(), "table1"))
print(DBI::dbColumnInfo(xdbi::getConnection(), "table2"))

## Read from table1 and table2 separately:
[...]

Yesterday, our system has stopped in the middle of the day with this error message, right after printing the output of the first DBI::dbColumnInfo output:

Error in .local(conn, statement, ...) :
  connection with pending rows, close resultSet before continuing
Calls: print ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call
Execution halted

(Why does the call trace state dbSendQuery twice, right?)

Now... I know that we are running state machines and such errors must be reproducable. However, same Docker image running on local machine worked fine with the same database dump.

Going back to the server experiencing the problem, I've added a print statement for debugging purposes and it worked everytime when the script is run:

[...]

## Blind hack:
print("What is going on?")
print(DBI::dbColumnInfo(xdbi::getConnection(), "table1"))
print(DBI::dbColumnInfo(xdbi::getConnection(), "table2"))

## Read from table1 and table2 separately:
[...]

(Yep, I am surprised, too.)

I have quickly checked the C code for a possible problem in result set management, but couldn't find any. It looks short, precise and perfectly fine.

Then guess what: I have removed the blind hack entirely (including the print statement), and it never gave me the error message which it used to produce 2 weeks ago.

I am entirely clueless and don't want to waste package contributors' time... Could you address some functions in R and/or C/++ code which I can play with and see if I can produce some meaningful debugging information?

MrDAndersen commented 8 years ago

I am running into a similar issue. When running a sequence of queries using DBI::dbGetQuery, I intermittenly get the Error in validObject(.Object) : invalid class “MySQLResult” object: invalid object for slot "Id" in class "MySQLResult": got class "list", should be or extend class "integer" error. It seems most likely it is the second of the queries that is failing, but not all the time.

eddy85br commented 8 years ago

Impressively, here the same "blind hack" worked.

I'm doing a big query with 5 left joins. It only worked by doing dbColumnInfo(con, "table_name") in all 6 tables used in the large select query, like this:

[ ... ]
    exec = dbExecute(con, paste('use sample_', sample_number, ";", sep=""))
    print(dbColumnInfo(con, "table1"))
    dbColumnInfo(con, "table2")
    dbColumnInfo(con, "table3")
    dbColumnInfo(con, "table4")
    dbColumnInfo(con, "table5")
    dbColumnInfo(con, "table6")
    if (exec == 0) {
        if (sample_number == "00") {
            df_samples = dbGetQuery(con, large_query_with_joins)
            [ ... ]
        }
    }
[ ... ]
cocinerox commented 8 years ago

Similar issue. Same workaround solved it.

renkun-ken commented 7 years ago

Executing via Rscript, I tried dbColumnInfo but this ends up with the following error:

Loading required package: methods
Error in .local(dbObj, ...) : 
  internal error in RS_DBI_getConnection: corrupt connection handle
Calls: dbColumnInfo ... dbClearResult -> dbIsValid -> dbIsValid -> .local -> .Call
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

In R terminal, the error is

Error in validObject(.Object) : 
  invalid class “MySQLResult” object: invalid object for slot "Id" in class "MySQLResult": got class "character", should be or extend class "integer"

or

Loading required package: methods
Error in dbFetch(rs, n = -1, ...) : 
  INTEGER() can only be applied to a 'integer', not a 'char'
Calls: dbGetQuery -> dbGetQuery -> dbFetch -> dbFetch -> .Call
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Error in .local(dbObj, ...) : 
  INTEGER() can only be applied to a 'integer', not a 'char'
Calls: dbGetQuery ... dbClearResult -> dbIsValid -> dbIsValid -> .local -> .Call
Execution halted
renkun-ken commented 7 years ago

Through some experiments, I find that when select * or select A, B, C, ... where * or the specified columns represent more than 144 DECIMAL columns, the error will occur.

renkun-ken commented 7 years ago

When I adjust MySQL() to MySQL(fetch.default.rec = 10000) then the error is magically gone.

thornad commented 7 years ago

@renkun-ken The "adjust MySQL() to MySQL(fetch.default.rec = 10000) then the error is magically gone" worked for me too. Thanks for finding a solution for what the RMySQL developers should have fixed long ago.

renkun-ken commented 7 years ago

MySQL(fetch.default.rec = 10000) only works in RStudio, not in R terminal. The same code works fine in RStudio but crashes every time in R terminal. When the script is added to crontab, it crashes every time.