kylemaxxwell / rpostgresql

Automatically exported from code.google.com/p/rpostgresql
0 stars 0 forks source link

dbGetQuery handles empty recordsets in a problematic manner #51

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a table with 2 or more columns:
    dbGetQuery(con, "CREATE TABLE junk (col1 INTEGER, col2 VARCHAR(100))");

2. Use dbGetQuery() to retrieve a data.frame with the contents of this table:
    a <- dbGetQuery(con, "SELECT * FROM junk");

3. I would expect to receive back a data.frame with 0 rows and 2 columns named 
"col1" and "col2".  Instead, I get back:
    print(a)
    data frame with 0 columns and 0 rows

If I try to use this data.frame naively by commands like:
    print(a[1])
I get errors like:
    Error in .subset2(x, i, exact = exact) : subscript out of bounds

4. In contrast, if the table is not empty, I get a data.frame with the 
appropriate number of rows and two columns:
    dbGetQuery(con, "INSERT INTO junk VALUES (1, 'wowee')");
    a <- dbGetQuery(con, "SELECT * FROM junk");
    print(a)

      col1  col2
    1    1 wowee

Now the returned data.frame behaves in the expected manner:
    print(a[1])

      col1
    1    1

What is the expected output? What do you see instead?

I would expect to get back a data.frame with a predictable structure from a 
dbGetQuery() regardless of whether or not the recordset is empty.  With the 
current design, I need to check every return from dbGetQuery() to see if it is 
empty before I attempt to access columns or use the return result in a merge() 
or match() function.  A common code construct for my application looks like:
    Lookup <- dbGetQuery(con, "SELECT id, description FROM lookup_table");
    F <- merge(my_data, Lookup, by.x="id", by.y="id", all.x=TRUE, all.y=FALSE);

but the merge fails if the lookup table happens to be empty.

Note that the R database interface to SQLite3 returns a zero-row data.frame 
with the correct columns when the dbGetQuery() returns no rows.  The problem 
therefore is not intrinsic to DBI, but seems to be specific to RPostgreSQL.

What version of the product are you using? On what operating system?
R 2.15.2, RPostgreSQL 0.4 compiled on R 2.15.3, Windows 7 64-bit

Please provide any additional information below.

Original issue reported on code.google.com by bmusi...@aptecgroup.com on 3 Apr 2013 at 8:52

GoogleCodeExporter commented 8 years ago
Looking through the source code, it seems likely that the system being used to 
detect whether the statement was a SELECT is failing when the result set has 
zero length, and this causes the return of a data.frame with 0 rows and 0 
columns.

Original comment by bmusi...@aptecgroup.com on 24 May 2013 at 3:42

GoogleCodeExporter commented 8 years ago
i glanced quickly through the source and believe the easiest fix would be at 
the R-code level.
in PostgreSQL.R, DBIResult class methods that return result set data frames 
(e.g. "fetch"), have this check:

  out <- postgresqlFetch(res, n, ...)
  if(is.null(out))
    out <- data.frame(out)
  out

the casting of the null "out" to a data frame is where the 0-row-0-col data 
frame is created.
instead of that casting, i'd use a function specifically designed to return a 
"smart" empty data frame, like so:

  emptyResultDataFrame <- function(res)
    {
      klasses <- dbColumnInfo(res)$Sclass
      names(klasses) <- dbColumnInfo(res)$name
      data.frame(sapply(klasses, function(klass) eval(parse(text = sprintf("%s(0)", klass))), simplify = FALSE))
    }

and thus alter the existing code blocks that look like so:

  out <- postgresqlFetch(res, n, ...)
  if(is.null(out))
    out <- data.frame(out)
  out

to:

  out <- postgresqlFetch(res, n, ...)
  if(is.null(out))
    out <- emptyResultDataFrame(res)
  out

NOTE 1: i'm pretty sure there's a better way to initialize an empty vector from 
a class name stored as a string (saving the trouble of my eval/parse trick 
above), so people more familiar with low-level expressions in R can probably 
make an improvement there.

NOTE 2: there's also the chance for a bug here, since i'm *assuming* that the 
Sclass field in the getColumnInfo() table returns valid R classes/types which 
have 'empty' constructors that follow the 'numeric(0)' or 'character(0)' or 
'logical(0)' etc. paradigm.
there's probably also a safer way to initialize an object of zero-length with 
the class name as a string, as mentioned in NOTE 1, and this would also help 
prevent the possible bug introduced by my assumption.

NOTE 2b: i'm aware that "vector(klass, 0)" will work, but only for base 
('mode') types... so i'm not sure if the Sclass field can be populated by 
high-level types in R that might break the "vector()" function, which is why i 
opted for the eval/parse trick instead.

anyhow, i rolled this fix into a forked version of RPostgreSQL on my system and 
it works like a charm to fix the annoying 0-row-0-column bug, so feel free to 
include some version of this in the next release.

cheers,

-murat

Original comment by mmu...@gmail.com on 3 Jul 2013 at 9:09

GoogleCodeExporter commented 8 years ago

Original comment by tomoa...@kenroku.kanazawa-u.ac.jp on 16 Jul 2013 at 12:16

GoogleCodeExporter commented 8 years ago
Any update on this? We're still seeing this. The result of dbGetQuery might or 
might not have column names, depending on if the result is empty. Is there any 
way to get the column names from the empty result without querying for them?

Original comment by jak...@gmail.com on 7 Jul 2014 at 9:06