kylemaxxwell / rpostgresql

Automatically exported from code.google.com/p/rpostgresql
0 stars 0 forks source link

Problem handling timestamp columns (without timezone) in R when pulling data using RPostgreSQL #55

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I'm trying to pull data from a PostgreSQL database and the results for a 
timestamp field are inconsistent. I'm not sure if I'm handling POSIXct results 
properly. Otherwise, I think I found a bug in the RPostgreSQL package. Here are 
the steps to reproduce the problem:

1.  Create testing database and table:

CREATE DATABASE mydb;
CREATE TABLE test_table
(   
  "DateTime" timestamp without time zone NOT NULL,
  CONSTRAINT "pk_test_table" PRIMARY KEY ("DateTime")
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test_table
  OWNER TO postgres;

2. And let’s say there are a few hundred records. I will populate them in R. 
Here is the code:

library(RPostgreSQL)

# Let's feed the table with some sequence of date/time values
date_values <-  as.chron(seq(10000, 10500, 1/24))

format.chron <- function(z)  {
  sprintf("%04.0f-%02.0f-%02.0f %02.0f:%02.0f:00", 
            as.numeric(as.character(years(z))), 
            months(z), 
            as.numeric(as.character(days(z))), 
            as.numeric(as.character(hours(z))), 
            as.numeric(as.character(minutes(z))))
}

.generateInsertQuery <- function(date_values, field_name, table_name) {
  insert_val  <- paste(paste0("(", sQuote(format(date_values)), ")"), collapse=',')
  qry         <- paste("INSERT INTO", dQuote(table_name), paste0("(", dQuote(field_name), ")"), "VALUES", insert_val)
  qry
}

drv <- dbDriver('PostgreSQL')
con <- dbConnect(drv, user='postgres', dbname='mydb')
qry <- .generateInsertQuery(date_values, "DateTime", "test_table")
dbSendQuery(con, qry)

3. Trying to get the values from postgres, the time component gets stripped out 
of the resulting data

res <- dbGetQuery(con, "SELECT * FROM test_table")
res[1:20,1]

The class of the result, however, is POSIXct

class(res[,1])

If the result is fetched one record at a time, the values with hour:min equal 
to 00:00 loose the time component:

rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
res_list <- list()
for(i in 1:100) res_list[i]  <- fetch(rs,1)
res_list

===================================================================

The expected output is a data.frame with a POSIXct column with date and time 
components.
What I see instead is a POSIXct column only with date components.  The time 
component is stripped out of the data.  If I want to see at least partially 
correct values, I need to fetch results one at a time.

===================================================================

My system specs are:
"PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 64-bit"
Win 7 Enterprise 64-bit
R 3.0.1 64-bit
RPostgreSQL version 0.4
DBI version 0.2-7

Original issue reported on code.google.com by xjosi...@gmail.com on 22 Jul 2013 at 1:09

GoogleCodeExporter commented 8 years ago
Please don't use sQuote, dQuote in constructing queries. That make incompatible 
quotes, for left side of the quote.

Since POSIXct just hold the seconds since the epoch, nothing is stripped.
compare the following two:
as.POSIXct("1997-05-23 00:00:00")
as.POSIXct("1997-05-23 01:00:00")

You should see that the 00:00:00 are not printed. This does not mean anything
is lost. The difference is just how POSIXct are printed and does not relate to
the behavior of RPostgreSQL.

Original comment by tomoa...@kenroku.kanazawa-u.ac.jp on 23 Jul 2013 at 3:04

GoogleCodeExporter commented 8 years ago
See also issue 44

To recover the string with the trailing 00:00:00,
you can try
format(as.POSIXct("1997-05-23"), "%Y-%m-%d %H:%M:%OS")

Original comment by tomoa...@kenroku.kanazawa-u.ac.jp on 23 Jul 2013 at 8:18

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Thanks for the reply.  I forgot to include:  options(useFancyQuotes=FALSE)
which makes dQuote/sQuote output compatible with the query.

Dirk replied to my previous post in StackOverflow:
http://stackoverflow.com/questions/17754635/is-there-a-specific-way-to-handle-ti
mestamp-columns-in-r-when-pulling-data-using

Apparently RPostgreSQL does not handle timestamps without timezone correctly.  
However, it does handle timestamps with time zone, which is even better.  I 
even found some gaps in my dataset which were making my time series irregular.

I think not having timezone was a key part of the issue.  When Postgres field 
does not account for timezones, and RPostgreSQL reads the field into R, it 
throws the timestamp field into a POSIXct vector, which apparently always 
expects the values with timezones.  It turns out that some values from the 
database were incompatible with this expectation.  For example, one of the old 
db records had "2008-03-09 02:00:00", which is not a valid time value when 
using daylight savings, as the clocks went 1 hour forward at that time.  
POSIXct simply stripped out the time part of the value and converted it to 
"2008-03-09" instead.  Since the result from this field is a vector in R, and 
all the POSIXct data in the vector needs to have the same format, the format of 
all the data was also turned into YYYY-MM-DD.

Original comment by xjosi...@gmail.com on 23 Jul 2013 at 3:48

GoogleCodeExporter commented 8 years ago
An important point is that timestamp without timezone is not defined properly. 
No one can determine what time it precisely refers to, without external 
information.
So, I don't think there are any correct way.

If daylight savings is the problem, you may simply
try ignoring them.

> Sys.setenv(TZ='UTC')
and then pull the data.

By the way, what time zone combination are you using?
So far, I can not reproduce
"2008-03-09 02:00:00" be treated as invalid and just a day.

Please distinguish string and POSIXct data. POSIXct data is a numeric data and 
do not contain the format you mentioned.

> as.POSIXct(c("2008-03-09 00:00:00", "2008-03-09 02:00:00","2008-03-09 
03:00:00"),tz='CEST')
[1] "2008-03-09 00:00:00 CEST" "2008-03-09 02:00:00 CEST"
[3] "2008-03-09 03:00:00 CEST"
> as.POSIXct(c("2008-03-09 00:00:00", "2008-03-09 02:00:00","2008-03-09 
03:00:00"),tz='CEST')[1]
[1] "2008-03-09 CEST"
> as.POSIXct(c("2008-03-09 00:00:00", "2008-03-09 02:00:00","2008-03-09 
03:00:00"),tz='CEST')[2]
[1] "2008-03-09 02:00:00 CEST"

Original comment by tomoa...@kenroku.kanazawa-u.ac.jp on 24 Jul 2013 at 3:07

GoogleCodeExporter commented 8 years ago
Hmmm... I guess that is an important detail I should have included.  I'm in NY, 
USA, so I'm using the EST/EDT time zone.  I wasn't aware of the Sys.env(TZ) 
option.  Thanks for the tip.  Maybe it would be helpful if POSIXct threw an 
error for invalid times, instead of converting it to absolute dates.  Of 
course, this has nothing to do with RPostgreSQL.

Original comment by xjosi...@gmail.com on 24 Jul 2013 at 6:53

GoogleCodeExporter commented 8 years ago
I have the same issue. The datetime column only returns the date portion and 
not the time portion. :(

Original comment by howard.s...@formsdirect.net on 24 Apr 2015 at 6:35