r-dbi / RPostgres

A DBI-compliant interface to PostgreSQL
https://rpostgres.r-dbi.org
Other
328 stars 78 forks source link

RPostgres should respect system timezone rather than default to UTC #452

Open veer0318 opened 7 months ago

veer0318 commented 7 months ago

When connecting to a Postgres backend server, RPostgres sets the timezone for the connection to UTC by default (see for example #222). This is odd behaviour, as virtually any other database abstraction system defaults to either not setting the time zone and thus using the Postgres backend server timezone OR defaulting to using the user's system time zone.

The consequence of setting timezone="UTC" by default leads to unexpected behaviour (not wrong behaviour, just unexpected). A better option to me seems to default to the user's system time zone. This should not be hard to implement (although systems might differ in how they store TZ information, e.g. environmental variable TZ or in /etc/timezone.

...unless I'm simply overlooking something and UTC is actually appropriate.

krlmlr commented 7 months ago

Thanks. Does dbConnect(timezone = ) help with your use case?

https://rpostgres.r-dbi.org/reference/postgres

veer0318 commented 7 months ago

Hi! Well, I think that argument is the cause of the problem rather than the solution. Yes, it does solve it if you set in manually to the right timezone. However, one would expect the default to be NULL (i.e. use the Postgres' server timezone) OR default to the user's system timezone (e.g. by reading get.env("TZ") or something like it).

veer0318 commented 7 months ago

Maybe to clarify how this causes issues: for a customer, I built a simple Shiny dashboard which allows the user to create queries to get turnover, stock inventory etc (it's an online retailer with shops in the UK and the Netherlands, HQ in the latter so they expect their results to be in Europe/Amsterdam). The query, very unexpectedly, gave very different numbers of stock and turnover in this Shiny app (using DBI), compared to their Power BI dashboard en pgadmin. It took a very long time to track down to the timezone setting (it was an aggregate query).

AFAIK there's simply no good reason to setting the timezone to UTC for everyone.

iangow commented 6 months ago

I think that if data are stored in the database as timestamptz, then all is good. If not, it may make sense to cast to timestamptz as part of the query using a known time zone. I think the issues with get.env("TZ") is that the database may be in a different time zone from the user and that may be in a different time zone from that of the data (implicitly, if these are timestamp, not timestamptz).

krlmlr commented 6 months ago

I see an advantage of setting times to UTC: consistent results across systems and platforms. Yes, it's unexpected, but times and time zones are a beast anyway. For me, consistency trumps convenience.

@iangow's proposal ensures that results remain as expected even if the server's or user's time zone changes, which is much less under your control than the DBI connection parameters or the queries your code issues.

I appreciate that this is hard, and I'm sorry you spent so much time tracking this down. Is there something we should do better on the documentation front?

cswingle commented 4 months ago

To repeat what I said in #229: I think it is the most logical and least surprising to use the timezone defined in the database when making a connection. That's the way every Postgres client I've used appears to work (psql, pgAdmin, Python psycopg2, PHP, Go lib/pq, etc.). Either that or they're all using the local timezone, which matches the database timezone in every database I've managed. In no case have I used a database client that automatically assumes UTC.

That said, it may be that the current UTC default is long standing enough with RPostgres that it would be more surprising to current users to have that behavior change. I just need to always remember to use the timezone argument when connecting or strange things will happen.

veer0318 commented 4 months ago

I totally agree with @cswingle . The current behaviour might be long-standing but is definitely different from virtually any other client. The best thing IMHO would be to leave the time zone by default unset (why would a client need to do this explicitly?) And thereby leave it to the server to define the TZ. In this case, a client still has the option of overwriting TZ if needed (which is likely not often the case, while it is the case in the current way RPostgres is set up.

cswingley commented 4 months ago

For what it's worth, when querying a Postgres database where the system time zone is not set to UTC, I get equivalent results when retrieving timestamp with time zone data using dbConnect without a timezone argument, with timezone = NULL and with timezone = "US/Alaska" (the database time zone). This is excellent.

Where the default RPostgres behavior of setting the time zone to UTC when connecting causes issues is where internal queries or views depend on being run under the system time zone. In my experience, it's aggregation (timestamp to date) that typically breaks in these queries. Perhaps those queries shouldn't be written in that way, but it is certainly a surprise for users to find that R isn't returning the same answer that they get by running R's show_query() statement on the server using their typical database client (psql, dBeaver, pgAdmin4 in our case).

krlmlr commented 4 months ago

I see that this issue perhaps matters mostly with the aforementioned timestamp-to-date aggregation. Users will be typically more affected by the timezone_out argument, which is timezone by default.

It's also good to hear that timezone = NULL works as expected.

Overall, this seems to be a minor problem, affecting mostly server-side aggregation. Consistency across platforms seems to be provided because timezone = NULL will choose the server timezone, and very few users access more than one database with different settings. Consistency with other database clients (and fewer surprises) might be more important indeed.

If we change the default, we'd need to:

Is it worth it?