jackc / pgx

PostgreSQL driver and toolkit for Go
MIT License
10.81k stars 842 forks source link

Scanning of timestamp without time zone forces UTC #924

Closed AlexanderMatveev closed 2 years ago

AlexanderMatveev commented 3 years ago

How to deal with timestamp without time zone type? I expected pgx to scan them to time.Time struct using current PostgreSQL connection client timezone or local golang timezone, but it forces UTC:

package main

import (
    "context"
    "github.com/jackc/pgx/v4"
    "github.com/joho/godotenv"
    "log"
    "os"
    "time"
)

func main() {

    if err := godotenv.Load(".env.local"); err != nil {
        log.Fatal(err)
    }

    conn, err := pgx.Connect(context.Background(), os.Getenv("DATABASE_URL"))
    if err != nil {
        log.Fatal(err)
    }

    var tz string
    var tm time.Time
    err = conn.QueryRow(context.Background(), "select current_setting('TIMEZONE'), now()").Scan(&tz, &tm)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("current_setting('TIMEZONE'): %s, now(): %v\n", tz, tm)

    var typeOf string
    err = conn.QueryRow(context.Background(), "select pg_typeof(created_at), created_at from review order by created_at desc limit 1").Scan(&typeOf, &tm)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("pg_typeof(created_at): %v, created_at: %v\n", tm, typeOf
}

2021/02/01 00:10:04 current_setting('TIMEZONE'): Europe/Moscow, now(): 2021-02-01 00:10:04.366656 +0300 MSK 2021/02/01 00:10:04 pg_typeof(created_at): 2021-01-18 23:31:13 +0000 UTC, created_at: timestamp without time zone

So I have to do this ugly timezone update after scanning:

tm, err = time.ParseInLocation(time.ANSIC, tm.Format(time.ANSIC), time.Local)
jackc commented 3 years ago

timestamp without time zone is difficult to handle automatically. The binary format that pgx uses by default does not include the time zone information -- and even if it did there is no guarantee that PostgreSQL and the client system have compatible time zone databases. And because of daylight savings time parsing a string directly into local time risks data corruption. This basically leaves UTC as the only option.

In general I strongly recommend using timestamp with time zone. But I realize that is not always in the client developers control. It is possible to override pgx's type handling at a fairly low level if you really need to but I don't recommend it in this case due to the issues listed above.

AlexanderMatveev commented 3 years ago

Sounds convincing, but then it's not clear why this method works fine:

package main

import (
    "context"
    "github.com/jackc/pgx/v4"
    "github.com/joho/godotenv"
    "log"
    "os"
    "time"
)

func main() {

    if err := godotenv.Load(".env.local"); err != nil {
        log.Fatal(err)
    }

    conn, err := pgx.Connect(context.Background(), os.Getenv("DATABASE_URL"))
    if err != nil {
        log.Fatal(err)
    }

    var tz string
    var tm time.Time
    err = conn.QueryRow(context.Background(), "select current_setting('TIMEZONE'), now()").Scan(&tz, &tm)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("current_setting('TIMEZONE'): %s, now(): %v\n", tz, tm)

    var typeOf string
    var tmz time.Time
    err = conn.QueryRow(context.Background(), `select pg_typeof(created_at),
            created_at, 
            created_at::timestamptz created_at_tz
            from review
            order by created_at desc
            limit 1`).Scan(&typeOf, &tm, &tmz)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("pg_typeof(created_at): %v, created_at: %v, created_at_tz: %v\n", typeOf, tm, tmz)
}

2021/02/01 21:16:16 current_setting('TIMEZONE'): Europe/Moscow, now(): 2021-02-01 21:16:16.849667 +0300 MSK 2021/02/01 21:16:16 pg_typeof(created_at): timestamp without time zone, created_at: 2021-01-18 23:31:13 +0000 UTC, created_at_tz: 2021-01-18 23:31:13 +0300 MSK

As you can see, created_at::timestamptz scanned correctly. Please note that I did not make additional requests like setting the client timezone.

jackc commented 3 years ago

The binary format of both timestamp and timestamptz is a 64-bit integer of the number of microseconds since 2000-01-01 00:00:00.

With timestamptz the time zone is always UTC. Because we know the time zone it can be perfectly translated to the local time zone. This happens automatically because time.Unix returns the time in the local time zone.

https://github.com/jackc/pgtype/blob/6830cc09847cfe17ae59177e7f81b67312496108/timestamptz.go#L152

With timestamp it is impossible to know what time zone the value originally was in, so UTC is arbitrarily chosen.

https://github.com/jackc/pgtype/blob/6830cc09847cfe17ae59177e7f81b67312496108/timestamp.go#L145

and for some explanation

https://github.com/jackc/pgtype/blob/6830cc09847cfe17ae59177e7f81b67312496108/timestamp.go#L14

AlexanderMatveev commented 3 years ago

With timestamp it is impossible to know what time zone the value originally was in, so UTC is arbitrarily chosen.

Can't agree with that. When we storing values in timestamp, we don't say "well, it's UTC", but we mean that timezone doesn't matter and it's fully client's choice to interpret it. To proove this I can give another example:

-- start is timestamp without time zone
SELECT current_setting('TIMEZONE'), start, start::timestamptz
FROM release;

Снимок экрана 2021-02-17 в 21 04 50

As you can see, PostgreSQL by itself don't say "it's UTC", but keeps current connection-wide timezone setting. That's by design. And that what is expected while scanning fields. I hope, I explained my point of view in sufficient detail.

jackc commented 3 years ago

I guess some of this depends on whether you view the PostgreSQL type timestamp without time zone as meaning what it says -- without time zone -- or you want it to mean some local time zone. I think it actually means "without time zone" and UTC is the best we can do to ensure daylight savings time doesn't corrupt data.

See https://play.golang.org/p/YLIpROP3MhY for what happens when trying to parse 2021-03-14 02:30:00 in America/Chicago. 2021-03-14 01:30:00 -0600 CST is the result.

Also, this behavior has been how pgx has handled timestamp without time zone for years. Even if it should use local time I think that it would be to risky a breaking change.

ftjfo commented 1 year ago

@jackc If you look at how Postgres' documentation describes comparison between timestampand timestamptz:

From https://www.postgresql.org/docs/15/functions-datetime.html: When comparing a timestamp without time zone to a timestamp with time zone, the former value is assumed to be given in the time zone specified by the TimeZone configuration parameter, and is rotated to UTC for comparison to the latter value (which is already in UTC internally).

To me, this seems to indicate that in absence of a timezone, it is local time as specified in the configuration that should be applied.

select (now()::timestamptz = (now()::timestamp)) is a true statement in Postgres.

While I understand this is a breaking change, there could perhaps be solved with some configuration setting, instead of having to create custom data types to alter the scanning.