vapor / postgres-nio

🐘 Non-blocking, event-driven Swift client for PostgreSQL.
https://api.vapor.codes/postgresnio/documentation/postgresnio/
MIT License
304 stars 70 forks source link

DATE columns set values with timezone offsets, producing conflicts when db server is on a different timezone. #462

Open janigro opened 4 months ago

janigro commented 4 months ago

Problem description

First I should say that I read issue #453, although I think this one is slightly different.

When saving a date to a DATE column, the INSERT specifies a timezone offset. This is not usually a problem, if we are in the same timezone as the db server. But working with a server on America/New_York, inserting 2000-01-01 00:00:00+00:00 will actually insert 1999-12-31 (as in New York, it is not 2000-01-01 yet).

This behaviour differs from MySQL, which handles (to my understanding) this case correctly. Inspecting the statement logs in both servers, there is a clear difference in the statement they produce (bindings where substituted by their values for clarity):

MySQL:    INSERT INTO `friends` (`id`, `name`, `birthday`) VALUES (DEFAULT, 'Paul', '2000-01-01 00:00:00')
Postgres: INSERT INTO "friends" ("id", "name", "birthday") VALUES (DEFAULT, 'Paul', '1999-12-31 19:00:00-05') RETURNING "id"

In MySQL, we can ignore the server timezone, and simply use UTC. This guarantees the date stored is the one we will fetch. With Postgres however, we need to be aware of the server timezone at all times, in order to create the Date objects to be inserted, using the server's timezone.

I cannot see the reasoning behind this (I'm sure there's one, I just don't see it).

To Reproduce

A clear example for using the DATE column type, is for storing birthday dates. The time is completely irrelevant and even less is its timezone.

final class Friend: Model, Content {
    static let schema = "friends"

    @ID(custom: "id") public var id: Int?

    @Field(key: "name") public var name: String
    @Field(key: "birthday") public var birthday: Date

    init() { }

    init(id: Int? = nil, name: String, birthday: Date) {
        self.id = id
        self.name = name
        self.birthday = birthday
    }
}

extension Friend {
    struct Migration: Fluent.AsyncMigration {

        func prepare(on database: Database) async throws {

            try await database.schema(Friend.schema)
                .field(.id, .int, .identifier(auto: true))
                .field("name", .string, .required)
                .field("birthday", .date, .required)
                .create()
        }

        func revert(on database: Database) async throws {
            try await database.schema(Friend.schema).delete()
        }
    }
}
let dateFormatter = DateFormatter()
dateFormatter.dateFormat = "yyyy-MM-dd"
dateFormatter.timeZone = TimeZone(secondsFromGMT: 0)

try await Friend(name: "Paul", 
                 birthday: dateFormatter.date(from: "2000-01-01")!).save(on: db)

For this to work correctly on Postgres, we need to know the server's timezone and use it to create the Date object.

dateFormatter.timeZone = .init(identifier: "America/New_York")

Expected behavior

I would think that the result should be the same, no matter which database we are connecting to. I absolutely prefer MySQL's behavior.

If for any reason this is not possible, would it be too crazy/weird to ask that defining the Model property as String (instead of Date) would work with DATE column types too? That way we could do:

try await Friend(name: "Paul", birthday: "2000-01-01").save(on: db)

and that would translate into:

INSERT INTO "friends" ("id", "name", "birthday") VALUES (DEFAULT, 'Paul', '2000-01-01') RETURNING "id"

Another and better solution could be having a different swift type altogether (e.g., DateWithoutTime) and get rid of all ambiguity.

In my case, I can afford to do ALTER DATABASE SET TIMEZONE to 'UTC' and forget about it... but that may not always be possible.

Environment

framework: 4.77.0 toolbox: 18.7.4 postgres-nio: 1.17.0

gwynne commented 4 months ago

I believe this is ultimately a side effect of your use of .date as the field type used for the birthday field in the Fluent migration - the Fluent Postgres driver maps .date to Postgres's DATE type. However, the Date encoding logic in PostgresNIO unconditionally sends Date values to the server encoded as binary-format TIMESTAMP WITH TIME ZONE (aka timestamptz) values. Although the database type says "with time zone", the format is actually a count of seconds since 2000-01-01T00:00:00Z, always in UTC. Postgres then performs the appropriate time zone adjustment when implicitly coercing the timestamptz input to the column's date type. The reverse does not take place when the values are loaded from the database because Postgres will send it as a DATE (reasonably enough), which does not include the time zone offset needed to perform the inverse adjustment (nor is there, for that matter, any reliable indicator that such an adjustment is necessary by that point).

In short, Fluent and PostgresNIO both act, of necessity, on the assumption that Dates are associated with TIMESTAMPTZ columns (in your migration, change .date to .datetime). In fact, it is safe to say as a general statement that Fluent is not guaranteed to handle either DATE or TIME correctly regardless of which database driver one is using (for example, with MySQL, TIME columns just flat-out don't work at all).

[!NOTE] If you really do need a DATE column rather than a TIMESTAMPTZ for your use case (I have found this to be rare in practice, but YMMV of course), you have a few options:

  1. Use SQLKit , and explicitly specify the column in the SQL as "birthday"::"timestamptz", so that Postgres performs the reverse conversion for you. (I dare say this would rather defeat the purpose of the question, however.)
  2. Sidestep the assumptions made by the Date-based logic by using one of the alternate TimestampFormats available with the @Timestamp property wrapper - the ISO8601TimestampFormat would work nicely if you tweaked it to allow you configure it with the database's time zone (getting that info out of the database is left as an exercise for the reader 😅).
  3. The second solution ignores that @Timestamp would be better named @OptionalTimestamp, since it requires the value be Date? rather than just Date. This in turn is easily fixed: write a separate wrapper that doesn't require that, as shown by this sample code. This is working code that can be dropped into your project and used as-is (except that you will have to delete the last line of the file). I've used it without issue in production systems; the only reason I never included it in Fluent is it's not a common enough need to be worth adding even more confusion, especially when its naming is inconsistent 😅.

If none of those works for you, I'm afraid you're out of luck - we can't even look into changing Fluent's existing behavior in this respect unless we can also figure out a way to handle it better in PostgresNIO first (and even then the changes would end up happening in Fluent 5).

janigro commented 4 months ago

Thank you for lightning-fast response. I now better understand where the issue is, and also recognize that changing Fluent 4 to make DATE (i.e., .date) work would also probably mean breaking existing applications that already found their own workarounds.

As to why the importance of DATE (you mention you find this rare in practice)... I agree, but only if we assume we create the schemas ourselves and have the power to choose the field type. Sometimes we need to work with existing databases for which we can only use DML, but no DDL.

In any case, I appreciate your very detailed reply. You have given me many options to work with! Thank you! 😀

gwynne commented 4 months ago

Let's just say that this sort of issue, and the difficulty in addressing it in any helpful fashion even if Fluent 4's API could be more freely changed, is a notable entry on the long list of reasons that the very preliminary design work I've finally started for Fluent 5 centers around a much more SQLKit-like API than what we have now 😅.

Oh, and I forgot - there is one other option you could go with:

(Or, you know, just use TIMESTAMPTZ 😂 😂 😂 But you're absolutely right that that isn't always a viable option.)

gwynne commented 4 months ago

(At the end of the day, it's just another "ORMs are the reason anyone who doesn't work with either audio equipment or electronics has ever even heard of an impedance mismatch" thing, and the last-second choice to shoehorn support for MongoDB into Fluent 4 immediately before the GM release elevated the mismatch to what I'd call almost a flat-out incompatibility after 3 or 4 years of fighting with it 😆. At least I can blame that particular call on @tanner0101, I was against it at the time 🤣.)

janigro commented 4 months ago

I really like this new option. Probably too much for my project, because I can really go with TIMESTAMPTZ or simply change the timezone of the database. However, I still want to try PostgresDecodable, if only as an excercise to better understand the whole process. Thanks again! 😁