openstreetmap / openstreetmap-website

The Rails application that powers OpenStreetMap
https://www.openstreetmap.org/
GNU General Public License v2.0

Switch to TIMESTAMP WITH TIME ZONE for current_* tables #375

Open pnorman opened 11 years ago

pnorman commented 11 years ago

Currently the current_nodes, current_ways and current_relations tables include a timestamp column of type TIMESTAMP WITHOUT TIME ZONE.
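For reference, a minimal sketch of the shape in question (the table name and trimmed column list here are illustrative, not the real production DDL):

    -- illustrative only: a trimmed stand-in for the real current_nodes DDL
    CREATE TABLE current_nodes_sketch (
        id          bigint PRIMARY KEY,
        "timestamp" timestamp without time zone NOT NULL  -- the column at issue
    );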

I've asked around and consulted some sources and the advice I got was best summarized as

19:12 <RhodiumToad> pnorman: unless you're writing a calendaring/scheduling app, timestamp with time zone is pretty much always the way to go

This is not purely a theoretical concern. To my knowledge, there is currently no way to load data into an apidb without introducing time zone errors. See this osmosis discussion

Neither the rails port nor cgimap returns correct timestamps right now on any instance running on a server where the local time is not set to UTC (i.e. anything other than osm.org), because the problem is in the database.

Cross-reference pnorman/openstreetmap-api-testsuite#1 cc: @brettch, @zerebubuth, @iandees

iandees commented 11 years ago

Isn't the answer to fix the rails/cgimap code to use a UTC timestamp when creating data in the database?

pnorman commented 11 years ago

My understanding is that if you want everything in UTC, as you generally do in OSM, you want WITH TIME ZONE with the time zone set to UTC. I have to admit I don't exactly understand all the issues with time zones; I'm just going based on advice.

tomhughes commented 11 years ago

As all the data in our database is in UTC it is not clear to me why we need to store a time zone - that should only be needed when you have data in a mix of time zones so that you can correctly compare them and do calculations involving mixed time zones.

You're going to need a much more convincing argument to get a massive change like this through.

vincentdephily commented 11 years ago

First things first: there is no storage overhead for the 'with timezone' variant, as postgres always stores in UTC. The only (tiny) overhead is the on-the-fly timezone conversion if the client needs it.

The advantage of using 'with timezone' is to fend off client bugs. If the client is smart, it'll know to expect UTC (or whatever is used by convention), but experience shows that clueless clients will happen. At work I've had a few instances of having to force a client timezone (by editing the db user's properties) to work around client bugs. To avoid drama, the database driver or proxy can inform the server what timezone the client expects.

That said, if you are dead sure that all osm clients know that they are getting UTC from the db and do the conversion-before-printing work themselves, you can keep using 'without timezone'. But for such a distributed project as osm, it's probably better to play it safe.

pnorman commented 11 years ago

That said, if you are dead sure that all osm clients know that they are getting UTC from the db and do the conversion-before-printing work themselves, you can keep using 'without timezone'. But for such a distributed project as osm, it's probably better to play it safe.

Well, we know that some parts of the OSM stack get it wrong because we've already got an osmosis bug about the matter.

OSM data has a time zone associated with it, it just happens to all be the same.

I don't feel this is actually a substantial change as it shouldn't require changes for anything which currently is behaving correctly.

@vincentdephily How'd you hack around client bugs? That might work for me in the short term for getting around the data loading bugs

vincentdephily commented 11 years ago

@pnorman Assuming there is a db role dedicated to each client/program (which is good practice): "ALTER ROLE buggy_app SET timezone to 'Europe/Paris'".

This works in the case of a client which expects a specific timezone but doesn't tell the server, which therefore returns a different timezone instead. Obviously (?), this hack only works if the data is in a 'with timezone' column. The default timezone depends on various server/client/os settings and should not be trusted. The correct way to handle this is to configure the client code (or its framework) to require a specific timezone when connecting to PG.
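A quick psql illustration of why this only helps 'with timezone' columns — the session timezone changes only how a timestamptz is rendered, never the stored instant (output lines are illustrative):

    SET TIME ZONE 'Europe/Paris';
    SELECT timestamptz '2013-01-01 00:00:00+00';
    --  2013-01-01 01:00:00+01   (same instant, Paris rendering)
    SET TIME ZONE 'UTC';
    SELECT timestamptz '2013-01-01 00:00:00+00';
    --  2013-01-01 00:00:00+00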

zerebubuth commented 11 years ago

In my opinion the best option would be to fix the actual bugs. I've looked through the source very briefly and couldn't find anywhere we weren't converting to UTC. The issue may be something along the lines that Postgres (or the client library) is assuming "without timezone" means "in local timezone", in which case there should be a much simpler fix than converting to "with timezone". A simple work-around may be to set $TZ when running Osmosis.

Anecdotally, my experience of writing systems with timestamp information is that it's always easier to keep data internally in UTC at all times and only convert when presenting data to the user. In OSM, I don't think we ever convert out of UTC - all API-generated timestamps should be UTC, and uploaded timestamps are ignored.

In my opinion, timestamp and timezone information are qualitatively very different: the timestamp indicates a particular point in time which is unambiguous to a computer, whereas timezones are only useful as a presentation mechanism to a human. The prevalence of time_t in POSIX APIs would seem to indicate that the designers shared this opinion.

tomhughes commented 11 years ago

@pnorman it is a substantial change if only because it requires an "ALTER COLUMN" to be performed on many of our largest tables.

pnorman commented 11 years ago

The issue may be something along the lines that Postgres (or the client library) is assuming "without timezone" means "in local timezone"

I believe that's what without timezone is defined as in the SQL standard

A simple work-around may be to set $TZ when running Osmosis.

I also need to try inserting by hand to see what results I get

Anecdotally, my experience of writing systems with timestamp information is that it's always easier to keep data internally in UTC at all times and only convert when presenting data to the user. In OSM, I don't think we ever convert out of UTC - all API-generated timestamps should be UTC, and uploaded timestamps are ignored.

Well, the in-database timestamps aren't in UTC because they're not in any timezone.

In my opinion, timestamp and timezone information are qualitatively very different: the timestamp indicates a particular point in time which is unambiguous to a computer, whereas timezones are only useful as a presentation mechanism to a human. The prevalence of time_t in POSIX APIs would seem to indicate that the designers shared this opinion

But in SQL a timestamp without timezone is ambiguous to a computer because it could be in any time zone. When I first read it a while back, the SQL definitions seemed flipped from what I expected in some cases.

@pnorman it is a substantial change if only because it requires an "ALTER COLUMN" to be performed on many of our largest tables

Good point, the time required may be non-trivial with the table rewrite, even if the change itself is simple.

zerebubuth commented 11 years ago

The issue may be something along the lines that Postgres (or the client library) is assuming "without timezone" means "in local timezone"

I believe that's what without timezone is defined as in the SQL standard

Anecdotally, my experience of writing systems with timestamp information is that it's always easier to keep data internally in UTC at all times and only convert when presenting data to the user. In OSM, I don't think we ever convert out of UTC - all API-generated timestamps should be UTC, and uploaded timestamps are ignored.

Well, the in-database timestamps aren't in UTC because they're not in any timezone.

Indeed - the timestamps aren't in a definite timezone as far as the database is concerned (except that it assumes some for conversions from "with timezone") - my meaning was that we define all timestamps to be in UTC, and always internally work in UTC as if we're using time_t.

The Postgres DATETIME docs say "In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication." So the next question is whether the client performs any implicit conversion, or passes the information as "with timezone", forcing a conversion.
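The docs' point is easy to reproduce in psql — the offset in the literal is silently dropped when the target type is 'without time zone' (a minimal sketch):

    SELECT TIMESTAMP '2004-10-19 10:23:54+02';
    --  2004-10-19 10:23:54      (the +02 indication is silently ignored)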

In my opinion, timestamp and timezone information are qualitatively very different: the timestamp indicates a particular point in time which is unambiguous to a computer, whereas timezones are only useful as a presentation mechanism to a human. The prevalence of time_t in POSIX APIs would seem to indicate that the designers shared this opinion

But in SQL a timestamp without timezone is ambiguous to a computer because it could be in any time zone. When I first read it awhile back the SQL definitions seemed flipped from what I expected in some cases.

Sure, which is why it's easier to define that we always only store UTC timestamps, and we don't (shouldn't) perform any conversions in the database or in the database clients. Timezone information, if useful at all, should only be used in the presentation layer to the user.

vincentdephily commented 11 years ago

The issue may be something along the lines that Postgres (or the client library) is assuming "without timezone" means "in local timezone"

I believe that's what without timezone is defined as in the SQL standard

The server just replies in a "YYYY/MM/DD HH:MM:SS.nnn" format that has no concept of a timezone. It is the client who attaches an implicit timezone (most likely its local timezone) to that value.

A simple work-around may be to set $TZ when running Osmosis.

That's a good quickfix, but means that the whole of osmosis will run with the server's timezone, which might be overkill.

A cleaner option is to use the AT TIME ZONE construct when running an sql query:

    SELECT date AT TIME ZONE 'utc' from current_foobar;

This will tell PG that the 'without timezone' value is stored in UTC. The server can then check the client's timezone setting and send the value in the client's expected timezone. It works similarly for inserts. This has the advantage that a regional osm website can continue to use its local timezone for user-facing timestamps without getting confused by the server's timezone.
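For the insert direction, the same construct hands PG an unambiguous value (a sketch only; current_foobar and its date column are the hypothetical names from the query above):

    -- now() is timestamptz; AT TIME ZONE 'utc' converts it to the UTC wall
    -- time, which is what the 'without timezone' column stores by convention
    INSERT INTO current_foobar (date) VALUES (now() AT TIME ZONE 'utc');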

It's not as clean a solution as altering the db schema (because it requires the client to modify its sql queries and it requires the client code to know what timezone the server's timestamps actually are in), but it avoids the lengthy "alter table".

I also need to try inserting by hand to see what results I get

    tokill=# show time zone;
     Europe/Paris
    tokill=# \d foo
     d_notz | timestamp without time zone | 
     d_tz   | timestamp with time zone    | 
    tokill=# insert into foo values ('2013/01/01 00:00:00', '2013/01/01 00:00:00'),('2013/01/01 00:00:00', '2013/01/01 00:00:00' at time zone 'utc');
    INSERT 0 2
    tokill=# select d_notz, d_notz at time zone 'utc', d_tz from foo;
     2013-01-01 00:00:00 | 2013-01-01 01:00:00+01 | 2013-01-01 00:00:00+01
     2013-01-01 00:00:00 | 2013-01-01 01:00:00+01 | 2012-12-31 23:00:00+01

zerebubuth commented 11 years ago

It's not as clean a solution as altering the db schema (because it requires the client to modify its sql queries and it requires the client code to know what timezone the server's timestamps actually are in), but it avoids the lengthy "alter table".

The server's timestamps are actually in UTC. Therefore it seems the cleanest solution is fixing the bugs in the client where they are assuming, incorrectly, that they're in local time.

I also need to try inserting by hand to see what results I get

    tokill=# show time zone;
     Europe/Paris
    tokill=# \d foo
     d_notz | timestamp without time zone | 
     d_tz   | timestamp with time zone    | 
    tokill=# insert into foo values ('2013/01/01 00:00:00', '2013/01/01 00:00:00'),('2013/01/01 00:00:00', '2013/01/01 00:00:00' at time zone 'utc');
    INSERT 0 2
    tokill=# select d_notz, d_notz at time zone 'utc', d_tz from foo;
     2013-01-01 00:00:00 | 2013-01-01 01:00:00+01 | 2013-01-01 00:00:00+01
     2013-01-01 00:00:00 | 2013-01-01 01:00:00+01 | 2012-12-31 23:00:00+01

This looks to me like the d_notz column is exactly what we want: Put the data in and take it out without the server messing with it. We just have to stop the client messing with it also.

vincentdephily commented 11 years ago

It's not as clean a solution as altering the db schema (because it requires the client to modify its sql queries and it requires the client code to know what timezone the server's timestamps actually are in), but it avoids the lengthy "alter table". The server's timestamps are actually in UTC. Therefore it seems the cleanest solution is fixing the bugs in the client where they are assuming, incorrectly, that they're in local time.

The 'with timezone' variant is the idiomatic way to fix timezone issues. It was created for this purpose and is supported all the way down to the wire-level protocol. You can fix timezone issues another way, but that's IMHO just a hack.

The 'without timezone' variants aren't lighter than the 'with' ones; they exist mainly for SQL compliance / backward compatibility and there's no good reason to use them nowadays.

    tokill=# show time zone;
     Europe/Paris
    tokill=# \d foo
     d_notz | timestamp without time zone | 
     d_tz   | timestamp with time zone    | 
    tokill=# insert into foo values ('2013/01/01 00:00:00', '2013/01/01 00:00:00'),('2013/01/01 00:00:00', '2013/01/01 00:00:00' at time zone 'utc');
    INSERT 0 2
    tokill=# select d_notz, d_notz at time zone 'utc', d_tz from foo;
     2013-01-01 00:00:00 | 2013-01-01 01:00:00+01 | 2013-01-01 00:00:00+01
     2013-01-01 00:00:00 | 2013-01-01 01:00:00+01 | 2012-12-31 23:00:00+01

This looks to me like the d_notz column is exactly what we want: Put the data in and take it out without the server messing with it. We just have to stop the client messing with it also.

You're hoping to outsmart the postgres devs, it's rarely a good bet :p You can either fix the timezone issue once server-side following best-practice, or multiple times (once per client) using one of the solutions above.

That being said, if the "alter table" downtime is unacceptable to the sysadmins, then that theoretical discussion is moot, and the practical decision to fix things client-side wins.

zerebubuth commented 11 years ago

The 'with timezone' variant is the idiomatic way to fix timezone issues. It was created for this purpose and is supported all the way down to the wire-level protocol. You can fix timezone issues another way, but that's IMHO just a hack.

My experience is exactly the opposite: timestamps are time_t and always UTC. "With timezone" is a hack which adds unnecessary complexity, and clients supporting implicit timezone conversion are the reason that these bugs exist in the first place.

You're hoping to outsmart the postgres devs, it's rarely a good bet :p You can either fix the timezone issue once server-side following best-practice, or multiple times (once per client) using one of the solutions above.

That being said, if the "alter table" downtime is unacceptable to the sysadmins, then that theoretical discussion is moot, and the practical decision to fix things client-side wins.

I'm not trying to outsmart anyone - I'm just trying to avoid unnecessary complication and leakage of presentation-layer concepts into the database. But sadly these appear to be mandated by the SQL standard :worried:. We have already fixed the server, but unfortunately the database client libraries are apparently still broken by design.

vincentdephily commented 11 years ago

The 'with timezone' variant is the idiomatic way to fix timezone issues. It was created for this purpose and is supported all the way down to the wire-level protocol. You can fix timezone issues another way, but that's IMHO just a hack.

My experience is exactly the opposite: timestamps are time_t and always UTC. "With timezone" is a hack which adds unnecessary complexity, and clients supporting implicit timezone conversion are the reason that these bugs exist in the first place. I'm not trying to outsmart anyone - I'm just trying to avoid unnecessary complication and leakage of presentation-layer concepts into the database. But sadly these appear to be mandated by the SQL standard. We have already fixed the server, but unfortunately the database client libraries are apparently still broken by design.

Consider these facts:

The complexity is already there whether you want it or not; it's not over-engineering by PG or SQL. It's not unnecessary. It's not just a presentation-layer concept. Once again, you can either fix the issue once server-side, or many times client-side (have fun implementing group-by client-side, btw).

Thanks to SQL and 'with timezone', the common case is that the client should be oblivious of the timezone. That's a good thing: it means less code and fewer bugs. To reuse your own argument, the problem is with clients that try to do their own timezone conversion in the first place.

Don't fall into the trap of treating PG as a "dumb datastore" or you'll just waste time reinventing the wheel.

pnorman commented 11 years ago

Indeed - the timestamps aren't in a definite timezone as far as the database is concerned (except that it assumes some for conversions from "with timezone") - my meaning was that we define all timestamps to be in UTC, and always internally work in UTC as if we're using time_t.

via #postgresql, while thinking about pnorman/openstreetmap-api-testsuite#1

22:02 < RhodiumToad> easy then - timestamp with time zone has, in pg, exactly the same semantics as time_t
22:02 < RhodiumToad> (except for more range and resolution)
22:04 < pnorman> the main API software which is the main user of this database schema is actually written in ruby in rails, but I strongly suspect no code changes are needed except to update the schema
22:05 < RhodiumToad> iirc rails these days does have the option to behave sanely with timestamp with time zone
22:05 < RhodiumToad> but I don't use it myself, that's just from helping other people with it

A simple work-around may be to set $TZ when running Osmosis. That's a good quickfix, but means that the whole of osmosis will run with the server's timezone, which might be overkill.

Yes - I think it would screw up progress displays at the least

A cleaner option is to use the AT TIME ZONE construct when running an sql query:

SELECT date AT TIME ZONE 'utc' from current_foobar;

This will tell PG that the 'without timezone' value is stored in UTC. The server can then check the client's timezone setting and send the value in the client's expected timezone. It works similarly for inserts. This has the advantage that a regional osm website can continue to use its local timezone for user-facing timestamps without getting confused by the server's timezone.

It's not as clean a solution as altering the db schema (because it requires the client to modify its sql queries and it requires the client code to know what timezone the server's timestamps actually are in), but it avoids the lengthy "alter table".

Well, the output OSM XML is all in UTC in the format 2012-09-30T17:00:00Z

cgimap gets this by converting to a string in PostgreSQL; see https://github.com/zerebubuth/openstreetmap-cgimap/blob/8bfff6/src/backend/apidb/writeable_pgsql_selection.cpp#L399, which does

    to_char(n.timestamp,'YYYY-MM-DD"T"HH24:MI:SS"Z"')

gravitystorm commented 5 years ago

This issue has come up again in #2028 and I feel it's something that we need to resolve. Unfortunately there's a few misunderstandings from @tomhughes and @zerebubuth on this topic, so I hope to persuade them of the case for changing the column type.

http://www.postgresqltutorial.com/postgresql-timestamp/ has some background reading.

The main misunderstanding appears to be that timestamp without timezone means UTC and timestamp with timezone stores a timestamp with an additional timezone offset. This is not the case. Both columns store the same information, i.e. a timestamp. Same bytes. Same timestamp. The difference is only how software should interpret those bytes.
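This is easy to check for yourself — both types occupy the same eight bytes (a minimal psql sketch):

    SELECT pg_column_size(now()::timestamp)   AS without_tz,
           pg_column_size(now()::timestamptz) AS with_tz;
    --  without_tz | with_tz
    -- ------------+---------
    --           8 |       8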

The implications of the change for OSMF are low, since they run all their servers in UTC. But the implications for anyone else running the software, for example a developer on a laptop in another timezone, are that they are likely to get messed-up timestamps in their db. Values that should be UTC (e.g. in a dump) are stored as local times instead. Changing the column type to properly reflect that the timestamps are UTC and not local time will remove a whole lot of head-scratching.

So to summarise, we are currently using timestamp without timezone (aka "this timestamp should be interpreted as local time") and then jumping through hoops to make every client connect as UTC. A simple column type change to timestamp with timezone (aka "this timestamp is a UTC timestamp") would be the right thing to do.

pnorman commented 5 years ago

I hope to persuade them of the case for changing the column type.

This matches what I have heard from 2-4 PostgreSQL consultants who I have asked about the matter. In summary, their advice is "Unless you are developing calendar software, never use timestamp without timezone"

pnorman commented 5 years ago

@gravitystorm : please keep in mind that changing to "timestamp with time zone" possibly also has some impact on cgimap, it's not a rails only thing. This issue has hit us multiple times both on rails and cgimap in the past already.

This was raised in https://github.com/openstreetmap/openstreetmap-website/pull/2028#discussion_r235328787

If everything involved is using UTC, there's no cgimap issue. Where I've hit the issue has been on a dev machine where it's not UTC, and in that case, it's got a good chance of being screwed up.

gravitystorm commented 5 years ago

@pnorman exactly. The only reason to use it is for events that you want to happen in local time. So you could store an event "worldwide morning yoga sessions" at "2018-11-21T0900" as timestamp without timezone, meaning that it happens 38 times, or store "I want to wake up at 8am" as timestamp without timezone so that it happens at 8am in the winter and 8am in the summer. But for anything else, it's the wrong column type.

I really do think this is all a misinterpretation based on what people think the column types mean, based on the names of the types, rather than what they actually mean. I can understand the misinterpretation completely, it's not clearly expressed.

tomhughes commented 5 years ago

Surely "with timezone" has to store the timezone? The actual time might be normalised to UTC but it has to store the original timezone as well so it can return it. That's the whole point right?

Or have I completely misunderstood SQL for years... Time to do some reading I guess.

gravitystorm commented 5 years ago

Surely "with timezone" has to store the timezone? The actual time might be normalised to UTC but it has to store the original timezone as well so it can return it. That's the whole point right?

No, that's not how it works! :-) There's no offset stored; the information about the original timezone is ignored. If you store something as ".....10:05-0800" into a timestamp with timezone then it just gets stored as "....18:05 UTC". Like I say, I think the column type names are a bit misleading.

https://stackoverflow.com/a/9576170/105451 is another good article. It notes "The time zone itself is never stored. It is an input modifier used to compute the according UTC timestamp, which is stored - or an output modifier used to compute the local time to display - with appended time zone offset."
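A small psql sketch of that "input modifier" behaviour — the offset is consumed on the way in and only the UTC instant is kept (values made up):

    SET TIME ZONE 'UTC';
    SELECT '2018-11-21 10:05-08:00'::timestamptz;
    --  2018-11-21 18:05:00+00   (stored and rendered as the UTC instant;
    --                            the original -08:00 offset is not retained)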

tomhughes commented 5 years ago

So it seems you're right, which is very, err, counter-intuitive. My understanding had always been that a with timezone column stored an indication of the original timezone so that you could store values from different timezones and then recover the original zone.

Apparently that is not true for Postgres at least, and it always returns values in the server's default timezone.

My only remaining concern is that timestamp constants are always treated as without timezone unless explicitly cast which might cause an issue if somebody had a non-UTC default timezone?

tomhughes commented 5 years ago

I think this is also a point where different databases are not consistent - my reading is that Oracle for example does do what I expected but also has an extra with local time zone version that behaves more like Postgres does (except it normalises to the server's time zone, not necessarily UTC).

gravitystorm commented 5 years ago

My only remaining concern is that timestamp constants are always treated as without timezone unless explicitly cast which might cause an issue if somebody had a non-UTC default timezone?

Do we use timestamp constants anywhere? I'm guessing not in our rails code, but perhaps in other apps? If so, they could be re-written to be a timestamptz, e.g. select ('2017-01-01 00:00:00' at time zone 'utc')::timestamptz

tomhughes commented 5 years ago

I have no idea whether we do (or whether rails might behind the scenes) it's just something to keep in mind.

tomhughes commented 5 years ago

This is all from https://www.postgresql.org/docs/10/datatype-datetime.html#id-1.5.7.13.19.7 which suggests TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54' as syntax to avoid issues.

I think it only matters if there is a TZ indication in the constant though so probably unlikely to ever be an issue.
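For completeness, a sketch of what happens to such a constant when the session zone is not UTC — without a zone indication it is interpreted in the session's zone:

    SET TIME ZONE 'Europe/Paris';
    SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54';
    --  2004-10-19 10:23:54+02   (read as Paris local time, i.e. 08:23:54 UTC)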

gravitystorm commented 5 years ago

I have no idea whether we do (or whether rails might behind the scenes) it's just something to keep in mind.

Noted. My remaining query is on the time taken to make the column type changes in production, i.e. will we need any downtime to run the migration, or given it should (hopefully) be just a column metadata change, is it quick? Is there an easy way to find this out before we do it for real?
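One way to get a rough answer without touching production would be to time the ALTER on a scratch copy of a table (a sketch, assuming enough disk for the copy; the copy's name is made up, and \timing is psql's built-in timer):

    CREATE TABLE current_nodes_copy AS SELECT * FROM current_nodes;
    \timing on
    ALTER TABLE current_nodes_copy
        ALTER COLUMN "timestamp" TYPE timestamp with time zone
        USING "timestamp" AT TIME ZONE 'UTC';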

tomhughes commented 5 years ago

That's a good question, especially in 9.5, so we need to find a way to check it.

We'd likely need a brief outage anyway to avoid lock contention issues on some of the tables at least but we'd still need to know it would be quick excluding that.

mmd-osm commented 5 years ago

Where I've hit the issue has been on a dev machine where it's not UTC, and in that case, it's got a good chance of being screwed up.

Right, cgimap indeed depends on UTC-0, and there will be some screw up otherwise: https://github.com/zerebubuth/openstreetmap-cgimap/issues/150.

mmd-osm commented 5 years ago

Quick question on this one. Cgimap has a few queries like the example below. Assuming we're running the server in a non-UTC time zone, would we need to apply changes to the code once the data type has been changed to with time zone? My assumption would be yes here, so we'd probably need a GitHub issue on cgimap to cover this change.

    m_connection.prepare("extract_nodes",
      "SELECT n.id, n.latitude, n.longitude, n.visible, "
          "to_char(n.timestamp,'YYYY-MM-DD\"T\"HH24:MI:SS\"Z\"') AS timestamp, "
          "n.changeset_id, n.version, array_agg(t.k) as tag_k, array_agg(t.v) as tag_v "
        "FROM current_nodes n "
          "LEFT JOIN current_node_tags t ON n.id=t.node_id "
        "WHERE n.id = ANY($1) "
      "GROUP BY n.id ORDER BY n.id")

tomhughes commented 5 years ago

I think that entirely depends what timezone you want cgimap to return those timestamps in... If you want them in UTC then you're good, if you want them in local time then no.

mmd-osm commented 5 years ago

The "Z" (zulu) character in the timestamp string implies that all timestamps should be returned in UTC-0, no matter what time zone the server runs in.

I also remember seeing some issues when populating the database via osmosis from a UTC+3 terminal (see this year's GSoC project: https://github.com/osmlab/yalcha/issues/11). This ended up creating wrong timestamps all over the place. Not sure, if this behaviour would be any different with osmosis and timestamp with timezone.

gravitystorm commented 5 years ago

If you want them in UTC then you're good,

Yes, that code looks to me like it needs no changes. It's already forcing the timestamp to be considered as UTC by ending in Z with no offset shown. That will work currently on databases configured as UTC, and after the column change, on all servers regardless of their timezone.

(In fact, changing the column type will fix a potential source of bugs that developers might encounter. If today at noon in California you insert a node into your local development database using raw SQL (INSERT INTO current_nodes VALUES (... NOW() ...)) for some development work, then the timestamp would contain "2018-11-21T12:00" (i.e. local time) and cgimap would show that as "2018-11-21T12:00Z" (UTC), which is a different time entirely. But when we change the column type, the NOW() would instead store the UTC timestamp in the timestamp field, and the potential bug disappears, with no changes needed to cgimap.)
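A sketch of that failure mode in psql (timezone and values illustrative):

    SET TIME ZONE 'America/Los_Angeles';
    CREATE TEMP TABLE t (ts timestamp without time zone);
    INSERT INTO t VALUES (now());   -- now() is timestamptz; the assignment
                                    -- cast stores the *local* wall time
    SELECT to_char(ts, 'YYYY-MM-DD"T"HH24:MI:SS"Z"') FROM t;
    --  2018-11-21T12:00:00Z        -- local time mislabelled as UTC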

gravitystorm commented 5 years ago

The "Z" (zulu) character in the timestamp string implies that all timestamps should be returned in UTC-0, no matter what time zone the server runs in.

That's not quite what it's doing. It's taking the timestamp from the database, formatting it appropriately and then sticking an actual letter Z at the end of the string (like the T, it's double quoted - it's not a formatting character like YYYY so postgres is just adding that character to the string). So there's no timezone conversion happening in the to_char method. It's taking the timestamp without timezone and making it look like a UTC timestamp. Which of course works fine if the server is in UTC but leads to other problems in other situations.
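In other words, the Z comes out even when no conversion could possibly have happened (a minimal sketch):

    SELECT to_char(timestamp '2018-06-08 20:38:42',
                   'YYYY-MM-DD"T"HH24:MI:SS"Z"');
    --  2018-06-08T20:38:42Z   -- the Z is a quoted literal, not a zone marker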

This ended up creating wrong timestamps all over the place.

Yes, this is the same problem. Osmosis will have read the UTC timestamp from the XML file, handled it internally as UTC all through the pipelines, and then when writing it to the database the pg connection library will have converted that time to local time when storing it as timestamp without timezone. Then cgimap / rails port / whatever will have used that timestamp, assumed it was UTC and there's the problem.

mmd-osm commented 5 years ago

That's not quite what it's doing. It's taking the timestamp from the database, formatting it appropriately and then sticking an actual letter Z at the end of the string

Right, that's what I had in mind, but somehow failed to make it clear. It's a mere string indicator to the data consumer that the timestamp should be considered to be UTC. To postgres, it's just some opaque string which happens to be returned as part of the response.

gravitystorm commented 5 years ago

I found this in the postgresql 9.2 release notes, which sounds hopeful that the table changes might be done without extended downtimes:

Reduce need to rebuild tables and indexes for certain ALTER TABLE ... ALTER COLUMN TYPE operations (Noah Misch)

Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite. Similarly, increasing the allowable precision of a numeric column, or changing a column from constrained numeric to unconstrained numeric, no longer requires a table rewrite. Table rewrites are also avoided in similar cases involving the interval, timestamp, and timestamptz types.

I think we'll still need a short downtime to get the locks, but hopefully will be fast apart from that.

Is there a replica that this could be tested on? Or should we just start with the smaller tables and work our way up? :smiley:

mmd-osm commented 5 years ago

A few words on the status quo: by convention, the without timezone value has always contained the UTC+0 value, even if your system is running in some other timezone, which means that Rails, cgimap and osmosis already work as is, regardless of the server timezone. Rails code has taken care of this all along by putting the global (UTC) timestamp in that field. Loading a database via osmosis obviously didn't work in a non-UTC setup, and there's a small glitch in cgimap with open changesets, but those are about the only issues I'm currently aware of.

column type conversion

I did some experiments using the following statement, which aims at preserving the original UTC-based timestamp value.

alter table current_nodes alter column timestamp type timestamptz using timestamp at time zone 'UTC';

NB: leaving out at time zone 'UTC' breaks the Rails port on a UTC+1 box, i.e. it's showing incorrect timestamps after the conversion.

This statement took around 1-2 seconds on a 300k table. It's not entirely clear to me what the performance impact will be on a UTC+0 system when processing a few billion rows.

With this change in place, the timestamp is now shown as follows:

     id     |  latitude  | longitude  | changeset_id | visible |           timestamp           |    tile    | version 
------------+------------+------------+--------------+---------+-------------------------------+------------+---------
 5002730175 |  423712670 | -834339064 |         1061 | t       | 2018-06-08 22:38:42.691478+02 | 1701877902 |       1

Rails port

api/0.6/node handled by Rails port returns: node id="5002730175" changeset="1061" timestamp="2018-06-08T20:38:42Z"

-> Rails port looks ok!

cgimap

Contrary to some earlier statements, cgimap now shows incorrect timestamps if the server isn't running on UTC+0.

http://localhost:31337/api/0.6/node/5002730175

node id="5002730175" visible="true" version="1" changeset="1061" timestamp="2018-06-08T22:38:42Z"

I think we would have to add at time zone 'UTC' to all statements for a correct timestamp then. This change, however, needs to be exactly in sync with changing the column type to "timestamp with time zone" to avoid incorrect timestamps on a non-UTC system. Pretty confusing after all.

select to_char(timestamp at time zone 'UTC','YYYY-MM-DD"T"HH24:MI:SS"Z"') AS timestamp from current_nodes

osmosis

~I haven't really tested it, but I'm suspecting osmosis diff replication to also break on a non-UTC box, exactly for the same reason that it assumes timestamp to be in UTC+0.~

Confirmed. Osmosis also breaks by this change and produces osmChange files with incorrect timestamps:

https://github.com/openstreetmap/osmosis/blob/master/osmosis-apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/EntityDao.java#L120

pnorman commented 5 years ago

Noted. My remaining query is on the time taken to make the column type changes in production, i.e. will we need any downtime to run the migration, or given it should (hopefully) be just a column metadata change, is it quick? Is there an easy way to find this out before we do it for real?

The values need to be converted from one type to another. It's a trivial conversion, but it still needs to be done. Because the nodes table isn't partitioned, the parallelism is limited.

It's probably possible to do this without an outage but not easy.

mmd-osm commented 5 years ago

@pnorman, @gravitystorm : what would be your take on how to address the breakage of cgimap, osmosis, and other tools due to this change on a non-UTC system?

tomhughes commented 5 years ago

@pnorman really, even if the system is running on UTC so that nothing changes?

If the columns need to be rewritten then I think this will pretty much be a non-starter as we would need a downtime measured in hours if not days.

pnorman commented 5 years ago

@pnorman, @gravitystorm : what would be your take on how to address the breakage of cgimap, osmosis, and other tools due to this change on a non-UTC system?

My take is that you're already broken if you're running those on a non-UTC system.

The best transition plan is probably to force the timezone for the users those clients connect as, or to set appropriate libpq env vars to force the timezone.
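In SQL terms, that would be something along these lines (the role names here are just examples, not the production ones):

    -- pin the session time zone for the roles the legacy clients connect as
    ALTER ROLE osmosis_user SET timezone TO 'UTC';
    ALTER ROLE cgimap_user  SET timezone TO 'UTC';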

pnorman commented 5 years ago

@pnorman really, even if the system is running on UTC so that nothing changes?

If the columns need to be rewritten then I think this will pretty much be a non-starter as we would need a downtime measured in hours if not days.

This is where I'd defer to a consultant as it's a reasonably well defined problem, and sufficiently advanced that it's outside my experience.

mmd-osm commented 5 years ago

The best transition plan is probably to force the timezone for the users those clients connect as, or to set appropriate libpq env vars to force the timezone.

I see this more as a hack than a real solution. In the long run, we would still require various code changes across different applications to make it work as seamlessly as today.

I realize that lots of people recommend switching to "with timezone". I'm not totally opposed to the change as such, but would prefer to see some well thought out strategy for its implementation.

mmd-osm commented 5 years ago

really, even if the system is running on UTC so that nothing changes?

There are a few pointers in the documentation and source code:

According to: https://doxygen.postgresql.org/tablecmds_8c_source.html#l09176 (points to relevant Postgres source code):

  * When the data type of a column is changed, a rewrite might not be required
  * if the new type is sufficiently identical to the old one, and the USING
  * clause isn't trying to insert some other value.  It's safe to skip the
  * rewrite if the old type is binary coercible to the new type, or if the
  * new type is an unconstrained domain over the old type.  In the case of a
  * constrained domain, we could get by with scanning the table and checking
  * the constraint rather than actually rewriting it, but we don't currently
  * try to do that.
  */

According to https://www.postgresql.org/docs/9.5/sql-altertable.html

Adding a column with a DEFAULT clause or changing the type of an existing column will require the entire table and its indexes to be rewritten. As an exception when changing the type of an existing column, if the USING clause does not change the column contents and the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed; but any indexes on the affected columns must still be rebuilt.

My conclusion would be (TBC):

NB: I'm assuming that the timestamp to timestamptz conversion routine needs to be called for billions of rows, only to find out that there's no change requiring a rewrite in case of UTC. I haven't found any description of an optimization that would simply change the table metadata.

Why is the timestamp conversion not binary-coercible?

    openstreetmap=# select * from pg_cast where castsource='timestamp'::regtype and casttarget='timestamptz'::regtype;
     castsource | casttarget | castfunc | castcontext | castmethod
    ------------+------------+----------+-------------+------------
           1114 |       1184 |     2028 | i           | f
    (1 row)

According to https://www.postgresql.org/docs/9.5/catalog-pg-cast.html, the castmethod field indicates how the cast is performed:

- f means that the function specified in the castfunc field is used.
- i means that the input/output functions are used.
- b means that the types are binary-coercible, thus no conversion is required.

pnorman commented 5 years ago

I see this more as a hack than a real solution. in the long run, we would still require various code changes across different applications to make it work as seamless as today.

This change is to avoid the hack of forcing every client to specify the timezone. Right now you have to do that, in the future you won't if we make this change.

mmd-osm commented 5 years ago

It would be helpful to exactly name those issues. FWIW, I'm running rails, cgimap and osmosis diff replication on UTC+1 & UTC+2 and don't have to mess around with time zones at all; it works out of the box.

I can only recommend to try this out for yourself.

Looking at the rails port, you will notice that time is always treated as UTC (e.g. Time.now.getutc in many places), regardless of the system time zone. cgimap and osmosis basically just take that value and attach a "Z" character at the end, so that a data consumer recognizes the value to be in UTC.

For the changeset upload I added in cgimap, I used "timestamp" timestamp without time zone NOT NULL DEFAULT (now() at time zone 'utc'), which again populates the timestamp field with a UTC-based value independent of the system timezone.

gravitystorm commented 5 years ago

it works out of the box.

Not quite - it appears to work out of the box. This is because the clients you list all assume that the data stored in the 'timestamp' columns is UTC, when it is actually 'local time'. Since the assumption is symmetric (i.e. the same mistake on writing and reading), you don't see the problem. That, however, is not sufficient to state that the problem doesn't exist. :-)

@pnorman, @gravitystorm : what would be your take on how to address the breakage of cgimap, osmosis, and other tools due to this change on a non-UTC system?

Sure, we need to fix any tools that already make broken assumptions. I'm not going to suggest we make changes that break anything; this whole topic is about fixing things properly. But this change will make the timestamp storage correct, and allow people to make future clients (or even trivial things, like calling now() from an SQL statement) work reliably without having to deal with this headache over and over forevermore.

If the columns need to be rewritten then I think this will pretty much be a non-starter as we would need a downtime measured in hours if not days.

We can't just say that we'll never change anything in our tables ever again! If this does involve large table changes, then we can work out what the zero-downtime migration path would be, and do that. We could practise multi-stage migrations on the smaller tables where there is less impact.

gravitystorm commented 5 years ago

Contrary to some earlier statements, cgimap now shows incorrect timestamps if the server isn't running on UTC+0.

OK, I mentioned to @tomhughes last week that I feared this might be the case and that it needs more investigation. From my initial investigations it appears that there isn't a version of to_char that takes timestamptz, so it is most likely converting timestamptz to timestamp and then doing the character formatting.

I think we would have to add at time zone 'UTC' to all statements for a correct timestamp then. This change however needs to be exactly in sync with changing the column type to "timestamp with time zone" to avoid incorrect timestamps on a non-UTC system.

If there's a way to avoid the synchronous changes, that would be worth looking into. For example, cgimap could try column type detection, or we can find a different time formatting function, or so on. Basically we should try to come up with a solution that allows cgimap to work with both timestamp and timestamptz columns interchangeably.

mmd-osm commented 5 years ago

it appears to work out of the box. This is because the clients you list all assume that the data stored in the 'timestamp' columns is UTC, when it is actually 'local time'. Since the assumption is symmetric (i.e. the same mistake on writing and reading), you don't see the problem

Yes, that's pretty much in line with http://phili.pe/posts/timestamps-and-time-zones-in-postgresql/

The answer is semantics. Whenever you encounter a timestamptz, you know it denotes absolute time. It was stored in UTC. However, when you come across timestamp, you can’t possibly know in what time zone the timestamp is just by looking at it. It is merely a wall time. Maybe it’s UTC, or maybe the developer stored it in local time. Without additional context, you can’t be sure.

What we rely on here is the additional context.

Basically we should try to come up with a solution that allows cgimap to work with both timestamp and timestamptz columns interchangeably.

Right, maybe we could set the timezone from within cgimap et al. via SET TIME ZONE 'UTC'; and see if this already fixes the issue.
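A sketch of that idea (values made up) — with the session pinned to UTC, the existing to_char queries render the same string for both the old and the new column type:

    SET TIME ZONE 'UTC';
    -- a timestamptz is rendered in the session zone (now UTC) before
    -- formatting, so the literal "Z" becomes accurate again
    SELECT to_char(timestamptz '2018-06-08 20:38:42+00',
                   'YYYY-MM-DD"T"HH24:MI:SS"Z"');
    --  2018-06-08T20:38:42Z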

I'm not exactly sure if this is what @pnorman proposed earlier; it sounded more like he wanted to control this from the outside via some environment parameter, which I don't find an ideal solution.