openstreetmap / openstreetmap-website

The Rails application that powers OpenStreetMap
https://www.openstreetmap.org/
GNU General Public License v2.0
2.19k stars 912 forks source link

Align structure.sql with generated output #4298

Open gravitystorm opened 1 year ago

gravitystorm commented 1 year ago

There are a few small differences between our structure.sql file, and the output generated from rails schema dumping on my system. I'm not sure whether this is because of different rails versions or different postgresql versions used between the time that these bits of the file were generated, and today. But it's a bit annoying when trying to develop migrations, so if it's possible for us to resolve them it'll make our developers lives a little bit easier.

On my own system, I get four main differences:

I'm interested to know, before I make a pull request, whether my experience is typical. You can test yourself by running bundle exec rails db:schema:dump followed by git diff to see what the differences are on your own system between the schema dump and what we have in structure.sql.

tomhughes commented 1 year ago

I have loads of differences for things like that (which I think is probably down to postgres version) or field ordering (which is mostly down to me having applied and reverted migrations at various times and in various orders) and I generally figure that trying to "fix" them is as likely to break them for other people as it is to fix them for me... My current diff is:

diff --git a/db/structure.sql b/db/structure.sql
index 939799c0a..78eef205f 100644
--- a/db/structure.sql
+++ b/db/structure.sql
@@ -9,6 +9,13 @@ SET xmloption = content;
 SET client_min_messages = warning;
 SET row_security = off;

+--
+-- Name: public; Type: SCHEMA; Schema: -; Owner: -
+--
+
+-- *not* creating schema, since initdb creates it
+
+
 --
 -- Name: btree_gist; Type: EXTENSION; Schema: -; Owner: -
 --
@@ -16,6 +23,13 @@ SET row_security = off;
 CREATE EXTENSION IF NOT EXISTS btree_gist WITH SCHEMA public;

+--
+-- Name: EXTENSION btree_gist; Type: COMMENT; Schema: -; Owner: -
+--
+
+COMMENT ON EXTENSION btree_gist IS 'support for indexing common datatypes in GiST';
+
+
 --
 -- Name: format_enum; Type: TYPE; Schema: public; Owner: -
 --
@@ -107,6 +121,39 @@ CREATE TYPE public.user_status_enum AS ENUM (
     'deleted'
 );

+
+--
+-- Name: tile_for_point(integer, integer); Type: FUNCTION; Schema: public; Owner: -
+--
+
+CREATE FUNCTION public.tile_for_point(scaled_lat integer, scaled_lon integer) RETURNS bigint
+    LANGUAGE plpgsql IMMUTABLE
+    AS $$
+DECLARE
+  x int8; -- quantized x from lon,
+  y int8; -- quantized y from lat,
+BEGIN
+  x := round(((scaled_lon / 10000000.0) + 180.0) * 65535.0 / 360.0);
+  y := round(((scaled_lat / 10000000.0) +  90.0) * 65535.0 / 180.0);
+
+  -- these bit-masks are special numbers used in the bit interleaving algorithm.
+  -- see https://graphics.stanford.edu/~seander/bithacks.html#InterleaveBMN
+  -- for the original algorithm and more details.
+  x := (x | (x << 8)) &   16711935; -- 0x00FF00FF
+  x := (x | (x << 4)) &  252645135; -- 0x0F0F0F0F
+  x := (x | (x << 2)) &  858993459; -- 0x33333333
+  x := (x | (x << 1)) & 1431655765; -- 0x55555555
+
+  y := (y | (y << 8)) &   16711935; -- 0x00FF00FF
+  y := (y | (y << 4)) &  252645135; -- 0x0F0F0F0F
+  y := (y | (y << 2)) &  858993459; -- 0x33333333
+  y := (y | (y << 1)) & 1431655765; -- 0x55555555
+
+  RETURN (x << 1) | y;
+END;
+$$;
+
+
 SET default_tablespace = '';

 SET default_table_access_method = heap;
@@ -250,8 +297,8 @@ ALTER SEQUENCE public.active_storage_variant_records_id_seq OWNED BY public.acti
 CREATE TABLE public.ar_internal_metadata (
     key character varying NOT NULL,
     value character varying,
-    created_at timestamp(6) without time zone NOT NULL,
-    updated_at timestamp(6) without time zone NOT NULL
+    created_at timestamp without time zone NOT NULL,
+    updated_at timestamp without time zone NOT NULL
 );

@@ -274,7 +321,6 @@ CREATE TABLE public.changeset_comments (
 --

 CREATE SEQUENCE public.changeset_comments_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -376,7 +422,6 @@ CREATE TABLE public.client_applications (
 --

 CREATE SEQUENCE public.client_applications_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -802,7 +847,6 @@ CREATE TABLE public.issue_comments (
 --

 CREATE SEQUENCE public.issue_comments_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -842,7 +886,6 @@ CREATE TABLE public.issues (
 --

 CREATE SEQUENCE public.issues_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -955,7 +998,6 @@ CREATE TABLE public.note_comments (
 --

 CREATE SEQUENCE public.note_comments_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -991,7 +1033,6 @@ CREATE TABLE public.notes (
 --

 CREATE SEQUENCE public.notes_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -1137,7 +1178,6 @@ CREATE TABLE public.oauth_nonces (
 --

 CREATE SEQUENCE public.oauth_nonces_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -1216,7 +1256,6 @@ CREATE TABLE public.oauth_tokens (
 --

 CREATE SEQUENCE public.oauth_tokens_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -1251,7 +1290,6 @@ CREATE TABLE public.redactions (
 --

 CREATE SEQUENCE public.redactions_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -1326,7 +1364,6 @@ CREATE TABLE public.reports (
 --

 CREATE SEQUENCE public.reports_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -1373,7 +1410,6 @@ CREATE TABLE public.user_blocks (
 --

 CREATE SEQUENCE public.user_blocks_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -1406,9 +1442,9 @@ CREATE TABLE public.user_preferences (
 CREATE TABLE public.user_roles (
     id integer NOT NULL,
     user_id bigint NOT NULL,
-    role public.user_role_enum NOT NULL,
     created_at timestamp without time zone,
     updated_at timestamp without time zone,
+    role public.user_role_enum NOT NULL,
     granter_id bigint NOT NULL
 );

@@ -1418,7 +1454,6 @@ CREATE TABLE public.user_roles (
 --

 CREATE SEQUENCE public.user_roles_id_seq
-    AS integer
     START WITH 1
     INCREMENT BY 1
     NO MINVALUE
@@ -3512,3 +3547,5 @@ INSERT INTO "schema_migrations" (version) VALUES
 ('7'),
 ('8'),
 ('9');
+
+
gravitystorm commented 10 months ago

For the timestamp(6) on ar_internal_metadata, the structure.sql should continue to use timestamp(6).

This is because for rails 6.0+, timestamp(6) is the default for timestamp columns. (default vs override for old versions) All of our migrations are correctly labelled with which AR version to use, so tables before that are defined with timestamp, tables after that with timestamp(6).

The edge case is that ar_internal_metadata table is not controlled, so gets the defaults at the moment it was created, which varies from system to system. On my laptop that was before rails 6, so I have timestamp. Any developer who first ran the code after November 2019 will have timestamp(6).

I've updated my development machine, by running:

alter table ar_internal_metadata alter column created_at set data type timestamp(6) without time zone;
alter table ar_internal_metadata alter column updated_at set data type timestamp(6) without time zone;

I would encourage any other veteran developers to run the same update, to avoid noise when working on database migrations.

gravitystorm commented 10 months ago
  • The schema_migrations table has a wildly different order, but I'll ignore that for now

I found that my schema_migrations values for legacy migrations (before we started using timestamps) were stored with leading zeros, which was messing up the ordering. So to fix this I ran:

update schema_migrations set version = TRIM(LEADING '0' FROM version);

and that sorted it out. I haven't managed to uncover why my local machine has those leading zeros, there's nothing obvious in the rails history as to when that might have changed. :man_shrugging:

gravitystorm commented 10 months ago

The AS integer bit is super interesting and surprisingly complicated!

We have 9 tables with integer id columns, where if you created the table on postgres 9 (veteran dev, production) the sequence will have the bigint type, but other people (new devs, CI) will have the sequence with integer type, which is also what we have in structure.sql. These 9 sequences are:

There are 3 further tables that have an additional complication:

These three tables are:

I need to have a think about what the best thing to do is.

For the first set of 9, I'm tempted to just run "ALTER SEQUENCE foo_seq AS integer;" locally, since then my machine matches CI and new devs, and although production will still be bigint I can't see any harm in that. Alternatively, we could instead upgrade the affected tables to have bigint ids, for consistency if nothing else.

For the second set of 3, it's potentially confusing to new developers that the sequences are only integer despite the primary keys being bigint, and they might worry that it's also bug in production (without being able to know that production has postgres-9-era bigint sequences). It's also a gotcha for anyone else who deploys the code, since they will have a "for real" mismatch lurking in their production database. For these tables, it might be worth either adding migrations to alter the sequences to bigint (which will make no difference in production, but will fix everything else).