jackc / tern

The SQL Fan's Migrator
MIT License
1k stars 71 forks source link

Do not use a schema for expressing the schema version table #90

Closed AnyCPU closed 11 months ago

AnyCPU commented 12 months ago

An issue is introduced thanks to https://github.com/jackc/tern/issues/18 Why? Because the public schema is literally hardcoded. The hardcode is the evil.

First of all, the evil 18 invalidates following well-selling statement: If all database settings are supplied by PG* environment variables or program arguments the config file is not required.

Second, a flow provided in the evil 18 is very strange:

1) First tern run creates schema_version in public
2) You run a migration that creates a schema with the same name as the user
3) Next run of tern now creates schema_version in $user
4) tern now attempts to re-apply migrations from 0, potentially causing havoc

A pipeline that manages a flow from 1 to 4 must be aware of a strategy of schema migrations in a project. There is more than one strategy is possible in real world and people use them, for instance: a one schema version (structure. layout) for several schemas, a one schema layout per a schema in a one database etc.

The schema version table is named as schema version, it is intended to act in following way: "a one schema layout per a schema in a one database".

Ok, this is just fine. If I want to be more specific and I want to apply a one layout to whole database I will set a schema where is the schema version table located.

Third, another common case is that a user has no rights to create any in the public schema.

Fourth, the public schema may not exist at all.

Cases above are not rare, pretty often you work with existing db that is secured or you have to keep some data safe.

In general I could have configured a db connection by using only PG* standard variables and search_path option, but the evil 18 involves additional work.

If you work just in the public schema then you do not need to set a full schema version table name.

If you do the same that is provided by the evil 18's case, you are already doing pretty specific additional work.

After changes introduced in the evil 18 PR - let's do it everybody.

P.S: I'm ok with the ensureSchemaVersionTableExists part.

AnyCPU commented 12 months ago

Let people be more specific when they need that.

AnyCPU commented 12 months ago

I took a look at the ensureSchemaVersionTableExists. And it is also strange because there are combined two different approaches: hardcoded public schema and searching for the schema version table in private schemas. It looks complex.

As simplest solution I would like to see and use is default behavior provided by Postgres itself, and if it is really need only then I would like to specify custom tern's option in a config file - version_table.

AnyCPU commented 12 months ago

I understand that a lot of people use existing tern behavior "as is", and I would like to revert above changes back in some following version of the tern, but I guess this is not acceptable because of widely use.

jackc commented 11 months ago

I'm not sure I follow all what you are saying, but the schema of the version table is not hard coded. The default value is public.schema_version, but you are free to change that via the config file or CLI argument (or the Config struct if you are embedding tern as a library in your own application). Set the version table config option to schema_version and it should behave like you want.

AnyCPU commented 11 months ago

@jackc main trouble is that default behavior of PG* is altered by changes mentioned above.

This public.schema_version is a qualified table name, just a table name is much better.

This If all database settings are supplied by PG* environment variables or program arguments the config file is not required is not true.

Yes, I know I have to use terns configuration options explicitly.

jackc commented 11 months ago

I see how the current state of things could be a problem. But the original issue from #18 was a problem as well... 🤷‍♂️

But either way, at this point changing it would be a breaking change, and I'd like to avoid those if at all possible.

AnyCPU commented 11 months ago

Not only #18 has introduced altered behavior. There are two issues at least in git history. Chances are that at some point of time in future every similar new fix will be a breaking change. Specific #18 issue could have been fixed by making changes in configuration file, not in source code itself. It looks related issues are going to be long lasting ones. Not a problem for a new fork. At the moment I would like to see updated readme.md file with more correct statement about env vars and config file.