sorintlab / stolon

PostgreSQL cloud native High Availability and more.

is it possible to initialize a cluster from a pg_basebackup of a remote server? #263

Closed: theothermike closed this issue 6 years ago

theothermike commented 7 years ago

We're about to migrate one of our prod clusters to stolon from an existing PostgreSQL database running in a VM, but the method I've previously been using (pg_dump/pg_restore) is time-prohibitive since it is a large database (50+ GB).

I've been playing with using pg_basebackup instead, which is very quick, but I can't for the life of me get stolon to initialize properly when the backup has been placed in $STOLON_DATA before the keeper starts up.

Any suggestions? Full details would be appreciated

sgotti commented 7 years ago

@theothermike I'm not sure what you're trying to achieve, and I'm missing the steps you're taking. Are you following the doc on point-in-time recovery here: https://github.com/sorintlab/stolon/blob/master/doc/README.md ?

If so, and you're having problems with those steps, I'd suggest writing to the stolon mailing list with the cluster config you used and the stolon keeper and postgres logs ( https://github.com/sorintlab/stolon#contacts ), since it's easier to do support there than on the GitHub issue tracker.

Please also look at other similar questions: #256 #250 and https://groups.google.com/forum/m/#!topic/stolon/W9Whndt6pVs

theothermike commented 7 years ago

Some context:

We don't currently use wal-e (and it doesn't appear to be included in the postgres image that stolon builds with, either?).

We'd simply like to be able to migrate our existing postgres database (which has another superuser 'postgres') into the new stolon deployment with minimal downtime.

Previously, I was simply doing a pg_dump of the desired database running on our old VM, a minor 'sed' to change the superuser, and then pg_restore'ing it into the stolon cluster, which worked like a charm, but due to the size that would now result in too much downtime.

I'll keep playing with PITR. Since we are on Google infrastructure, I assume we'll have to get wal-e working, archive to a shared gs:// bucket, and then restore using PITR in stolon from that bucket.
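In outline, the archiving side would look something like this; a sketch only, assuming wal-e's GCS backend (the bucket path and credentials file are placeholders):

```sh
# Hypothetical wal-e setup against a shared GCS bucket.
export WALE_GS_PREFIX=gs://my-wal-bucket/pg
export GOOGLE_APPLICATION_CREDENTIALS=/etc/gcs/credentials.json

# On the source VM: ship WAL segments as they are produced
# (postgresql.conf: archive_mode = on, archive_command = 'wal-e wal-push %p'),
# then take a base backup of the data directory.
wal-e backup-push /var/lib/postgresql/9.4/main
```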

sgotti commented 7 years ago

Stolon was designed to work with any backup/restore solution; if you already have a solution working with a standalone postgres, you can probably do the same with stolon without any change. See: https://github.com/sorintlab/stolon/blob/master/doc/pitr.md. You just have to define a "pitr" init mode and a config where you provide a data restore command to restore the base backup, plus the recovery command (the same one you'd put in recovery.conf, except that in this case recovery.conf is generated by stolon).
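For example, the pitr doc's wal-e based spec boils down to something like this (cluster name and store backend are placeholders):

```sh
stolonctl --cluster-name mycluster --store-backend etcd init '{
  "initMode": "pitr",
  "pitrConfig": {
    "dataRestoreCommand": "wal-e backup-fetch %d LATEST",
    "archiveRecoverySettings": {
      "restoreCommand": "wal-e wal-fetch \"%f\" \"%p\""
    }
  }
}'
```

stolon substitutes %d with the data directory path in dataRestoreCommand, while %f/%p in restoreCommand work as in a normal recovery.conf.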

But in your case the best solution, once implemented, will be #157 (it will let you achieve near-zero downtime).

Until it's implemented you can do something like this:

Another, more convoluted, solution would be to stolonize a standalone standby using the "existing" init mode.
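For reference, an "existing" init spec looks roughly like this (a sketch; the keeper UID is a placeholder for the keeper managing the already-initialized instance):

```sh
stolonctl --cluster-name mycluster --store-backend etcd init '{
  "initMode": "existing",
  "existingConfig": {
    "keeperUID": "keeper0"
  }
}'
```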

theothermike commented 7 years ago

Thanks for the info. I was able to get it running manually with a pg_basebackup by:

- adding pg_hba.conf manually (copied from another vanilla stolon pod)
- touching pg_ident.conf

but we have a requirement of adding a non-default postgres config, 'max_prepared_transactions', which previously I was able to set using pgParameters.

It appears combining pgParameters and pitr does not work.

Here's what I'm using for stolonctl init:

{"initMode":"pitr", "pgParameters":{"max_prepared_transactions":"100"}, "pitrConfig":{"dataRestoreCommand":"mkdir -p /stolon-data/tmp/ && PGPASSWORD=password pg_basebackup -D /stolon-data/postgres/ -P -h olddbhost -U postgres && touch /stolon-data/postgres/pg_ident.conf && echo -e \"local all all md5\nhost all all 0.0.0.0/0 md5\nhost all all ::0/0 md5\nhost replication repluser 0.0.0.0/0 md5\nhost replication repluser ::0/0 md5\"", "archiveRecoverySettings":{"restoreCommand": "cp /stolon-data/tmp/%f \"%p\""}}}

Do you think that's a quick fix, or can you suggest a strategy to add that custom server config as part of this flow?

theothermike commented 7 years ago

Using 'stolonctl update --patch' with the required param right after the init puts the required config in place and allows it to start.
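For reference, the patch amounts to something like this (cluster name and store backend are placeholders):

```sh
stolonctl --cluster-name mycluster --store-backend etcd update --patch \
  '{"pgParameters": {"max_prepared_transactions": "100"}}'
```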

sgotti commented 7 years ago

> Thanks for the info. I was able to get it running manually with a pg_basebackup by:
>
> - adding pg_hba.conf manually (copied from another vanilla stolon pod)
> - touching pg_ident.conf

Can you describe the reason why you had to change them?

> Using 'stolonctl update --patch' with the required param right after the init puts the required config in place and allows it to start.

Glad it worked, but the pgParameters specified during the pitr init mode should have been merged with the ones provided by the restored postgresql.conf. If this didn't happen, it should probably be fixed. Can you please open a new issue with the steps to reproduce it, so we can add an integration test and fix it?

theothermike commented 7 years ago

The pg_hba and pg_ident files were a bit of a red herring. I needed them in place to start up Postgres manually inside the container in order to debug, but I realized later that the stolon keeper generates these itself.

In the end the main issues were:

1. 'stolon' and 'repluser' need to exist on the legacy database you are migrating, with the same credentials (and the replication privilege, in order to execute the pg_basebackup); see the sketch after this list.

I would suggest an init mode called 'migration', which would let you connect with the legacy superuser credentials; once it finished copying the db to the keeper container, the keeper would start it up, add the required users, change ownership of objects (optional?), and drop the legacy user if no longer needed. These are all manual steps right now. I can make this a feature request if you'd like (I am also planning a PR with the complete procedure, for everyone else's benefit).

2. It doesn't appear that logging to stderr/stdout/logfile for postgres is enabled in the official docker image, nor does stolon provide a configuration option for that in postgresql.conf.

Yes, it can be added manually with the log_destination/logging_collector/log_directory configs, but this should really be on by default; it would have saved me a lot of headaches during this migration.

3. The db we were migrating had non-standard server settings (max_prepared_transactions and max_connections), which required a 'stolonctl update --patch' command to run after the 'pitr' init. I would suggest that 'pitr' also support providing pgParameters, in order to simplify this.

When you do a 'pg_basebackup', Postgres specifically omits copying postgresql.conf (so no merging can happen), pg_hba.conf, and pg_ident.conf.
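A sketch of what point 1 amounts to on the legacy instance (host, role names, and passwords are illustrative; the credentials must match the superuser/replication users configured for the stolon keepers):

```sh
# Hypothetical: create the roles the stolon keepers expect on the legacy db.
psql -h olddbhost -U postgres -c \
  "CREATE ROLE stolon WITH LOGIN SUPERUSER PASSWORD 'supassword';"
psql -h olddbhost -U postgres -c \
  "CREATE ROLE repluser WITH LOGIN REPLICATION PASSWORD 'replpassword';"
```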

sgotti commented 7 years ago

> 'stolon' and 'repluser' need to exist on the legacy database you are migrating, with the same credentials (and the replication privilege, in order to execute the pg_basebackup)
>
> I would suggest an init mode called 'migration', which would let you connect with the legacy superuser credentials; once it finished copying the db to the keeper container, the keeper would start it up, add the required users, change ownership of objects (optional?), and drop the legacy user if no longer needed. These are all manual steps right now. I can make this a feature request if you'd like (I am also planning a PR with the complete procedure, for everyone else's benefit).

Isn't all of this done during the backups, outside stolon? Stolon just executes a data restore command.

In your case I see that the dataRestoreCommand you are providing executes a pg_basebackup from the instance you want to replicate. So the user and password to connect to the original instance should be put there and stolon doesn't have to know them. Am I missing something?

> It doesn't appear that logging to stderr/stdout/logfile for postgres is enabled in the official docker image, nor does stolon provide a configuration option for that in postgresql.conf.
>
> Yes, it can be added manually with the log_destination/logging_collector/log_directory configs, but this should really be on by default; it would have saved me a lot of headaches during this migration.

These are postgresql.conf parameters, so you can set them in the pgParameters of the initial cluster spec. I just tested setting them in the initial cluster spec and they work.
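For example, something along these lines in the initial spec (a sketch; the parameter values are illustrative):

```sh
stolonctl --cluster-name mycluster --store-backend etcd init '{
  "initMode": "new",
  "pgParameters": {
    "log_destination": "stderr",
    "log_min_messages": "info"
  }
}'
```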

I'm not sure what the defaults of the postgresql docker image we are using are, but if nothing is specified, all the logs should be directed to stderr (https://www.postgresql.org/docs/9.6/static/runtime-config-logging.html).

> The db we were migrating had non-standard server settings (max_prepared_transactions and max_connections), which required a 'stolonctl update --patch' command to run after the 'pitr' init. I would suggest that 'pitr' also support providing pgParameters, in order to simplify this.
>
> When you do a 'pg_basebackup', Postgres specifically omits copying postgresql.conf (so no merging can happen), pg_hba.conf, and pg_ident.conf.

I don't see this behavior:

I'd like to fix the issues you reported, but I can't reproduce them, and the integration tests also say that everything looks to be working, so I'll need a detailed reproducer to see what the real issue is.

theothermike commented 7 years ago

> Isn't all of this done during the backups, outside stolon? Stolon just executes a data restore command.

> In your case I see that the dataRestoreCommand you are providing executes a pg_basebackup from the instance you want to replicate. So the user and password to connect to the original instance should be put there and stolon doesn't have to know them. Am I missing something?

Well, since it's a binary copy, only the original instance's users (i.e. postgres, or an app-specific one) exist in that database. When stolon tries to start it up, it is unable to use it, since 'stolon' and 'repluser' don't exist. As mentioned, manually adding those to the original db gets around that. Of note: we did the last pg_basebackup using the -x param, to also get the latest WAL files (indeed, we do not use wal-e).
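For reference, that final copy amounts to something like this (host and user are illustrative):

```sh
# -x (on 9.4; -X fetch on newer versions) includes the WAL needed to make
# the backup consistent on its own, so no separate WAL archive is required.
PGPASSWORD=password pg_basebackup -D /stolon-data/postgres/ -h olddbhost -U postgres -x -P
```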

> I'm not sure what the defaults of the postgresql docker image we are using are, but if nothing is specified, all the logs should be directed to stderr (https://www.postgresql.org/docs/9.6/static/runtime-config-logging.html).

We're still on 9.4.5, which might explain the difference (not sure). However, with kubectl logs we never saw any logs from postgres, only from the keeper, and there were no postgresql logfiles inside the pod during this work. The sample postgresql.conf file from this version lists log_destination as stderr, but then lists logging_collector=off, which might explain it.

> I don't see this behavior:

The omission of the config files might be an Ubuntu (14-16 LTS) thing. Ubuntu explicitly keeps those conf files in the /etc/postgresql/9.4/main dir, but keeps the data files in /var/lib/postgresql/, which might explain why they don't ship with the pg_basebackup.
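In other words (paths as on a stock Ubuntu 9.4 install):

```sh
# Debian/Ubuntu keep the config files outside the data directory, so a
# pg_basebackup of the data directory never includes them.
ls /etc/postgresql/9.4/main/      # postgresql.conf, pg_hba.conf, pg_ident.conf
ls /var/lib/postgresql/9.4/main/  # the actual cluster data files
```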

My suggestions are meant to make future migrations easier for new users of stolon, without them first having to add users, change log configs, and figure out which missing configs need to be added to pgParameters. Over the next few days I will attempt to write a README for migrating from Ubuntu and 9.4, summarizing everything from this ticket.

sgotti commented 6 years ago

Closing since this is already documented and tested.