sorintlab / stolon

PostgreSQL cloud native High Availability and more.
https://talk.stolon.io
Apache License 2.0

panic: no db convergence info for db "61f4baa2", this shouldn't happen! #408

Closed by patsevanton 6 years ago

patsevanton commented 6 years ago

Hello! I am trying to run a stolon cluster with PostgreSQL 9.6 on CentOS 7. Versions:
CentOS Linux release - 7.4.1708 (Core)
etcd - 3.2.0-0.1.rc1.el7.centos
stolon - 0.6.0-1.8420cb1.el7.centos

Please add the startup order of the commands to https://github.com/sorintlab/stolon/blob/master/doc/simplecluster.md. Maybe I'm doing something wrong.

1) On the first server: stolonctl --cluster-name stolon-cluster --store-backend=etcd init
2) On every server: stolon-sentinel --cluster-name stolon-cluster --store-backend=etcd
3) On the first server: stolon-keeper --cluster-name stolon-cluster --store-backend=etcd --uid postgres0 --data-dir 9.6/data/postgres0 --pg-su-password=supassword --pg-repl-username=repluser --pg-repl-password=replpassword --pg-listen-address=127.0.0.1

[I] 2018-01-13T12:16:50Z keeper.go:1410: postgres parameters not changed
[I] 2018-01-13T12:16:55Z keeper.go:1235: our db requested role is master
[I] 2018-01-13T12:16:55Z keeper.go:1255: already master

[I] 2018-01-13T12:14:02Z sentinel.go:651: trying to find initial master
[I] 2018-01-13T12:14:02Z sentinel.go:656: initializing cluster keeper=postgres0
[W] 2018-01-13T12:14:07Z sentinel.go:279: received db state for unexpected db uid receivedDB= db=61f4baa2
[I] 2018-01-13T12:14:07Z sentinel.go:695: waiting for db db=61f4baa2 keeper=postgres0
[I] 2018-01-13T12:14:12Z sentinel.go:695: waiting for db db=61f4baa2 keeper=postgres0
[I] 2018-01-13T12:14:17Z sentinel.go:681: db initialized db=61f4baa2 keeper=postgres0
[W] 2018-01-13T12:16:58Z sentinel.go:279: received db state for unexpected db uid receivedDB= db=61f4baa2
[W] 2018-01-13T12:17:03Z sentinel.go:279: received db state for unexpected db uid receivedDB= db=61f4baa2

4) On the second server: stolon-keeper --cluster-name stolon-cluster --store-backend=etcd --uid postgres0 --data-dir 9.6/data/postgres0 --pg-su-password=supassword --pg-repl-username=repluser --pg-repl-password=replpassword --pg-listen-address=127.0.0.1

-bash-4.2$ stolon-keeper --cluster-name stolon-cluster --store-backend=etcd --uid postgres0 --data-dir 9.6/data/postgres0 --pg-su-password=supassword --pg-repl-username=repluser --pg-repl-password=replpassword --pg-listen-address=127.0.0.1
[I] 2018-01-13T12:16:53Z keeper.go:1567: exclusive lock on data dir taken
[I] 2018-01-13T12:16:53Z keeper.go:408: keeper uid uid=postgres0
[I] 2018-01-13T12:16:53Z postgresql.go:215: stopping database
pg_ctl: directory "9.6/data/postgres0/postgres" does not exist
[I] 2018-01-13T12:16:53Z keeper.go:839: our db boot UID is different than the cluster data one, waiting for it to be updated bootUUID=c8a15870-d182-40b8-a85d-0346fa9fd437 clusterBootUUID=c878ec31-81b8-4bd5-bb63-236f589cd460
[I] 2018-01-13T12:16:53Z postgresql.go:215: stopping database
pg_ctl: directory "9.6/data/postgres0/postgres" does not exist
[I] 2018-01-13T12:16:58Z keeper.go:839: our db boot UID is different than the cluster data one, waiting for it to be updated bootUUID=c8a15870-d182-40b8-a85d-0346fa9fd437 clusterBootUUID=c878ec31-81b8-4bd5-bb63-236f589cd460
[I] 2018-01-13T12:16:58Z postgresql.go:215: stopping database
pg_ctl: directory "9.6/data/postgres0/postgres" does not exist
[I] 2018-01-13T12:17:03Z keeper.go:903: current db UID different than cluster data db UID db= cdDB=61f4baa2
[I] 2018-01-13T12:17:08Z keeper.go:1219: database cluster not initialized
[I] 2018-01-13T12:17:08Z keeper.go:1235: our db requested role is master
[E] 2018-01-13T12:17:08Z keeper.go:1237: database cluster not initialized but requested role is master. This shouldn't happen!
[I] 2018-01-13T12:17:13Z keeper.go:1219: database cluster not initialized
[I] 2018-01-13T12:17:13Z keeper.go:1235: our db requested role is master

panic: no db convergence info for db "61f4baa2", this shouldn't happen!

goroutine 313 [running]:
panic(0xac4fc0, 0xc82028d680)
    /usr/lib/golang/src/runtime/panic.go:481 +0x3e6
main.(*Sentinel).dbConvergenceState(0xc8200d7550, 0xc8201d6c00, 0x6fc23ac00, 0xc8201b1040)
    /builddir/build/BUILD/go/src/github.com/sorintlab/stolon/gopath/src/github.com/sorintlab/stolon/cmd/sentinel/sentinel.go:1267 +0x1a2
main.(*Sentinel).updateCluster(0xc8200d7550, 0xc820223500, 0x24, 0x0, 0x0)
    /builddir/build/BUILD/go/src/github.com/sorintlab/stolon/gopath/src/github.com/sorintlab/stolon/cmd/sentinel/sentinel.go:866 +0xfbf
main.(*Sentinel).clusterSentinelCheck(0xc8200d7550, 0x7f786eb2a268, 0xc82000be80)
    /builddir/build/BUILD/go/src/github.com/sorintlab/stolon/gopath/src/github.com/sorintlab/stolon/cmd/sentinel/sentinel.go:1508 +0x2df1
main.(*Sentinel).Start.func1(0xc8200d7550, 0x7f786eb2a268, 0xc82000be80, 0xc82004a9c0)
    /builddir/build/BUILD/go/src/github.com/sorintlab/stolon/gopath/src/github.com/sorintlab/stolon/cmd/sentinel/sentinel.go:1408 +0x35
created by main.(*Sentinel).Start
    /builddir/build/BUILD/go/src/github.com/sorintlab/stolon/gopath/src/github.com/sorintlab/stolon/cmd/sentinel/sentinel.go:1410 +0x26f

sgotti commented 6 years ago

@patsevanton you're starting two keepers with the same uid. This is bad: the sentinel gets confused, and it will cause data loss. Every keeper needs its own unique uid. See https://github.com/sorintlab/stolon/blob/master/doc/architecture.md#keepers
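
To make the point concrete, here is a minimal sketch of a pre-flight check that could be run before starting the keepers. It is not part of stolon; `check_unique_uids` is a hypothetical helper, and `postgres1` is an assumed uid chosen only for illustration. It just verifies that the uids planned for the keepers are all different.

```shell
#!/bin/sh
# check_unique_uids is a hypothetical helper, not a stolon command.
# It succeeds only if every keeper uid passed as an argument is distinct.
check_unique_uids() {
    # sort brings equal uids together; uniq -d prints only the repeated ones
    dup=$(printf '%s\n' "$@" | sort | uniq -d)
    if [ -n "$dup" ]; then
        echo "duplicate keeper uid: $dup" >&2
        return 1
    fi
    echo "keeper uids are unique"
}

# The setup from the report: both servers use postgres0, so the check fails.
check_unique_uids postgres0 postgres0 || echo "fix the uids before starting the keepers"

# One uid per keeper (postgres1 is an assumed name): the check passes.
check_unique_uids postgres0 postgres1
```

With a check like this passing, the keeper on the second server would be started with its own uid, for example `--uid postgres1 --data-dir 9.6/data/postgres1`, keeping the other flags from step 3 as they are.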

patsevanton commented 6 years ago

Thank you