I think I'm seeing this as well on a standard VM.
This looks like a network connectivity issue. Can you verify that TCP/UDP ports are open and communication is flowing between all nodes over the port you've configured (default: 9638)?
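For reference, a quick way to check both TCP and UDP reachability from each node is something like the following (just a sketch; the availability of nc and the peer hostname are assumptions):

# TCP check against a peer's gossip port (hostname is a placeholder)
nc -vz dev-postgres-1 9638
# UDP check (UDP is connectionless, so this only confirms nothing actively refused the packet)
nc -vzu dev-postgres-1 9638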
This is running locally on Docker for Mac! There's no way it's a connection issue.
Just made sure and can confirm that I am able to telnet from every node to every other node on port 9638.
(Note: I'm assuming the behavior I'm seeing is the same issue that @bodymindarts is reporting.)
I'm able to talk to my permanent peer's supervisor over HTTP as well as hit the UDP port with netcat. The census and butterfly data look like group members are missing:
root@dev-postgres-0:~# curl -s localhost:9631/census | jq .[]
true
{
  "postgresql.dev": {
    "service_group": "postgresql.dev",
    "election_status": "ElectionNoQuorum",
    "update_election_status": "None",
    "leader_id": null,
    "service_config": null,
    "local_member_id": "a529235923ab4ce38b3abf9f5413ff46",
    "population": {
      "a529235923ab4ce38b3abf9f5413ff46": {
        "member_id": "a529235923ab4ce38b3abf9f5413ff46",
        "pkg": {
          "origin": "core",
          "name": "postgresql",
          "version": "9.6.3",
          "release": "20170727171300"
        },
        "application": null,
        "environment": null,
        "service": "postgresql",
        "group": "dev",
        "org": null,
        "initialized": false,
        "persistent": false,
        "leader": false,
        "follower": false,
        "update_leader": false,
        "update_follower": false,
        "election_is_running": false,
        "election_is_no_quorum": true,
        "election_is_finished": false,
        "update_election_is_running": false,
        "update_election_is_no_quorum": false,
        "update_election_is_finished": false,
        "alive": true,
        "suspect": false,
        "confirmed": false,
        "departed": false,
        "sys": {
          "ip": "10.224.74.58",
          "hostname": "dev-postgres-0",
          "gossip_ip": "0.0.0.0",
          "gossip_port": 9638,
          "http_gateway_ip": "0.0.0.0",
          "http_gateway_port": 9631
        },
        "cfg": {
          "port": "5432",
          "superuser_name": "admin",
          "superuser_password": "admin"
        }
      }
    },
    "update_leader_id": null,
    "changed_service_files": [],
    "service_files": {}
  }
}
"a529235923ab4ce38b3abf9f5413ff46"
1
1
0
0
0
0
root@dev-postgres-0:~# curl -s localhost:9631/butterfly | jq .[]
{
  "members": {},
  "health": {},
  "update_counter": 0
}
{
  "list": {
    "postgresql.dev": {
      "a529235923ab4ce38b3abf9f5413ff46": {
        "type": 2,
        "tag": [],
        "from_id": "a529235923ab4ce38b3abf9f5413ff46",
        "service": {
          "member_id": "a529235923ab4ce38b3abf9f5413ff46",
          "service_group": "postgresql.dev",
          "package": "core/postgresql/9.6.3/20170727171300",
          "incarnation": 1,
          "cfg": {
            "port": "5432",
            "superuser_name": "admin",
            "superuser_password": "admin"
          },
          "sys": {
            "ip": "10.224.74.58",
            "hostname": "dev-postgres-0",
            "gossip_ip": "0.0.0.0",
            "gossip_port": 9638,
            "http_gateway_ip": "0.0.0.0",
            "http_gateway_port": 9631
          },
          "initialized": false
        }
      }
    }
  },
  "update_counter": 1
}
{
  "list": {},
  "update_counter": 0
}
{
  "list": {},
  "update_counter": 0
}
{
  "list": {
    "postgresql.dev": {
      "election": {
        "type": 3,
        "tag": [],
        "from_id": "a529235923ab4ce38b3abf9f5413ff46",
        "election": {
          "member_id": "a529235923ab4ce38b3abf9f5413ff46",
          "service_group": "postgresql.dev",
          "term": 0,
          "suitability": 0,
          "status": 2,
          "votes": [
            "a529235923ab4ce38b3abf9f5413ff46"
          ]
        }
      }
    }
  },
  "update_counter": 1
}
{
  "list": {},
  "update_counter": 0
}
{
  "list": {},
  "update_counter": 0
}
I'm pretty sure this is because the postgresql plan in master is broken: it blocks the supervisor's main thread by calling the hab binary from within its post-run hook.
There are two rules for hooks: don't block anywhere except the run hook, and don't call the hab binary from within a hook.
Okay, that might be the issue. This used to work on an earlier version. I'll see if I can fix PG to take those 'rules' into account.
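To illustrate the kind of pattern being described (a minimal sketch only, not the actual core/postgresql hook), a post-run hook along these lines would tie up the supervisor, since the supervisor waits for the hook to return:

#!/bin/sh
# hooks/post-run -- illustrative sketch, not the real core/postgresql hook.
# Looping on the hab binary here blocks the supervisor's main thread,
# breaking both of the rules above.
until hab pkg exec core/postgresql pg_isready -h 127.0.0.1 -p 5432 > /dev/null 2>&1; do
  sleep 1
done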
Going to close this since the problem is in the postgresql plan. This is the link to the ticket we're working on to enable multiple services per package: https://github.com/habitat-sh/habitat/issues/2902
I can confirm that this issue doesn't come up when no sidecar process is being run.
I have a docker-compose file with one standalone postgres node (group postgresql.default) and a 3-node cluster (group postgresql.cluster) that gets bootstrapped by peering with the standalone node. The containers that get brought up are built using the latest habitat (0.29.1).
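Roughly, the topology is the following (just a sketch assuming the standard hab start --group/--peer flags, not the actual compose file; the address is a placeholder):

# Container 1: standalone node, service group postgresql.default
hab start core/postgresql --group default
# Containers 2-4: cluster nodes, service group postgresql.cluster,
# bootstrapped by peering with the standalone node's gossip address
hab start core/postgresql --group cluster --peer <standalone-node-ip>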
The bootstrapping and clustering work, but when querying the standalone node, the cluster nodes aren't visible in the returned census data.
To reproduce, run the following in this folder: