mesosphere-backup / terraform-dcos

DC/OS Terraform Installation and Upgrading Scripts
Apache License 2.0

Public Agent Not Reachable #52

Closed Jeeppler closed 6 years ago

Jeeppler commented 6 years ago

When I create a full DC/OS cluster, the Elastic Load Balancer (ELB) responsible for the public agent(s) reports that the public agent is out of service. I tried it with:

The issue is that the ELB health check against the public agent fails (ELB health check: HTTP:9090/_haproxy_health_check). That is not surprising, because nothing is listening on port 9090 on the public agent:

$ netstat -tulpn
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:62080         0.0.0.0:*               LISTEN      -                   
tcp        0      0 0.0.0.0:62501           0.0.0.0:*               LISTEN      -                   
tcp        0      0 0.0.0.0:61001           0.0.0.0:*               LISTEN      -                   
tcp        0      0 10.0.2.60:61420         0.0.0.0:*               LISTEN      -                   
tcp        0      0 198.51.100.3:53         0.0.0.0:*               LISTEN      -                   
tcp        0      0 198.51.100.2:53         0.0.0.0:*               LISTEN      -                   
tcp        0      0 198.51.100.1:53         0.0.0.0:*               LISTEN      -                   
tcp        0      0 10.0.2.60:5051          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8124          0.0.0.0:*               LISTEN      -                   
tcp6       0      0 :::61091                :::*                    LISTEN      -                   
tcp6       0      0 fd01:d::c633:6401:53    :::*                    LISTEN      -                   
tcp6       0      0 :::22                   :::*                    LISTEN      -                   
udp        0      0 0.0.0.0:64000           0.0.0.0:*                           -                   
udp        0      0 198.51.100.3:53         0.0.0.0:*                           -                   
udp        0      0 198.51.100.2:53         0.0.0.0:*                           -                   
udp        0      0 198.51.100.1:53         0.0.0.0:*                           -                   
udp        0      0 10.0.2.60:68            0.0.0.0:*                           -                   
udp6       0      0 fd01:d::c633:6401:53    :::*                                -                   
udp6       0      0 fe80::2886:3fff:fe8:546 :::*                                -                   
udp6       0      0 fe80::814:c5ff:fe8d:546 :::*                                -                   
udp6       0      0 fe80::1870:4eff:fef:546 :::*                                -                   
udp6       0      0 fe80::584c:8bff:fe8:546 :::*                                -                   
udp6       0      0 fe80::8476:74ff:feb:546 :::*                                - 
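
Scanning a long `netstat -tulpn` listing by eye for one port is error-prone. The following is a minimal sketch, not part of the original report, that parses such output and collects the TCP ports in the LISTEN state. The function name `listening_tcp_ports` and the shortened `sample` text are illustrative assumptions.

```python
def listening_tcp_ports(netstat_output: str) -> set:
    """Collect the local ports of TCP sockets in the LISTEN state."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        # The port follows the last colon, which also handles IPv6 addresses.
        _, _, port = fields[3].rpartition(":")
        if port.isdigit():
            ports.add(int(port))
    return ports


# Shortened sample in the same layout as the listing above.
sample = """\
tcp        0      0 10.0.2.60:5051          0.0.0.0:*               LISTEN      -
tcp6       0      0 :::22                   :::*                    LISTEN      -
udp        0      0 0.0.0.0:64000           0.0.0.0:*                           -
"""
print(9090 in listening_tcp_ports(sample))  # prints False
```

A result of `False` for port 9090 matches the observation that no process is serving the health check endpoint on this agent.
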
rimusz commented 6 years ago

That's correct. Install marathon-lb there and the checks will pass.

rimusz commented 6 years ago

Hmm, maybe not. Something is missing: I installed marathon-lb, but the ELB health check for the public agents is still failing.

$ netstat -tulpn
(No info could be read for "-p": geteuid()=1000 but you should be root.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.234:61420        0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      -
tcp        0      0 10.0.0.234:45141        0.0.0.0:*               LISTEN      -
tcp        0      0 198.51.100.3:53         0.0.0.0:*               LISTEN      -
tcp        0      0 198.51.100.2:53         0.0.0.0:*               LISTEN      -
tcp        0      0 198.51.100.1:53         0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:36214         0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      -
tcp        0      0 10.0.0.234:5051         0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:8124          0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:62080         0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:9090            0.0.0.0:*               LISTEN      -

Jeeppler commented 6 years ago

I tried it again after @rimusz pointed out that marathon-lb has to be installed. However, the health check executed by the public agent ELB still fails, even though I can reach the public node by its IP on port 9090 at the path /_haproxy_health_check.
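
A manual check like the one described can be scripted. This is a minimal sketch using only the Python standard library. The helper name `probe_health` and its defaults are assumptions for illustration, with the port and path taken from the ELB health check setting quoted earlier in the thread.

```python
import urllib.error
import urllib.request


def probe_health(host, port=9090, path="/_haproxy_health_check", timeout=5.0):
    """Return the HTTP status of the health endpoint, or None if unreachable."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or otherwise not reachable.
        return None
```

Calling `probe_health("<public-agent-ip>")` returns a status code when the endpoint answers and `None` when it cannot be reached. A successful probe from a workstation only shows that this particular network path works. The ELB connects from its own network interfaces, so its health check can still fail.
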

Jeeppler commented 6 years ago

My public agent ELB health check fails, but apparently the traffic reaches the destination.

raditsp commented 6 years ago

I also have this problem. Any idea why it is failing?

glynternet commented 6 years ago

Hi all,

Ensure that the security groups attached to your ELB allow TCP traffic outgoing on port 9090, otherwise it won't be able to reach the instance(s).
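
To make that check concrete, here is a minimal sketch that tests whether a list of egress rules permits TCP on a given port. The rule dictionaries follow the `IpProtocol` / `FromPort` / `ToPort` shape that boto3's `describe_security_groups` returns under `IpPermissionsEgress`. The function name `allows_tcp_port` and the sample rule are hypothetical, and the sketch deliberately ignores destination CIDR ranges and destination security groups.

```python
def allows_tcp_port(permissions, port):
    """Return True if any rule in the list permits TCP traffic on `port`."""
    for rule in permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols on all ports.
            return True
        if protocol != "tcp":
            continue
        if rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535):
            return True
    return False


# Hypothetical ELB security group that only allows outgoing traffic on port 80.
elb_egress = [{"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80}]
print(allows_tcp_port(elb_egress, 9090))  # prints False
```

A `False` result here would be consistent with the symptom in this thread: the agent answers on port 9090 when contacted directly, but the ELB cannot open the connection, so it marks the instance as out of service.
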