google / seesaw

Seesaw v2 is a Linux Virtual Server (LVS) based load balancing platform.
Apache License 2.0

Question: problem with traffic not flowing through to the backend #25

Closed: lasse-aagren closed this issue 8 years ago

lasse-aagren commented 8 years ago

Hi,

I'm trying to set up a simple seesaw cluster with one virtual IP, one vserver and two backends to load balance port 80 HTTP traffic.

My problem is that everything seems to be running as it should, but the load balancer doesn't seem to relay anything to the backends.

I have two servers with two NICs.

lb1.mydomain.com and lb2.mydomain.com

These are the seesaw.cfg files from both servers:

lb1: cat /etc/seesaw/seesaw.cfg

[cluster]
anycast_enabled = false
name = lb1
node_ipv4 = 10.38.8.33
peer_ipv4 = 10.38.8.39
vip_ipv4 = 10.38.8.50

[interface]
node = eth0
lb = eth1

lb2: cat /etc/seesaw/seesaw.cfg

[cluster]
anycast_enabled = false
name = lb2
node_ipv4 = 10.38.8.39
peer_ipv4 = 10.38.8.33
vip_ipv4 = 10.38.8.50

[interface]
node = eth0
lb = eth1

This is the cluster.pb:

seesaw_vip: <
  fqdn: "seesaw-vip.localdomain."
  ipv4: "10.38.8.50/24"
  status: PRODUCTION
>
node: <
  fqdn: "lb1.mydomain.com"
  ipv4: "10.38.8.33/24"
  status: PRODUCTION
>
node: <
  fqdn: "lb2.mydomain.com"
  ipv4: "10.38.8.39/24"
  status: PRODUCTION
>
vserver: <
  name: "test-vserver"
  entry_address: <
    fqdn: "lb-test.localdomain."
    ipv4: "10.38.8.70/24"
    status: PRODUCTION
  >
  rp: "admin@localdomain"
  vserver_entry: <
    protocol: TCP
    port: 80
    scheduler: RR
    healthcheck: <
      type: TCP
      port: 80
      tls_verify: false
    >
  >
  backend: <
    host: <
      fqdn: "mailrelay1.mydomain.com."
      ipv4: "10.38.8.32/24"
      status: PRODUCTION
    >
    weight: 1
  >
  backend: <
    host: <
      fqdn: "mailrelay2.mydomain.com."
      ipv4: "10.38.8.37/24"
      status: PRODUCTION
    >
    weight: 1
  >
>

seesaw reports this:

seesaw -c "show vservers"
Vserver
  Name:                test-vserver
  Hostname:            lb-test.localdomain.
  Status:              enabled (override state default; config state enabled)
  IPv4 Address:        10.38.8.70/24
  IPv6 Address:        <not configured>

  Services:

    IPv4 TCP/80    (DSR, rr scheduler)
        State:       enabled, healthy, active
        Watermarks:  Low 0.00, High 0.00, Currently 1.00

seesaw -c "show backends"
Backends
[   1] mailrelay1.mydomain.one.com.
[   2] mailrelay2.mydomain.one.com.

seesaw -c "show destinations"
Destinations
[   1] test-vserver/10.38.8.32:80/TCP (enabled, healthy, active)
[   2] test-vserver/10.38.8.37:80/TCP (enabled, healthy, active)

seesaw -c "show ha"
HA Status
  State:               Master
  Duration:            1m57s (since Aug 8 11:04:25 UTC)
  Transitions:         2
  Advertisements Sent: 230
  Advertisements Rcvd: 0
  Last Update:         Aug 8 11:06:20 UTC

seesaw -c "show nodes"
Nodes
[1] lb1.mydomain.one.com enabled
[2] lb2.mydomain.one.com enabled

Other info from lb1:

ip route
default via 10.38.8.1 dev eth0 
10.38.8.0/24 dev eth0  proto kernel  scope link  src 10.38.8.33 
10.38.8.0/24 dev eth1  proto kernel  scope link  src 10.38.8.50 

ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether ca:66:a4:02:dc:98 brd ff:ff:ff:ff:ff:ff
3: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether de:15:b8:15:94:3f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::dc15:b8ff:fe15:943f/64 scope link 
       valid_lft forever preferred_lft forever
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 54:9f:35:fe:42:2e brd ff:ff:ff:ff:ff:ff
    inet 10.38.8.33/24 brd 10.38.8.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::569f:35ff:fefe:422e/64 scope link 
       valid_lft forever preferred_lft forever
5: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:00:5e:00:01:3c brd ff:ff:ff:ff:ff:ff
    inet 10.38.8.50/24 brd 10.38.8.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.38.8.70/24 brd 10.38.8.255 scope global secondary eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::200:5eff:fe00:13c/64 scope link 
       valid_lft forever preferred_lft forever

ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.38.8.70:80 rr
  -> 10.38.8.32:80                Route   1      0          0         
  -> 10.38.8.37:80                Route   1      0          0       

But when I try nc -zv 10.38.8.70 80 from lb1 (or from other hosts on the same network), it just hangs forever (or until it times out). I can ping 10.38.8.70 just fine, but TCP traffic to port 80 doesn't even seem to hit the iptables INPUT chain created by seesaw.
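
For concreteness, the checks above amount to roughly the following (the iptables command is just an assumed way of watching the counters on the rules seesaw installs; run it on lb1):

nc -zv 10.38.8.70 80                        # hangs until it times out
ping -c 3 10.38.8.70                        # replies fine
iptables -L INPUT -v -n | grep 10.38.8.70   # packet counters for the VIP don't increase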

Any ideas would be very much appreciated.

4a6f656c commented 8 years ago

My first guess is that you've created a DSR blackhole. The default load balancing mode is DSR, which requires that the VIP address (10.38.8.70 in this case) be configured as an IP on a dummy (non-ARP) interface on each of the backends. On most Linux systems you should be able to do something like ip addr add 10.38.8.70 dev dummy0 (you may need to insmod dummy first).
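
As a minimal sketch, on each backend (assuming dummy0 is free and the dummy module is available) that would look something like:

modprobe dummy                           # load the dummy interface module if needed
ip link set dummy0 up
ip addr add 10.38.8.70/32 dev dummy0     # put the VIP on a non-ARPing interface

Depending on the distribution you may also want to set the arp_ignore/arp_announce sysctls on the backends so they never answer ARP for the VIP on their real interfaces.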

You can avoid DSR blackholes by using DSR healthchecks: add a mode: dsr to the healthcheck that you have configured, at which point I suspect the backends will be considered unhealthy (as a rule of thumb, if you have a DSR-configured vserver, you should have one DSR healthcheck for exactly this reason).
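
Against the cluster.pb above that would be roughly the following (assuming the enum value is written in uppercase, like the other enum values in this file):

  vserver_entry: <
    protocol: TCP
    port: 80
    scheduler: RR
    healthcheck: <
      type: TCP
      port: 80
      mode: DSR
      tls_verify: false
    >
  >

The idea is that a DSR healthcheck only passes once the backend actually accepts traffic addressed to the VIP, so a DSR blackhole shows up as unhealthy backends instead of silently dropped traffic.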

Alternatively, if you don't need (or don't want) to use DSR, you can switch to NAT mode by adding a mode: nat entry under the vserver_entry section of the protobuf.
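
A sketch of the same vserver_entry in NAT mode (again assuming the uppercase enum spelling):

  vserver_entry: <
    protocol: TCP
    port: 80
    scheduler: RR
    mode: NAT
    healthcheck: <
      type: TCP
      port: 80
      tls_verify: false
    >
  >

Keep in mind that in NAT mode the return traffic has to flow back through the load balancer, which usually means the backends' route back to the clients (typically their default route) has to point at the seesaw node.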

lasse-aagren commented 8 years ago

That was exactly it :) thanks!