openvswitch / ovs-issues

Issue tracker repo for Open vSwitch
Duplicated Kernel policies and security associations for OVN ipsec #171

Open oilbeater opened 5 years ago

oilbeater commented 5 years ago

I'm running OVS 2.11.1 and trying OVN IPsec with Libreswan. However, the nodes can no longer reach each other over the network once I enable IPsec in ovn-nb.

What I noticed from ovs-appctl -t ovs-monitor-ipsec tunnels/show is that there are duplicated kernel policies and kernel security associations, as shown below.

Interface name: ovn-b80db0-0 v1 (CONFIGURED)
  Tunnel Type:    geneve
  Remote IP:      10.0.128.35
  SKB mark:       None
  Local cert:     /etc/openvswitch/cert.pem
  Local name:     42db392d-ca03-4e6d-965c-02a49a0711b0
  Local key:      /etc/openvswitch/privkey.pem
  Remote cert:    None
  Remote name:    b80db018-7970-4da6-8bc2-6942002e3d82
  CA cert:        /etc/openvswitch/cacert.pem
  PSK:            None
  Ofport:         1
  CFM state:      Disabled
Kernel policies installed:
  src 10.0.128.15/32 dst 10.0.128.35/32 proto udp sport 6081
  src 10.0.128.15/32 dst 10.0.128.35/32 proto udp sport 6081
Kernel security associations installed:
  sel src 10.0.128.35/32 dst 10.0.128.15/32
  sel src 10.0.128.35/32 dst 10.0.128.15/32 proto udp dport 6081
  sel src 10.0.128.15/32 dst 10.0.128.35/32 proto udp sport 6081
  sel src 10.0.128.35/32 dst 10.0.128.15/32
  sel src 10.0.128.35/32 dst 10.0.128.15/32
  sel src 10.0.128.35/32 dst 10.0.128.15/32
  sel src 10.0.128.35/32 dst 10.0.128.15/32
IPsec connections that are active:

Another thing I found is that every time I run ovs-appctl -t ovs-monitor-ipsec refresh, another duplicated list of kernel security associations appears in the output of ovs-appctl -t ovs-monitor-ipsec tunnels/show.

Is this expected, or do I need to change some IPsec or OVS configuration? Thanks!
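The accumulation described above can be quantified with a short script. This is a minimal sketch, assuming the tunnels/show text has already been captured into a string; the function name `duplicated_entries` is hypothetical, and the sample is a shortened copy of the output in this report. Note that identical selector lines may still belong to distinct SAs (for example with different SPIs), so a count above 1 is only a hint.

```python
from collections import Counter

# Section headers as they appear in the tunnels/show output above.
HEADERS = (
    "Kernel policies installed:",
    "Kernel security associations installed:",
)

# Shortened copy of the output from this report.
SAMPLE = """\
Kernel policies installed:
  src 10.0.128.15/32 dst 10.0.128.35/32 proto udp sport 6081
  src 10.0.128.15/32 dst 10.0.128.35/32 proto udp sport 6081
Kernel security associations installed:
  sel src 10.0.128.35/32 dst 10.0.128.15/32
  sel src 10.0.128.35/32 dst 10.0.128.15/32 proto udp dport 6081
  sel src 10.0.128.15/32 dst 10.0.128.35/32 proto udp sport 6081
  sel src 10.0.128.35/32 dst 10.0.128.15/32
IPsec connections that are active:
"""


def duplicated_entries(show_output):
    """Return, per section, the entries that are listed more than once."""
    counts = {header: Counter() for header in HEADERS}
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line in HEADERS:
            current = line
        elif current and line.startswith(("src ", "sel ")):
            counts[current][line] += 1
        else:
            # Any other line (blank, tunnel attributes, other headers)
            # ends the current section.
            current = None
    return {
        header: {entry: n for entry, n in counter.items() if n > 1}
        for header, counter in counts.items()
    }


for header, dups in duplicated_entries(SAMPLE).items():
    print(header, dups)
```

Running it before and after ovs-appctl -t ovs-monitor-ipsec refresh would show whether the counts grow with each refresh.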

ansisatteka commented 5 years ago

It seems you are reporting two issues here:

  1. accumulation of policies and security associations.
  2. ipsec tunnel not working

The accumulation of policies is NOT expected, but to the best of my knowledge it should not cause any functional issues. I remember solving this problem for the strongSwan backend; it is probably not solved correctly for the Libreswan backend, possibly because the LibreSwanHelper.refresh() function in ovs-monitor-ipsec.py calls the ipsec utility incorrectly.

As for the tunnel not working properly: can you check the auto-generated ipsec.conf and ipsec.secrets files? Also, can you check the Libreswan log file (it may log to /var/log/auth.log)? Either ovs-monitor-ipsec did not populate those files correctly, or you have some other networking issue going on.

Also, can you tell me exactly which Linux distribution and Libreswan version you are using? I will try to reproduce the issue locally.

oilbeater commented 5 years ago

I'm using CentOS Linux release 7.6.1810 with Libreswan 3.25 (netkey) on kernel 3.10.0-957.10.1.el7.x86_64.

I checked the files: ipsec.secrets is empty, and there is no auth.log file.

ansisatteka commented 5 years ago

OK, I just installed CentOS with Libreswan. It seems you have to check /var/log/messages and egrep for "pluto|ipsec|swan" to see why the IPsec tunnel does not get established. The log files may have rotated, so make sure you upload log messages from around the time the issue happened.
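The same filter can be sketched in Python for cases where the log has to be post-processed anyway. This is only an illustration: the helper name `ipsec_lines` is hypothetical, the match is case-sensitive like a plain egrep, and the third sample line (sshd) is made up to show a line that is filtered out.

```python
import re

# Same alternation as the egrep expression above (case-sensitive).
PATTERN = re.compile(r"pluto|ipsec|swan")

SAMPLE_LINES = [
    "May 09 15:00:10 ovn-master pluto[9365]: adding interface eth0/eth0 10.0.128.15:500\n",
    'May 09 15:00:10 ovn-master pluto[9365]: loading secrets from "/etc/ipsec.secrets"\n',
    # Hypothetical unrelated line, not taken from the report.
    "May 09 15:00:10 ovn-master sshd[1234]: session opened\n",
]


def ipsec_lines(lines):
    """Keep only the lines that mention pluto, ipsec or swan."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


for line in ipsec_lines(SAMPLE_LINES):
    print(line)
```

To run it against the real log, pass an open file instead of the sample list, for example `ipsec_lines(open("/var/log/messages"))`.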

Also, I see that you are using PKI with Certificate Authority configuration (this is the most complex way to set up IPsec):

  Local cert:     /etc/openvswitch/cert.pem
  Local key:      /etc/openvswitch/privkey.pem
  CA cert:        /etc/openvswitch/cacert.pem

Are you sure the certificates on both hosts were signed properly by the same CA? The messages in /var/log/messages should show whether that is the case.

oilbeater commented 5 years ago

@ansisatteka I found the log with journalctl -fu ipsec:

May 09 15:00:10 ovn-master systemd[1]: Started Internet Key Exchange (IKE) Protocol Daemon for IPsec.
May 09 15:00:10 ovn-master pluto[9365]: adding interface docker0/docker0 172.17.0.1:500
May 09 15:00:10 ovn-master pluto[9365]: adding interface docker0/docker0 172.17.0.1:4500
May 09 15:00:10 ovn-master pluto[9365]: adding interface eth0/eth0 10.0.128.15:500
May 09 15:00:10 ovn-master pluto[9365]: adding interface eth0/eth0 10.0.128.15:4500
May 09 15:00:10 ovn-master pluto[9365]: adding interface lo/lo 127.0.0.1:500
May 09 15:00:10 ovn-master pluto[9365]: adding interface lo/lo 127.0.0.1:4500
May 09 15:00:10 ovn-master pluto[9365]: | setup callback for interface lo:4500 fd 20
May 09 15:00:10 ovn-master pluto[9365]: | setup callback for interface lo:500 fd 19
May 09 15:00:10 ovn-master pluto[9365]: | setup callback for interface eth0:4500 fd 18
May 09 15:00:10 ovn-master pluto[9365]: | setup callback for interface eth0:500 fd 17
May 09 15:00:10 ovn-master pluto[9365]: | setup callback for interface docker0:4500 fd 16
May 09 15:00:10 ovn-master pluto[9365]: | setup callback for interface docker0:500 fd 15
May 09 15:00:10 ovn-master pluto[9365]: loading secrets from "/etc/ipsec.secrets"
May 09 15:00:11 ovn-master pluto[9365]: loading secrets from "/etc/ipsec.secrets"
May 09 15:00:11 ovn-master pluto[9365]: added connection description "ovn-5cfcb1-0-in-1"
May 09 15:00:11 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #1: initiating v2 parent SA
May 09 15:00:11 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #1: local IKE proposals for ovn-5cfcb1-0-in-1 (IKE SA initiator selecting KE): 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;INTEG=NONE;DH=MODP2048
May 09 15:00:11 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #1: STATE_PARENT_I1: sent v2I1, expected v2R1
May 09 15:00:11 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #1: local ESP/AH proposals for ovn-5cfcb1-0-in-1 (IKE SA initiator emitting ESP/AH proposals): 1:ESP:ENCR=AES_GCM_C_256;INTEG=NONE;DH=NONE;ESN=DISABLED
May 09 15:00:11 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #2: STATE_PARENT_I2: sent v2I2, expected v2R2 {auth=IKEv2 cipher=aes_gcm_16_256 integ=n/a prf=sha2_256 group=MODP2048}
May 09 15:00:11 ovn-master pluto[9365]: added connection description "ovn-5cfcb1-0-out-1"
May 09 15:00:11 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #2: STATE_PARENT_I2: retransmission; will wait 0.5 seconds for response
May 09 15:00:12 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #2: STATE_PARENT_I2: retransmission; will wait 1 seconds for response
May 09 15:00:13 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #2: STATE_PARENT_I2: retransmission; will wait 2 seconds for response
May 09 15:00:15 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #2: STATE_PARENT_I2: retransmission; will wait 4 seconds for response
May 09 15:00:19 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #2: STATE_PARENT_I2: retransmission; will wait 8 seconds for response
May 09 15:00:24 ovn-master pluto[9365]: packet from 10.0.128.35:500: local IKE proposals for ovn-5cfcb1-0-out-1 (IKE SA responder matching remote proposals): 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;INTEG=NONE;DH=MODP2048
May 09 15:00:24 ovn-master pluto[9365]: packet from 10.0.128.35:500: proposal 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;DH=MODP2048 chosen from remote proposals 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;DH=MODP2048[first-match]
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #3: STATE_PARENT_R1: received v2I1, sent v2R1 {auth=IKEv2 cipher=aes_gcm_16_256 integ=n/a prf=sha2_256 group=MODP2048}
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #3: certificate verified OK: CN=5cfcb1d7-9a4c-4307-a333-5604a4c079d2,OU=Open vSwitch certifier,O=Open vSwitch,ST=CA,C=US
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #3: IKEv2 mode peer ID is ID_FQDN: '@5cfcb1d7-9a4c-4307-a333-5604a4c079d2'
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #3: Authenticated using RSA
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #3: local ESP/AH proposals for ovn-5cfcb1-0-out-1 (IKE SA responder matching remote ESP/AH proposals): 1:ESP:ENCR=AES_GCM_C_256;INTEG=NONE;DH=NONE;ESN=DISABLED
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #3: proposal 1:ESP:SPI=174c431c;ENCR=AES_GCM_C_256;ESN=DISABLED chosen from remote proposals 1:ESP:ENCR=AES_GCM_C_256;ESN=DISABLED[first-match]
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #4: negotiated connection [10.0.128.15-10.0.128.15:0-65535 17] -> [10.0.128.35-10.0.128.35:6081-6081 17]
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #4: STATE_V2_IPSEC_R: IPsec SA established transport mode {ESP=>0x174c431c <0xdf6cca44 xfrm=AES_GCM_16_256-NONE NATOA=none NATD=none DPD=passive}
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #3: local ESP/AH proposals for ovn-5cfcb1-0-out-1 (ESP/AH responder matching remote proposals): 1:ESP:ENCR=AES_GCM_C_256;INTEG=NONE;DH=MODP2048;ESN=DISABLED
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #3: proposal 1:ESP:SPI=b3e274ff;ENCR=AES_GCM_C_256;DH=MODP2048;ESN=DISABLED chosen from remote proposals 1:ESP:ENCR=AES_GCM_C_256;DH=MODP2048;ESN=DISABLED[first-match]
May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #5: responding to CREATE_CHILD_SA message (ID 2) from 10.0.128.35:500 with encrypted notification TS_UNACCEPTABLE
May 09 15:00:27 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #2: STATE_PARENT_I2: retransmission; will wait 16 seconds for response
May 09 15:00:43 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #2: STATE_PARENT_I2: retransmission; will wait 32 seconds for response
May 09 15:01:11 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #2: deleting other state #2 (STATE_PARENT_I2) and NOT sending notification
May 09 15:01:11 ovn-master pluto[9365]: "ovn-5cfcb1-0-in-1" #1: deleting state (STATE_PARENT_I2) and NOT sending notification
May 09 15:03:44 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #5: deleting state (STATE_V2_CREATE_R) and NOT sending notification
May 09 15:03:44 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #5: ERROR: netlink response for Del SA esp.b3e274ff@10.0.128.35 included errno 3: No such process
May 09 15:03:44 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #5: ERROR: netlink response for Del SA esp.0@10.0.128.15 included errno 3: No such process

This line may indicate the problem:

responding to CREATE_CHILD_SA message (ID 2) from 10.0.128.35:500 with encrypted notification TS_UNACCEPTABLE
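To see which connection the notification belongs to, the pluto lines can be grouped by connection name. This is a minimal sketch, assuming the line layout shown in the journalctl output above (`pluto[PID]: "conn" #state: message`); the function name is hypothetical and the two sample lines are copied from that log.

```python
import re
from collections import Counter

# Matches: pluto[9365]: "ovn-5cfcb1-0-out-1" #5: <message>
CONN_RE = re.compile(r'pluto\[\d+\]: "(?P<conn>[^"]+)" #(?P<state>\d+): (?P<msg>.*)')

SAMPLE_LINES = [
    'May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #3: Authenticated using RSA',
    'May 09 15:00:24 ovn-master pluto[9365]: "ovn-5cfcb1-0-out-1" #5: responding to '
    "CREATE_CHILD_SA message (ID 2) from 10.0.128.35:500 with encrypted notification "
    "TS_UNACCEPTABLE",
]


def ts_unacceptable_by_connection(lines):
    """Count TS_UNACCEPTABLE messages per pluto connection name."""
    hits = Counter()
    for line in lines:
        match = CONN_RE.search(line)
        if match and "TS_UNACCEPTABLE" in match.group("msg"):
            hits[match.group("conn")] += 1
    return dict(hits)


print(ts_unacceptable_by_connection(SAMPLE_LINES))
```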

For the PKI and CA part, I ran ovs-pki req+sign -u <chassis_id> switch on a PKI server and moved privkey.pem, cert.pem and cacert.pem from the switchca folder to each host.

oilbeater commented 5 years ago

Another node shows a similar error:

dropping unexpected CREATE_CHILD_SA message containing TS_UNACCEPTABLE notification; message payloads: SK; encrypted payloads: N; missing payloads: SA,Ni,TSi,TSr

oilbeater commented 5 years ago

After upgrading Libreswan to 3.27, the error above disappeared from the log. Still no luck, though: the nodes still cannot reach each other over the network.

May 09 17:09:18 ovn-master pluto[28894]: loading secrets from "/etc/ipsec.secrets"
May 09 17:09:18 ovn-master pluto[28894]: added connection description "ovn-5cfcb1-0-in-1"
May 09 17:09:18 ovn-master pluto[28894]: packet from 10.0.128.35:500: constructed local IKE proposals for ovn-5cfcb1-0-in-1 (IKE SA responder matching remote proposals): 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;INTEG=NONE;DH=MODP2048
May 09 17:09:18 ovn-master pluto[28894]: packet from 10.0.128.35:500: proposal 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;DH=MODP2048 chosen from remote proposals 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;DH=MODP2048[first-match]
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #1: STATE_PARENT_R1: received v2I1, sent v2R1 {auth=IKEv2 cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_256 group=MODP2048}
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #2: initiating v2 parent SA
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #2: STATE_PARENT_I1: sent v2I1, expected v2R1
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #1: certificate verified OK: CN=5cfcb1d7-9a4c-4307-a333-5604a4c079d2,OU=Open vSwitch certifier,O=Open vSwitch,ST=CA,C=US
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #1: IKEv2 mode peer ID is ID_FQDN: '@5cfcb1d7-9a4c-4307-a333-5604a4c079d2'
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #1: Authenticated using RSA
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #1: responding to AUTH message (ID 1) from 10.0.128.35:500 with encrypted notification TS_UNACCEPTABLE
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #2: constructed local ESP/AH proposals for ovn-5cfcb1-0-in-1 (IKE SA initiator emitting ESP/AH proposals): 1:ESP:ENCR=AES_GCM_C_256;INTEG=NONE;DH=NONE;ESN=DISABLED
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #3: STATE_PARENT_I2: sent v2I2, expected v2R2 {auth=IKEv2 cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_256 group=MODP2048}
May 09 17:09:18 ovn-master pluto[28894]: added connection description "ovn-5cfcb1-0-out-1"
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #3: certificate verified OK: CN=5cfcb1d7-9a4c-4307-a333-5604a4c079d2,OU=Open vSwitch certifier,O=Open vSwitch,ST=CA,C=US
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #3: IKEv2 mode peer ID is ID_FQDN: '@5cfcb1d7-9a4c-4307-a333-5604a4c079d2'
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #3: Authenticated using RSA
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #3: negotiated connection [10.0.128.15-10.0.128.15:6081-6081 17] -> [10.0.128.35-10.0.128.35:0-65535 17]
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #3: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0x7f9bddc9 <0x5bfc5ad1 xfrm=AES_GCM_16_256-NONE NATOA=none NATD=none DPD=passive}
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-out-1" #4: constructed local ESP/AH proposals for ovn-5cfcb1-0-out-1 (ESP/AH initiator emitting proposals): 1:ESP:ENCR=AES_GCM_C_256;INTEG=NONE;DH=MODP2048;ESN=DISABLED
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #2: constructed local ESP/AH proposals for ovn-5cfcb1-0-in-1 (ESP/AH responder matching remote proposals): 1:ESP:ENCR=AES_GCM_C_256;INTEG=NONE;DH=MODP2048;ESN=DISABLED
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #2: proposal 1:ESP:SPI=8a48d7a4;ENCR=AES_GCM_C_256;DH=MODP2048;ESN=DISABLED chosen from remote proposals 1:ESP:ENCR=AES_GCM_C_256;DH=MODP2048;ESN=DISABLED[first-match]
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-out-1" #4: STATE_V2_CREATE_I: sent IPsec Child req wait response
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #5: negotiated new IPsec SA [10.0.128.15-10.0.128.15:6081-6081 17] -> [10.0.128.35-10.0.128.35:0-65535 17]
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #5: negotiated connection [10.0.128.15-10.0.128.15:6081-6081 17] -> [10.0.128.35-10.0.128.35:0-65535 17]
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-in-1" #5: STATE_V2_IPSEC_R: IPsec SA established transport mode {ESP=>0x8a48d7a4 <0xa2cac3f7 xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-out-1" #4: negotiated connection [10.0.128.15-10.0.128.15:0-65535 17] -> [10.0.128.35-10.0.128.35:6081-6081 17]
May 09 17:09:18 ovn-master pluto[28894]: "ovn-5cfcb1-0-out-1" #4: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0x5c8af574 <0x6bdcfb6f xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}

ansisatteka commented 5 years ago

IIRC, TS_UNACCEPTABLE means that Libreswan did not agree on which kind of traffic should be encrypted (i.e. Geneve in your case).
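The traffic selectors that were actually agreed on are visible in the "negotiated connection" lines of the pluto log. The following is a rough sketch that extracts them, assuming the bracket layout seen in this thread (`[addr-addr:port-port proto]`) and IPv4 addresses only; the function name `parse_selectors` is hypothetical.

```python
import re

# One selector: [10.0.128.35-10.0.128.35:6081-6081 17]
SELECTOR_RE = re.compile(r"\[([\d.]+)-([\d.]+):(\d+)-(\d+) (\d+)\]")

# Line copied from the log earlier in this thread.
LINE = (
    '"ovn-5cfcb1-0-out-1" #4: negotiated connection '
    "[10.0.128.15-10.0.128.15:0-65535 17] -> [10.0.128.35-10.0.128.35:6081-6081 17]"
)


def parse_selectors(line):
    """Return the selectors in the order they are printed on the line."""
    return [
        {
            "addr_range": (start, end),
            "port_range": (int(port_lo), int(port_hi)),
            "protocol": int(proto),  # 17 is UDP
        }
        for start, end, port_lo, port_hi, proto in SELECTOR_RE.findall(line)
    ]


for selector in parse_selectors(LINE):
    print(selector)
```

For the "out" connection above, the second selector is restricted to UDP port 6081, which is the Geneve port that appears in the kernel policies.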

  1. Did the output of the ovs-appctl -t ovs-monitor-ipsec tunnels/show command change after you upgraded Libreswan and the TS_UNACCEPTABLE error went away?
  2. How did you conclude that the network is still not working? Can you use netcat in UDP mode to emulate Geneve traffic and then check in tcpdump whether those packets are encrypted with the ESP protocol?
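If netcat is not available, the UDP probe suggested in point 2 can be approximated with a few lines of Python. This is a sketch under assumptions: the function name and payload are hypothetical, the datagrams are not valid Geneve frames (they only match a "proto udp dport 6081" policy), and the source port is left to the kernel because port 6081 may already be in use by the tunnel.

```python
import socket

GENEVE_PORT = 6081  # UDP port that appears in the kernel policies above


def send_udp_probe(dst_ip, dst_port=GENEVE_PORT, payload=b"ipsec-probe", count=3):
    """Send `count` UDP datagrams and return the total number of bytes sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sent += sock.sendto(payload, (dst_ip, dst_port))
    return sent
```

For example, calling `send_udp_probe("10.0.128.35")` on the 10.0.128.15 node while running `tcpdump -ni eth0 'esp or udp port 6081'` there should show ESP packets if the policy is applied, and plain UDP to port 6081 if it is not. The addresses and interface name are taken from the logs in this thread.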

oilbeater commented 5 years ago

  1. After upgrading Libreswan, the TS_UNACCEPTABLE error went away, and the ovs-appctl -t ovs-monitor-ipsec tunnels/show output no longer gains additional policies after running ovs-appctl -t ovs-monitor-ipsec refresh:

    Kernel policies installed:
    src 10.0.128.15/32 dst 10.0.128.35/32 proto udp dport 6081
    src 10.0.128.15/32 dst 10.0.128.35/32 proto udp dport 6081
    src 10.0.128.15/32 dst 10.0.128.35/32 proto udp sport 6081
    src 10.0.128.15/32 dst 10.0.128.35/32 proto udp sport 6081
    Kernel security associations installed:
    sel src 10.0.128.35/32 dst 10.0.128.15/32 proto udp sport 6081
    sel src 10.0.128.15/32 dst 10.0.128.35/32 proto udp dport 6081
    sel src 10.0.128.35/32 dst 10.0.128.15/32 proto udp dport 6081
    sel src 10.0.128.15/32 dst 10.0.128.35/32 proto udp sport 6081
    IPsec connections that are active:
  2. I created a network namespace on each of the two nodes, moved an OVS internal interface into each namespace, and tried to ping from one to the other. With ovn-nbctl set nb_global . ipsec=false the ping works fine; with ovn-nbctl set nb_global . ipsec=true the ping never returns.

markdgray commented 4 years ago

Hi @oilbeater, did you resolve this issue?

oilbeater commented 4 years ago

@markdgray No. I think it might be an issue with the old kernel, but I didn't look into it further.