xelerance / Openswan

pfkey_delete_parse hangs on (ips_refcount > 4) when NAT_TRAVERSAL is defined #484

Open dandema opened 4 months ago

dandema commented 4 months ago

Hello community,

Setup

Openswan 2.6.51.3 (built with NAT_TRAVERSAL) on Linux 4.4.60; strongSwan 5.7.1 as IKE daemon. KLIPS stack with PF_KEY messaging. IKEv2 "gateway-to-gateway" connection.

# ipsec.conf
conn vxlan
        type = tunnel
        auto = route
        keyexchange = ikev2
        ikelifetime = 86400s
        lifetime = 86400s
        ike = aes256-sha2_256-modp2048!
        esp = aes256-sha2_256-modp2048!
        dpdaction = clear
        dpddelay = 30s
        leftupdown = <...>
        leftauth = pubkey
        left = <local-ipsec>
        leftsubnet = <local-gw>/32
        rightupdown = <...>
        rightauth = pubkey
        right = <remote-ipsec>
        rightsubnet = <remote-gw>/32
        leftid = <...>
        rightid = <...>
        leftcert = <...>
# strongswan.conf
charon {
        port = 500
        port_nat_t = 4500
        i_dont_care_about_security_and_use_aggressive_mode_psk = yes
        cisco_unity = yes
        make_before_break = yes
        plugins {
                socket-default {
                        listen4 = <ipsec-local>
                        use_ipv6 = no
                }
                kernel-netlink {
                        fwmark = !0x80/0x80
                }
                kernel-klips {
                        ipsec_dev_count = 1
                        ipsec_dev_mtu = 1554
                }
                xauth-passwd {
                        auth_groups = ipsecxauth
                }
        }
}

Issue

strongSwan hangs indefinitely after calling kernel_klips_ipsec->del_sa->pfkey_send().

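Analysis

In Openswan, pfkey_delete_parse() gets stuck here, after ipsec_sa_getbyid() has incremented ips_refcount from 4 to 5: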

    ipsp = ipsec_sa_getbyid(&(extr->ips->ips_said), IPSEC_REFSA);
        ...
    if (atomic_read(&ipsp->ips_refcount) > 4) {
        spin_unlock_bh(&tdb_lock);
        wait_event_interruptible(ipsp->ips_waitq, (atomic_read(&ipsp->ips_refcount) <= 4));

See last line in the attached log.
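
To make the wait condition concrete, here is a minimal userspace sketch of the check in the excerpt above. It is not the KLIPS code: `struct sa_model`, `sa_model_get()` and `delete_would_wait()` are hypothetical names, and the only things taken from the report are the threshold of 4 and the assumption that the lookup takes one extra reference before the check.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the KLIPS SA; only the reference count is modelled. */
struct sa_model {
    atomic_int refcount;
};

/* Threshold taken from the excerpt: the delete path waits while refcount > 4. */
enum { DELETE_WAIT_THRESHOLD = 4 };

/* Models a lookup that takes a reference (assumed behaviour of the real lookup). */
static void sa_model_get(struct sa_model *sa)
{
    atomic_fetch_add(&sa->refcount, 1);
}

/*
 * Models the delete-side check: take a reference via the lookup, then
 * report whether the count is above the threshold, i.e. whether the
 * real code would enter the wait.
 */
static bool delete_would_wait(struct sa_model *sa)
{
    sa_model_get(sa);
    return atomic_load(&sa->refcount) > DELETE_WAIT_THRESHOLD;
}
```

Under these assumptions, an SA that enters the delete path with a count of 3 reaches 4 and passes the check, while an SA that enters with a leaked count of 4 reaches 5 and would wait until some other holder drops a reference.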

ips_refcount became 4 and did not go back to 3 during an earlier call to pfkey_update_parse():

#ifdef NAT_TRAVERSAL
    if (extr->ips->ips_natt_sport || extr->ips->ips_natt_dport) {
        ...
        nat_t_ips_saved = extr->ips;
        extr->ips = ipsq;
        /* NOTE: no ipsec_sa_put(ipsq) here or anywhere later */
    }
    else
#endif
    {
        ...
        /* this will call delchain-equivalent if refcount=>0 */
        ipsec_sa_put(ipsq, IPSEC_REFSA);
    }

See these lines in the attached log:

[ 137.642771] klips_debug:pfkey_update_parse: .
[ 137.643473] klips_debug:pfkey_update_parse: .

Proposed fix

--- a/linux/net/ipsec/pfkey_v2_parser.c
+++ b/linux/net/ipsec/pfkey_v2_parser.c
@@ -635,6 +635,7 @@ pfkey_update_parse(struct sock *sk, stru
         */

        extr->ips = nat_t_ips_saved;
+       ipsec_sa_put(ipsq, IPSEC_REFSA);

        error = 0;
        KLIPS_PRINT(debug_pfkey,
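
The effect of the added line can be sketched with a small userspace model of the reference accounting. This is only an illustration under stated assumptions, not the Openswan implementation: `struct sa_counter`, `sa_counter_get()`, `sa_counter_put()` and `update_model()` are hypothetical, and the model assumes the update path takes exactly one reference when it looks up the existing SA.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SA; only the reference count is modelled. */
struct sa_counter {
    atomic_int refcount;
};

static void sa_counter_get(struct sa_counter *sa)
{
    atomic_fetch_add(&sa->refcount, 1);
}

static void sa_counter_put(struct sa_counter *sa)
{
    atomic_fetch_sub(&sa->refcount, 1);
}

/*
 * Models one pass through the update path and returns the count afterwards.
 *   natt     - true when a NAT-T source or destination port is set
 *   with_fix - true to model the extra put added by the proposed patch
 */
static int update_model(struct sa_counter *sa, bool natt, bool with_fix)
{
    sa_counter_get(sa);              /* lookup of the existing SA (assumed +1) */

    if (natt) {
        /* NAT-T branch: per the report, no put happens on this path. */
        if (with_fix)
            sa_counter_put(sa);      /* the line the patch adds */
    } else {
        sa_counter_put(sa);          /* non-NAT-T branch already balances */
    }
    return atomic_load(&sa->refcount);
}
```

Starting from a count of 3, the model ends at 3 on the non-NAT-T path, at 4 on the NAT-T path without the patch, and at 3 again on the NAT-T path with the patch, which matches the 3, 4, 5 progression described in the analysis.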

What do you think about our analysis and the way we're fixing it?

Best regards,

Daniele De Matteis

full-log-with-pfkey-debug.txt

letoams commented 4 months ago

KLIPS (and openswan) has been abandoned code since about 2012. The fact that you use it is crazy. It doesn’t support AES_GCM or AESNI instructions or IPv6.

Paul

dandema commented 4 months ago

Hi letoams,

KLIPS (and openswan) has been abandoned code since about 2012. The fact that you use it is crazy. It doesn’t support AES_GCM or AESNI instructions or IPv6

Thanks for your feedback, but it isn't that crazy. We use KLIPS because we run a packet accelerator that needs to find a Linux device for IPsec. And this is a gw-to-gw tunnel, where we need neither IPv6 nor any cipher algorithms other than the few I listed.

letoams commented 4 months ago

On Tue, 13 Feb 2024, Daniele wrote:

We use KLIPS because we run a packet accelerator that needs to find a Linux device for IPsec.

So use a native XFRMi interface, e.g. with libreswan by setting ipsec-interface=yes to get an ipsec1 interface. strongSwan also supports it with the native XFRM IPsec stack.

If you are using AES-CBC, your packet accelerator won't be useful as AES-CBC is a number of factors slower than AES-GCM. I bet using GCM without accelerator is faster than your current CBC accelerated stuff.

And this is a gw-to-gw tunnel, where we need neither IPv6 nor any cipher algorithms other than the few I listed.

If it breaks you get to keep all pieces :)

dandema commented 4 months ago

So use a native XFRMi interface

Nice to do, and it is on our roadmap, but KLIPS is our current software stack.

AES-CBC is a number of factors slower than AES-GCM

The cipher algorithms are decided by the ISP in our case.