katalix / go-l2tp

Go library for building L2TP applications on Linux systems
MIT License
45 stars 15 forks source link

"Malformed header" error terminates connection #6

Closed nmschulte closed 3 months ago

nmschulte commented 3 months ago

Hi, I am running into #4. My current solution is to use xl2tpd vs go-l2tp/kl2tp.

As noted there, it can be observed in the logs that the length reported matches the tunnel ID:

The issue is definitely the following in the kl2tpd logs:

level=error tunnel_name=t1 function=transport message="frame receive failed" error="malformed header: length 17024 exceeds buffer bounds of 30"

You'll also notice elsewhere that tunnel_id=17024, so it would appear that the malformed header received from the VPN server for the AVP ICCN response has the length field mixed up somehow with the tunnel_id.

level=debug tunnel_name=t1 function=transport message=recv message_type=avpMsgTypeSli
level=error tunnel_name=t1 message="bad control message" message_type=avpMsgTypeSli error="no specification for v2 message avpMsgTypeSli"

Interestingly, xl2tpd debug logs state:

message_type_avp: message type 16 (Set-Link-Info)
ignore_avp : Ignoring AVP

Particularly, this is supposedly some Cisco Meraki gear:

xl2tpd[132243]: vendor_avp: peer reports vendor 'Katalix Systems Ltd. Linux-4.4.177-meraki (x86_64)'

I haven't compared with xl2tpd's implementation yet, but it would be nice if go-l2tpd/kl2tpd supported this equipment/setup. Presumably, it simply needs to ignore the Set-Link-Info AVP message type in this case.


** Message: 06:40:03.504: strongSwan IPsec connection is up.
** Message: 06:40:03.506: xl2tpd started with pid 132243
xl2tpd[132243]: Not looking for kernel SAref support.
xl2tpd[132243]: Using l2tp kernel support.
xl2tpd[132243]: xl2tpd version xl2tpd-1.3.18 started on xxxxxxxx PID:132243
xl2tpd[132243]: Written by Mark Spencer, Copyright (C) 1998, Adtran, Inc.
xl2tpd[132243]: Forked by Scott Balmos and David Stipp, (C) 2001
xl2tpd[132243]: Inherited by Jeff McAdams, (C) 2002
xl2tpd[132243]: Forked again by Xelerance (www.xelerance.com) (C) 2006-2016
xl2tpd[132243]: Listening on IP address 0.0.0.0, port 33177
xl2tpd[132243]: get_call: allocating new tunnel for host xxx.xxx.xxx.xxx, port xxxx.
xl2tpd[132243]: Connecting to host xxx.xxx.xxx.xxx, port xxxxx
xl2tpd[132243]: control_finish: message type is (null)(0).  Tunnel is 0, call is 0.
xl2tpd[132243]: control_finish: sending SCCRQ
xl2tpd[132243]: message_type_avp: message type 2 (Start-Control-Connection-Reply)
xl2tpd[132243]: protocol_version_avp: peer is using version 1, revision 0.
xl2tpd[132243]: framing_caps_avp: supported peer frames: async sync
xl2tpd[132243]: bearer_caps_avp: supported peer bearers: analog digital
xl2tpd[132243]: firmware_rev_avp: peer reports firmware version 264 (0x0108)
xl2tpd[132243]: hostname_avp: peer reports hostname 'xxxxxxxx'
xl2tpd[132243]: vendor_avp: peer reports vendor 'Katalix Systems Ltd. Linux-4.4.177-meraki (x86_64)'
xl2tpd[132243]: assigned_tunnel_avp: using peer's tunnel 37327
xl2tpd[132243]: receive_window_size_avp: peer wants RWS of 10.  Will use flow control.
xl2tpd[132243]: control_finish: message type is Start-Control-Connection-Reply(2).  Tunnel is 37327, call is 0.
xl2tpd[132243]: control_finish: sending SCCCN
xl2tpd[132243]: Connection established to xxx.xxx.xxx.xxx, xxxxx.  Local: 51958, Remote: 37327 (ref=0/0).
xl2tpd[132243]: Calling on tunnel 51958
xl2tpd[132243]: control_finish: message type is (null)(0).  Tunnel is 37327, call is 0.
xl2tpd[132243]: control_finish: sending ICRQ
xl2tpd[132243]: message_type_avp: message type 11 (Incoming-Call-Reply)
xl2tpd[132243]: assigned_call_avp: using peer's call 19668
xl2tpd[132243]: control_finish: message type is Incoming-Call-Reply(11).  Tunnel is 37327, call is 19668.
xl2tpd[132243]: control_finish: Sending ICCN
xl2tpd[132243]: Call established with xxx.xxx.xxx.xxx, Local: 5484, Remote: 19668, Serial: 1 (ref=0/0)
xl2tpd[132243]: start_pppd: I'm running: 
xl2tpd[132243]: "/usr/sbin/pppd" 
xl2tpd[132243]: "plugin" 
xl2tpd[132243]: "pppol2tp.so" 
xl2tpd[132243]: "pppol2tp" 
xl2tpd[132243]: "7" 
xl2tpd[132243]: "passive" 
xl2tpd[132243]: "nodetach" 
xl2tpd[132243]: ":" 
xl2tpd[132243]: "debug" 
xl2tpd[132243]: "file" 
xl2tpd[132243]: "/run/nm-l2tp-57d65599-cccb-4d86-ba83-15914e8cf86f/ppp-options" 
xl2tpd[132243]: message_type_avp: message type 16 (Set-Link-Info)
xl2tpd[132243]: ignore_avp : Ignoring AVP
xl2tpd[132243]: control_finish: message type is Set-Link-Info(16).  Tunnel is 37327, call is 19668.
nmschulte commented 3 months ago

I'm not sure if this is L2TPv3 or not, but it states:

7.1. Malformed AVPs and Control Messages ... A control message with a malformed header MUST be discarded.

5.2. Mandatory AVPs and Setting the M Bit ... If the M bit is not set on an unrecognized AVP, the AVP MUST be ignored when received, processing the control message as if the AVP were not present.

tomparkin commented 3 months ago

Hi, thanks for reporting this @nmschulte !

I read through the Arch forum thread which covers quite a lot of ground...

level=error tunnel_name=t1 function=transport message="frame receive failed" error="malformed header: length 17024 exceeds buffer bounds of 30"

I think this is just informational logging and hence can probably be safely ignored.

Both RFC3931 (L2TPv3) and RFC2661 (L2TPv2) suggest that a malformed header should cause the control connection to be torn down.

However currently go-l2tp doesn't do this -- the logging above is advisory, but unless the message parsing fails due to an unrecognised mandatory AVP it shouldn't cause the connection to fail. You can see the handling here in the transport code:

https://github.com/katalix/go-l2tp/blob/master/l2tp/transport.go#L269

level=error tunnel_name=t1 message="bad control message" message_type=avpMsgTypeSli error="no specification for v2 message avpMsgTypeSli"

I suspect this is the root cause of the issue. Currently go-l2tp doesn't handle the SLI message, which does trigger the tunnel to send CDN and tear down the session:

https://github.com/katalix/go-l2tp/blob/master/l2tp/l2tp_dynamic_tunnel.go#L304

I'll see about adding SLI support which should address this issue.

nmschulte commented 3 months ago

Great, thank you for your time!

tomparkin commented 3 months ago

I've been able to reproduce this issue -- it is indeed caused by go-l2tp not handling SLI.

The "malformed header" logging is interesting, and seems to be caused by a race condition in the LNS -- or possibly a clash of expectations between the LAC and LNS.

On the client (LAC) side, go-l2tp instantiates the dataplane (creates a session instance in the kernel) once its ICCN message has been acked by the server (LNS).

However, if the LNS sends the initial PPP LCP packet /before/ sending the ICCN ack, then on the client side the kernel doesn't know how to handle the PPP LCP frame and hence passes it up to userspace to figure out. This ends up causing the "malformed header" log message because go-l2tp isn't really set up for directly handling data packets.

To an extent the latter isn't a big deal (since the error logging doesn't break anything per-se), however it's a confusing log line which doesn't help when debugging problems, so it's worth fixing too.

I have local fixes for both these issues which I'll get pushed to gh and rolled out in to the Debian packaging when I can.

Thanks again for your report.

nmschulte commented 3 months ago

2024-05-17_13h45m32s_screenshot Thank you!