rwestphal / openbsd-ldpd

LDP implementation for OpenBSD
17 stars 3 forks source link

Problem with packet size in VPLS setup. #2

Open frc-javier opened 8 years ago

frc-javier commented 8 years ago

Hello.

I have created a setup like the one described here, but with all nodes based on OpenBSD 5.9, but I think there is a problem with the MTU definitions. When sending a packet from CE1 to CE2 or CE3, all works well except for packets near the maximum MTU (1500). With the configuration shown here, sending a ping with data size 1474 from CE1 to CE2, PE1 crashes (kernel panic). The message is "panic: kernel diagnostic assetion "ifp != NULL" failed: file "../../../../net/if_mpw.c", line 368. I have tried both OpenBSD 5.9 and OpenBSD 5.8, the only difference being that in 5.8 the assertion failed is in line 372 of the same file if_mpw.c.

It is a problem in the MTU size of the pseudowire interfaces and the underlying interfaces emX. Increasing their MTU solves the problem. Anyway, is there a way for not crashing the system when the MTUs have the default values? Thank you for any help.

rwestphal commented 8 years ago

Hi,

That's a known issue. I'll ask rzalamena@ to commit a fix today since he knows the VPLS kernel code better than I do. Keep an eye on source-changes@openbsd.org as this should be fixed soon.

Cheers.

frc-javier commented 8 years ago

Hello.

Thank you for your reply. I will follow these changes.

Only a sidenote about the proposed VPLS test setup. In addition to changing MTUs of the underlying Ethernet interfaces of PE and P, in the VPLS sample setup it would be useful to disable IP fragment reassembly in PE nodes. If this is not done and the CEs send IP fragments, PEs will reassemble incoming fragments and try to send the resulting IP packet over the pseudowire interfaces. Thus, if you don't change the MTU of the pseudowire interfaces (which should not be different than that of the MTU of the CE interfaces), you will have problems. Anyway, a PE node implementing a layer 2 VPN should not try to reassemble higher layer fragments. To disable that behaviour in OpenBSD, you can include a "set reassemble no" instruction in the pf.conf file of PE nodes, as the default behaviour is to reassemble fragments if pf is enabled.

Thank you.

rzalamena commented 8 years ago

Hello @frc-javier ,

Thank you for your meaningful report, it is indeed a problem which was already known.

The current version of OpenBSD (-current) has this problem fixed, because the code that causes this 'panic' was removed. The panic you are seeing is actually an KASSERT() for an condition that is not true.

Long story

When the code was first wrote I made an assumption that it would not be possible to change interface data while dealing with the packet, which is not true and might have been caused due some heavy changes in the network stack that occurred during the time mpw(4) was being developed.

It is possible that you are hitting this condition when you have a working mpw(4) and you change some interface (mainly vlan(4)s) while having packets being processed. In your case I assume it happens when you change the MTU while having some packets being routed.

Short story

I messed up with a KASSERT() which I though would not be possible, and in 5.8 and 5.9 it actually happens.

The fix

You have to patch the if_mpw.c file to not use the KASSERT() and instead drop the packet.

Replace this line:

KASSERT(ifp != NULL)

With this:

if (ifp == NULL) {
        m_freem(m);
        return (NULL);
}

The worst thing that can happen with this change is that you might lose packets when you change an interface configuration while passing traffic on the affected mpw. It will be a very small amount of packets and it only happens temporarily while changing interface settings related to the mpw.

Also thank you for your note about the IP fragments with PW, we haven't though about that.

rwestphal commented 8 years ago

Javier,

Thanks for your valuable feedback. You are right, on an L2VPN the PE should not reassemble CE fragmented packets under any circumstance. I'll update the wiki page with your suggestion later.

Regarding the MTU issues, I'd like you to be more specific on what you did. Did you change the MTU of the core faced interfaces of the PE routers to accommodate the MPLS labels and the Control Word (and all the P router interfaces too)? I believe that the MTU of the pseudowire interfaces doesn't need to be changed.

frc-javier commented 8 years ago

Hello.

Thank you Rafael & Renato for your replies. I will try the patch for if_mpw.c.

As for the test setup, I have deployed the same scenario as proposed in the VPLS setup, but with all nodes based on OpenBSD 5.8 / 5.9. For testing, I made pings between CEs. Some of these pings were large enough so that the CE sent them fragmented, but none of them resulting in more than two IP fragments.

The MTUs, when IP reassembly is disabled, are configured as follows:

Renato, you are right on the MTU of the pseudowire interfaces at PEs, 1500 is enough. But remember the problem of IP reassembly at PEs. Before I realized that the PEs were reassembling the fragments, I had to increase the MTU of the pseudowire interfaces (and the underlying Ethernets), because with MTU 1500 the PEs would crash. When IP reassembly is disabled at PEs, then the MTU of the pseudowire interfaces can be the right value.

Rafael, I have checked that changing parameters of the pseudowire interfaces while there is traffic flowing through them, may crash the system, as you describe. But the PE also crashes if the configured MTU is not large enough and a packet larger than the configured MTU is sent over it.

Cheers.

rwestphal commented 8 years ago

Thanks a lot Javier, now everything makes sense. Just updated the wiki page with your suggestions (MTU update and editing pf.conf). Used 1530 for the MTU to account for an extra VLAN header.

Will take a look at the kernel later, I need to test the patch from @rzalamena and see if it fixes everything before commiting.