systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0

systemd-networkd creates bridges with no-carrier #9252

Open encbladexp opened 6 years ago

encbladexp commented 6 years ago

systemd version the issue has been seen with

238

Used distribution

Arch Linux

Expected behaviour you didn't see

systemd-networkd creates a fully functional bridge

Unexpected behaviour you saw

systemd-networkd creates a bridge with no-carrier

Steps to reproduce the problem

10-vbr0.netdev:

[NetDev]
Name=vbr0
Kind=bridge

10-vbr0.network

[Match]
Name=vbr0

[Network]
Address=192.168.3.1/24

If I delete vbr0 with ip link delete vbr0 and recreate it with brctl addbr vbr0, systemd-networkd configures the interface with the settings from the .network file.

Here is the output of ip link show:

6: vbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:98:30:fe:38:4b brd ff:ff:ff:ff:ff:ff

Any ideas?
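For anyone comparing systems, the flag list in the ip link show output above can be checked mechanically. The following is a minimal sketch, not part of the original report; the helper names link_flags and has_no_carrier are made up for illustration, and the sample line is the one quoted above.

```shell
# Extract the comma-separated flag list between '<' and '>' from the first
# line of `ip link show DEV` output read on stdin.
link_flags() {
    sed -n '1s/^[^<]*<\([^>]*\)>.*$/\1/p'
}

# Succeed (exit status 0) if the flags read on stdin include NO-CARRIER.
has_no_carrier() {
    link_flags | tr ',' '\n' | grep -qx 'NO-CARRIER'
}

# Sample line copied from the report; on a live system one would pipe
# `ip link show vbr0` into the helper instead.
sample='6: vbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000'

if printf '%s\n' "$sample" | has_no_carrier; then
    echo "vbr0: no carrier"
else
    echo "vbr0: carrier present"
fi
```

On a running machine the equivalent call would be ip link show vbr0 | has_no_carrier.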

poettering commented 6 years ago

/cc @ssahani

yuwata commented 6 years ago

If some .network file sets Bridge=vbr0, then the link goes to the 'configured' state. For example:

# /etc/systemd/network/10-eth0.network
[Match]
Name=eth0

[Network]
Bridge=vbr0

At least currently, it seems that if no .network file specifies vbr0 in Bridge=, the bridge will never reach the configured state...

encbladexp commented 6 years ago

So I need to configure an unused, unneeded network interface. I don't think this is the way it should work ;)

(Bridges created with brctl don't have this behavior.)

yuwata commented 6 years ago

My comment above is just a 'workaround'. I could not find out why the behavior differs between brctl and networkd...

ssahani commented 6 years ago

I am trying to figure this out. Up to netdev creation everything is the same, but once we bring the link up via the .network file we see a difference.

MrSorcus commented 5 years ago

Up. Can it be resolved with https://github.com/systemd/systemd/pull/9956 ?

encbladexp commented 5 years ago

@MrSorcus I don't think so.

Rapsey commented 5 years ago

I am also having this problem (version 241). The networkctl state stays at no-carrier (configuring) even though ConfigureWithoutCarrier=yes was used in the .network file. Is there a known workaround?

yuwata commented 5 years ago

@Rapsey Please try disabling IPv6 link-local addressing, that is, LinkLocalAddressing=no. At least with current git master, it works fine with that setting.
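As a rough illustration of the two settings mentioned so far, the sketch below writes a .network file for the vbr0 bridge from the original report with both ConfigureWithoutCarrier=yes and LinkLocalAddressing=no in its [Network] section. The staging directory is an assumption of this example so that nothing under /etc/systemd/network is modified; whether the settings help depends on the systemd version, as discussed in this thread.

```shell
# Staging directory (an assumption of this sketch). Copy the result to
# /etc/systemd/network/ manually if you want networkd to pick it up.
outdir="${OUTDIR:-$(mktemp -d)}"

cat > "$outdir/10-vbr0.network" <<'EOF'
[Match]
Name=vbr0

[Network]
Address=192.168.3.1/24
# Apply the static configuration even while the bridge has no carrier.
ConfigureWithoutCarrier=yes
# Disable link-local addressing, as suggested above.
LinkLocalAddressing=no
EOF

echo "wrote $outdir/10-vbr0.network"
```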

yuwata commented 5 years ago

@Rapsey Or, please try PR #12794 if possible. Thank you.

Rapsey commented 5 years ago

Thank you for the quick reply and fix @yuwata ! I have tested your PR on Debian 9.9 and now the link goes into a no-carrier (configured) state.

Curiously now I can't even "fix" the bridge by recreating it manually. Before this if I recreated the bridge using ip link del and brctl addbr as suggested in the original post, the bridge would be created and configured but without the NO-CARRIER flag. Now all bridges keep getting NO-CARRIER even when created that way.

EDIT: After downgrading back to systemd 241 the bridges still go into NO-CARRIER even when created manually. So I suspect this was not caused by systemd directly but by updating some dependencies necessary to build from source.

yuwata commented 5 years ago

BTW, I cannot reproduce the original issue, no-carrier state, anymore with kernel-5.1.8-200.fc29.x86_64 and current systemd git master. Can I close this issue? @ssahani WDYT?

yuwata commented 5 years ago

The original issue happens even when the bridge interface is not managed by networkd.

$ sudo ip link add bridge99 type bridge
$ sudo ip link set bridge99 up

Then, the link will be in NO-CARRIER state.
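To compare machines that behave differently, it may help to read the state straight from sysfs rather than from ip link. The sketch below is illustrative only: bridge_summary is a made-up helper, and the SYSFS_NET override exists purely so the function can be tried against a fake directory tree. It prints the operstate, the carrier value and the number of ports listed under brif/ for a bridge.

```shell
# Print "DEV operstate=... carrier=... ports=N" for a bridge device.
# SYSFS_NET defaults to the real sysfs location; overriding it is only a
# convenience of this sketch.
bridge_summary() {
    dev=$1
    base="${SYSFS_NET:-/sys/class/net}/$dev"
    operstate=$(cat "$base/operstate" 2>/dev/null || echo unknown)
    # Reading "carrier" can fail, for example while the link is
    # administratively down, so fall back to a placeholder.
    carrier=$(cat "$base/carrier" 2>/dev/null || echo n/a)
    ports=0
    for p in "$base"/brif/*; do
        # An unmatched glob stays literal; count only entries that exist.
        if [ -e "$p" ]; then
            ports=$((ports + 1))
        fi
    done
    echo "$dev operstate=$operstate carrier=$carrier ports=$ports"
}
```

Running bridge_summary bridge99 after the two ip link commands above shows whether a bridge with zero ports reports carrier on a given kernel.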

Rapsey commented 5 years ago

It seems to depend on something else. The following was done on a VM with the exact same OS & kernel: Debian 9.9 with kernel 4.9.0-9-amd64 (4.9.168-1+deb9u2). No NO-CARRIER there.

root@anansi:~# ip link add bridge99 type bridge
root@anansi:~# ip link set bridge99 up
root@anansi:~# ip link show bridge99
3: bridge99: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether be:42:10:4a:d7:7a brd ff:ff:ff:ff:ff:ff

keszybz commented 5 years ago

Logically, I'd expect a bridge with no interfaces to be in NO-CARRIER state. What does it even mean that the bridge has "carrier" when it has no interfaces?

But anyway, with systemd-242-895-g9e93200+ and kernel-5.0.7-300.fc30.x86_64, I do get "carrier" with both manual ip link add and systemd-networkd. We need to figure out what is causing those divergent results before we can fix this.

Rapsey commented 5 years ago

I'm afraid I won't be able to continue investigating this, but here's what I found. Maybe it can help someone else in the future.

On a clean Debian stretch system bridges created with ip link did not get the NO-CARRIER state. I then installed the dependencies necessary to build systemd from source. To get the right versions I had to pull 2 packages from testing: util-linux and libmount-dev.

This resulted in the following packages being installed from testing:

libblkid-dev 2.33.1-0.1
libblkid1 2.33.1-0.1
libc-bin 2.28-10
libc-dev-bin 2.28-10
libc-l10n 2.28-10
libc6 2.28-10
libc6-dev 2.28-10
libc6-dev-i386 2.28-10
libc6-dev-x32 2.28-10
libc6-i386 2.28-10
libc6-x32 2.28-10
libcap-ng0 0.7.9-2
libfdisk1 2.33.1-0.1
libncursesw6 6.1+20181013-2
libsmartcols1 2.33.1-0.1
libtinfo6 6.1+20181013-2
libuuid1 2.33.1-0.1
locales 2.28-10
uuid-dev 2.33.1-0.1

After this, all interfaceless bridges created with ip link got the NO-CARRIER state.

encbladexp commented 5 years ago

@keszybz sometimes you need to create a bridge without interfaces, for example to bind virtual machines or dynamic interfaces to it afterwards.

yuwata commented 5 years ago

Anyway, this is not caused by networkd. You can easily confirm this 'issue' even if networkd is stopped. And @keszybz's comment below makes sense to me.

Logically, I'd expect a bridge with no interfaces to be in NO-CARRIER state. What does it even mean that the bridge has "carrier" when it has no interfaces?

I'd like to close this. @keszybz and @ssahani WDYT?

ssahani commented 5 years ago

I agree with @keszybz.

keszybz commented 5 years ago

@keszybz sometimes you need to create a bridge without interfaces, for example to bind virtual machines or dynamic interfaces to it afterwards.

Yes, I know this. I don't have any issue with a bridge device without enslaved interfaces. "no carrier" means that the interface is there, we may even configure addresses and routes on it, but if we send packets, they won't reach anyone, and we will not receive any packets either. And a bridge without interfaces is exactly like that.

I tried to follow the carrier logic in the kernel, but I couldn't figure it out. I see br_port_carrier_check() and br_device_event(), which seem to take care of changes where devices are added, but I don't see what determines the state before any devices are added. Pointers would be very welcome.

keszybz commented 5 years ago

I'd prefer to keep this open until we figure out what is going on here.

ChetHosey commented 5 years ago

I've noticed that a bridge will initially be in the NO-CARRIER state if spanning tree is enabled, until the real interface has completed the listening and learning phases.

If nmcli -g bridge.stp con show br0 shows "yes" then STP is enabled.
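On systems without NetworkManager, the same information should be available from the bridge's stp_state attribute in sysfs, where 0 means STP is disabled. The following is a small sketch under that assumption; stp_status and the SYSFS_NET override are made up for this example.

```shell
# Report whether STP is enabled on a bridge by reading stp_state from
# sysfs (0 = disabled, non-zero = kernel or user-space STP).
stp_status() {
    dev=$1
    f="${SYSFS_NET:-/sys/class/net}/$dev/bridge/stp_state"
    if [ ! -r "$f" ]; then
        echo "$dev: not a bridge or not present"
        return 1
    fi
    if [ "$(cat "$f")" = "0" ]; then
        echo "$dev: STP disabled"
    else
        echo "$dev: STP enabled"
    fi
}
```

With the kernel's default forward delay of 15 seconds, a port spends roughly 30 seconds in the listening and learning states before it forwards. For a bridge that does not need STP, ip link set dev br0 type bridge stp_state 0, or STP=no in the [Bridge] section of a networkd .netdev file, should turn it off.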

From dmesg:

[   13.426970] bnx2 0000:01:00.0 eno1: NIC Copper Link is Up, 1000 Mbps full duplex
[   13.426976] , receive & transmit flow control ON
[   13.427063] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[   13.430757] br0: port 1(eno1) entered blocking state
[   13.430759] br0: port 1(eno1) entered disabled state
[   13.430809] device eno1 entered promiscuous mode
[   13.430848] br0: port 1(eno1) entered blocking state
[   13.430849] br0: port 1(eno1) entered listening state
[   13.506888] bnx2 0000:01:00.1 eno2: NIC Copper Link is Up, 1000 Mbps full duplex
[   13.506893] , receive & transmit flow control ON
[   13.506999] IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready
[   28.628042] br0: port 1(eno1) entered learning state
[   43.732041] br0: port 1(eno1) entered forwarding state
[   43.732049] br0: topology change detected, propagating

Before br0: topology change:

9: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 78:2b:cb:13:6c:f4 brd ff:ff:ff:ff:ff:ff

And after:

9: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 78:2b:cb:13:6c:f4 brd ff:ff:ff:ff:ff:ff

jiridanek commented 3 years ago

The preceding comment from @ChetHosey helped me. To get connectivity back, I simply disabled STP and waited a bit.

ganguin commented 3 years ago

AFAIK a bridge should never be empty; it is not fully set up if empty. In cases where interfaces are dynamically added to or removed from a bridge, at least one should always remain. You can do this with a dummy interface that always belongs to the bridge (libvirt does this with the virbr0-nic dummy interface).

(Would it maybe be useful to have a netdev option "DummyDevice=..." in the bridge section that does that automatically?)
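With existing networkd options, the dummy-port idea can be approximated by hand. The sketch below assumes the vbr0 bridge from the original report and a hypothetical dummy device named vbr0-nic; it writes a .netdev file that creates the dummy device and a .network file that enslaves it via Bridge=. Files go to a staging directory (an assumption of this example), and whether the bridge then reports carrier depends on the kernel behaviour discussed in this thread.

```shell
# Staging directory (an assumption of this sketch); copy the files to
# /etc/systemd/network/ manually to use them.
outdir="${OUTDIR:-$(mktemp -d)}"

# Create a dummy device. The name vbr0-nic is hypothetical.
cat > "$outdir/10-vbr0-nic.netdev" <<'EOF'
[NetDev]
Name=vbr0-nic
Kind=dummy
EOF

# Attach the dummy device to the bridge as a permanent port.
cat > "$outdir/10-vbr0-nic.network" <<'EOF'
[Match]
Name=vbr0-nic

[Network]
Bridge=vbr0
EOF

echo "wrote $outdir/10-vbr0-nic.netdev and $outdir/10-vbr0-nic.network"
```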

A bridge inherits the MAC address of its first device (or used to inherit? I was not able to reproduce it); when all devices are removed and a new one is added, it would change its MAC. This can lead to issues.

So I don't know if this is a legacy problem that has changed with recent kernels or if there are still good reasons to not have an empty bridge.

keszybz commented 3 years ago

@ganguin, see https://github.com/systemd/systemd/issues/9252#issuecomment-502557216 re bridges with no interfaces.

Bridge MAC is generated from the bridge name, and not from any devices that are attached to it. We changed that in systemd 241. See https://www.freedesktop.org/software/systemd/man/systemd.net-naming-scheme.html#v241.

LaKing commented 3 years ago

I think I encountered this issue ...

My bridges are in NO CARRIER state.

[root@pm network]# networkctl
IDX LINK        TYPE     OPERATIONAL SETUP      
  1 lo          loopback carrier     unmanaged  
  2 enp3s0f0    ether    routable    configured 
  3 enp3s0f1    ether    off         unmanaged  
  4 enp4s0f0    ether    off         unmanaged  
  5 enp4s0f1    ether    off         unmanaged  
  6 10.20.0.x   bridge   no-carrier  configuring
  7 10.20.25.x  bridge   no-carrier  configuring
  8 tun-hostnet none     routable    unmanaged  
  9 20-0-13     ether    degraded    unmanaged  
 10 20-0-14     ether    degraded    unmanaged  
 11 20-0-12     ether    degraded    unmanaged  
 12 20-25-2     ether    degraded    unmanaged  
 13 20-0-15     ether    degraded    unmanaged 

Added ConfigureWithoutCarrier=yes for now, but it used to work without that option. I attach only nspawn containers ...

This issue also caused systemd-networkd-wait-online to time out at boot.

jaskij commented 2 years ago

Completely different setup, but the same symptoms. Industrial device, bridging two ports in DSA, using KSZ8563.

I was able to track this down to a single udev file, 99-default.link:

root@host:/lib/systemd/network# cat 99-default.link.bak 
#  SPDX-License-Identifier: LGPL-2.1+
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Match]
OriginalName=*

[Link]
NamePolicy=keep kernel database onboard slot path
MACAddressPolicy=persistent

Linux 5.10, systemd version 244 (commit: 3ceaa81c61b654ebf562464d142675bd4d57d7b6), Yocto Dunfell, custom distro.

Patches applied are listed here: http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-core/systemd/systemd_244.5.bb?h=dunfell#n17

Their content can be found here: http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-core/systemd/systemd?h=dunfell

jaskij commented 2 years ago

After further debugging, it's specifically the MACAddressPolicy=persistent line that causes the issue for me.

Similarly, adding Type=!bridge in [Match] made it work. The only issue is, I do not have a persistent MAC address for my device.
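For reference, the Type=!bridge variant described above might look like the following sketch. It writes a copy of the default .link file with the extra match line into a staging directory (an assumption of this example). A file with the same name placed under /etc/systemd/network would be expected to take precedence over the one shipped in /lib/systemd/network.

```shell
# Staging directory (an assumption of this sketch).
outdir="${OUTDIR:-$(mktemp -d)}"

cat > "$outdir/99-default.link" <<'EOF'
[Match]
OriginalName=*
# Exclude bridges so that MACAddressPolicy=persistent is not applied to them.
Type=!bridge

[Link]
NamePolicy=keep kernel database onboard slot path
MACAddressPolicy=persistent
EOF

echo "wrote $outdir/99-default.link"
```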

jelmd commented 5 months ago

FWIW: After upgrading from Ubuntu 20.04 to 22.04, the same problem occurs. Fixed it with /etc/systemd/network/10-bridges.link:

[Match]
Type=bridge

[Link]
MACAddressPolicy=none

dkebler commented 3 months ago

Since @jelmd provided the fix I found for this issue, I am commenting here. Yes, it works! In my use case I was using link files to rename interfaces to wan and lan by MAC address, and maybe that's why this bridge link file is required. I made sure to put it lexically before the other interface link files.
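Since the first matching .link file is the one that gets applied, the ordering point above can be checked with a short listing. This is a sketch only: link_file_order is a made-up helper, and sorting basenames in the C locale is an approximation of the ordering udev uses.

```shell
# Print .link file names from the given directories, sorted by basename.
# Duplicate names are collapsed because a file in a higher-priority
# directory replaces a same-named file elsewhere.
link_file_order() {
    for d in "$@"; do
        for f in "$d"/*.link; do
            # An unmatched glob stays literal; skip it.
            if [ -e "$f" ]; then
                basename "$f"
            fi
        done
    done | LC_ALL=C sort -u
}
```

For example, link_file_order /etc/systemd/network /lib/systemd/network should list 10-bridges.link ahead of any higher-numbered renaming files. To see which file udev actually picks for a device, udevadm test-builtin net_setup_link /sys/class/net/br0 can be used (br0 is a placeholder name here).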