spoignant-proton opened 3 weeks ago
Hi @spoignant-proton
Unfortunately I can't seem to reproduce this issue. I took your config examples and created a self-contained containerlab topology:
name: v6lla
topology:
  nodes:
    leaf1:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux:24.7.2
      startup-config: |
        interface ethernet-1/51 {
            description "To SPINE:e1-1"
            admin-state enable
            subinterface 0 {
                type routed
                ipv6 {
                    admin-state enable
                    router-advertisement {
                        router-role {
                            admin-state enable
                            max-advertisement-interval 120
                            min-advertisement-interval 30
                        }
                    }
                }
            }
        }
        network-instance default {
            admin-state enable
            interface ethernet-1/51.0 {
            }
            protocols {
                bgp {
                    autonomous-system 65001
                    router-id 100.65.32.4
                    dynamic-neighbors {
                        interface ethernet-1/51.0 {
                            peer-group underlay_fabric
                            allowed-peer-as [
                                65004
                            ]
                        }
                    }
                    afi-safi ipv4-unicast {
                        admin-state enable
                        multipath {
                            allow-multiple-as true
                            maximum-paths 8
                        }
                    }
                    afi-safi ipv6-unicast {
                        admin-state disable
                    }
                    group underlay_fabric {
                        admin-state enable
                    }
                }
            }
        }
    spine1:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux:24.7.2
      startup-config: |
        interface ethernet-1/1 {
            description "To Leaf:e1-51"
            admin-state enable
            subinterface 0 {
                type routed
                ipv6 {
                    admin-state enable
                    router-advertisement {
                        router-role {
                            admin-state enable
                            max-advertisement-interval 120
                            min-advertisement-interval 30
                        }
                    }
                }
            }
        }
        network-instance default {
            admin-state enable
            interface ethernet-1/1.0 {
            }
            protocols {
                bgp {
                    admin-state enable
                    autonomous-system 65004
                    router-id 100.65.32.1
                    dynamic-neighbors {
                        interface ethernet-1/1.0 {
                            peer-group underlay_fabric
                            allowed-peer-as [
                                65001
                            ]
                        }
                    }
                    afi-safi ipv4-unicast {
                        admin-state enable
                        multipath {
                            allow-multiple-as true
                            maximum-paths 8
                        }
                    }
                    afi-safi ipv6-unicast {
                        admin-state disable
                    }
                    group underlay_fabric {
                        admin-state enable
                        failure-detection {
                            enable-bfd true
                            fast-failover true
                        }
                    }
                }
            }
        }
  links:
    - endpoints: [leaf1:e1-51, spine1:e1-1]
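Deploying a file like this is a single command; the file name v6lla.clab.yml below is just an assumed name for the snippet above:

sudo containerlab deploy -t v6lla.clab.yml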
This topology embeds the configs, so the nodes boot with the corresponding config bits already applied. Once I deployed this topo, I see the BGP session established just fine:
--{ running }--[ ]--
A:leaf1# show network-instance default protocols bgp neighbor *
-------------------------------------------------------------------------------------------------------------------------------------------------
BGP neighbor summary for network-instance "default"
Flags: S static, D dynamic, L discovered by LLDP, B BFD enabled, - disabled, * slow
-------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------
+----------------+-----------------------+----------------+------+---------+-------------+-------------+------------+-----------------------+
| Net-Inst | Peer | Group | Flag | Peer-AS | State | Uptime | AFI/SAFI | [Rx/Active/Tx] |
| | | | s | | | | | |
+================+=======================+================+======+=========+=============+=============+============+=======================+
| default | fe80::187d:1ff:feff:1 | underlay_fabri | D | 65004 | established | 0d:0h:0m:19 | ipv4- | [0/0/0] |
| | %ethernet-1/51.0 | c | | | | s | unicast | |
+----------------+-----------------------+----------------+------+---------+-------------+-------------+------------+-----------------------+
-------------------------------------------------------------------------------------------------------------------------------------------------
Summary:
0 configured neighbors, 0 configured sessions are established, 0 disabled peers
1 dynamic peers
FWIW, my v6 LLAs are different in the two network namespaces:
--{ running }--[ ]--
A:spine1# bash
admin@spine1:~$ ip netns exec srbase ip a ls dev e1-1
10493: e1-1@if10494: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9232 qdisc noqueue state UP group default
link/ether 1a:05:01:ff:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::a8c1:abff:fece:430c/64 scope link
valid_lft forever preferred_lft forever
admin@spine1:~$ ip netns exec srbase-default ip a ls dev e1-1.0
4: e1-1.0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 1a:05:01:ff:00:01 brd ff:ff:ff:ff:ff:ff link-netns srbase
inet6 fe80::1805:1ff:feff:1/64 scope link
valid_lft forever preferred_lft forever
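For reference, both of these look EUI-64 derived; the derivation (flip the universal/local bit of the first MAC octet and insert ff:fe in the middle) works out as:

MAC 1a:05:01:ff:00:01  ->  18:05:01:ff:fe:ff:00:01  ->  fe80::1805:1ff:feff:1
MAC aa:c1:ab:ce:43:0c  ->  a8:c1:ab:ff:fe:ce:43:0c  ->  fe80::a8c1:abff:fece:430c

so the srbase-side address maps back to a MAC of aa:c1:ab:ce:43:0c, not the 1a:05:01:ff:00:01 currently shown on the interface.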
Hi @hellt and thanks for your reply,
That's interesting, because in your case you also have the same MAC address on both interfaces, yet the LLAs are different, with the one on the srbase side derived from a different MAC address. My understanding is that I'm experiencing the issue because the LLAs are the same.
I tried to load a new topology from the file you shared on the same host system, and I observe the same as you:
admin@spine1:~$ ip netns exec srbase ip a ls dev e1-1
10779: e1-1@if10780: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9232 qdisc noqueue state UP group default
link/ether 1a:b8:01:ff:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::a8c1:abff:fe14:1ab8/64 scope link
valid_lft forever preferred_lft forever
admin@spine1:~$ ip netns exec srbase-default ip a ls dev e1-1.0
7: e1-1.0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 1a:b8:01:ff:00:01 brd ff:ff:ff:ff:ff:ff link-netns srbase
inet6 fe80::18b8:1ff:feff:1/64 scope link
valid_lft forever preferred_lft forever
So it looks like something in my topology is preventing the LLA from being derived from that different MAC address, but it is not clear why, or how I can avoid it. A key difference is that I'm bootstrapping the containers with their default config and loading the actual config to be tested only later, because my config files are too big to be loaded using the startup-config node attribute.
I'll run further tests to try to narrow this down.
A key difference is that I'm bootstrapping the containers with their default config and loading the actual config to be tested only later, because my config files are too big to be loaded using the startup-config node attribute.
Can you share more details about it? I can see how you might not want to include it in the topology file due to its size, but you should be able to use startup-config: cfg1.txt, where cfg1.txt contains the add-on config you want to test, and it should work regardless of the size.
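A minimal sketch of that pattern, with the node and file names assumed:

topology:
  nodes:
    leaf1:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux:24.7.2
      # cfg1.txt sits next to the topology file and holds the add-on config
      startup-config: cfg1.txt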
I think we will pause this, because we are under heavy time pressure to complete additional testing for an imminent deployment.
If you don't mind leaving this open a few weeks, we can come back to the deep dive; for now, we have a workaround that works well enough.
So the way the configuration is loaded (either directly using startup-config, or by our own means after the container is created to overcome the size limit) is unrelated to this issue. As it turned out, the issue is triggered by having device creation ordering rules, for example if we want to start leaves only after spines have been created and configured.
The following topology reproduces the issue:
name: v6lla
topology:
  nodes:
    leaf1:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux:24.7.2
      stages:
        create:
          wait-for:
            - node: spine1
              stage: configure
      startup-config: leaf1.txt
    spine1:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux:24.7.2
      startup-config: spine1.txt
  links:
    - endpoints: [leaf1:e1-51, spine1:e1-1]
Results:
A:spine1# show network-instance default protocols bgp neighbor
---------------------------------------------------------------------------------------------
BGP neighbor summary for network-instance "default"
Flags: S static, D dynamic, L discovered by LLDP, B BFD enabled, - disabled, * slow
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------
+---------+---------+---------+---------+---------+---------+---------+---------+---------+
| Net- | Peer | Group | Flags | Peer-AS | State | Uptime | AFI/SAF | [Rx/Act |
| Inst | | | | | | | I | ive/Tx] |
+=========+=========+=========+=========+=========+=========+=========+=========+=========+
| default | fe80::1 | underla | D | | active | - | | |
| | 84f:ff: | y_fabri | | | | | | |
| | feff:33 | c | | | | | | |
| | %ethern | | | | | | | |
| | et- | | | | | | | |
| | 1/1.0 | | | | | | | |
+---------+---------+---------+---------+---------+---------+---------+---------+---------+
---------------------------------------------------------------------------------------------
Summary:
0 configured neighbors, 0 configured sessions are established, 0 disabled peers
1 dynamic peers
admin@spine1:~$ ip netns exec srbase ip a ls dev e1-1
10874: e1-1@if10873: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9232 qdisc noqueue state UP group default
link/ether 1a:53:01:ff:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 4
inet6 fe80::1853:1ff:feff:1/64 scope link
valid_lft forever preferred_lft forever
admin@spine1:~$ ip netns exec srbase-default ip a ls dev e1-1.0
7: e1-1.0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 1a:53:01:ff:00:01 brd ff:ff:ff:ff:ff:ff link-netns srbase
inet6 fe80::1853:1ff:feff:1/64 scope link
valid_lft forever preferred_lft forever
The issue no longer happens if the create section is removed from the leaf1 node definition.
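That is, with leaf1 reduced to the following (everything else unchanged), the LLAs come out different again:

leaf1:
  kind: nokia_srlinux
  image: ghcr.io/nokia/srlinux:24.7.2
  startup-config: leaf1.txt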
FYI, I'm using the latest clab version:
version: 0.59.0
commit: 9e964727
date: 2024-10-23T02:44:27Z
source: https://github.com/srl-labs/containerlab
rel. notes: https://containerlab.dev/rn/0.59/
I can imagine how delaying the start of nodes that share the same link might cause some issues.
Do you need this stage after all? Can you live without it? Stages work best when you either delay creation of nodes that do not connect to one another (like islands of your topology), or when they do not share the same link segment, for example when they connect over a bridge; see the sketch below.
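A purely illustrative sketch of that second pattern (node names, the bridge name, and the pre-existing Linux bridge on the host are assumptions, not taken from your setup):

name: staged-over-bridge
topology:
  nodes:
    # br-core must already exist as a Linux bridge on the host
    br-core:
      kind: bridge
    core1:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux:24.7.2
    leaf1:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux:24.7.2
      stages:
        create:
          wait-for:
            - node: core1
              stage: configure
  links:
    - endpoints: [core1:e1-1, br-core:core1-e1]
    - endpoints: [leaf1:e1-1, br-core:leaf1-e1]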
It's to manage the delivery of the topology inside a nominal host. Our full topology is around 50 nodes, and even with a decent-sized host it gets very sad during the initial standup because the CPUs are murdered with all the provisioning.
Stages allow us to break up the startup hit, while the running topology sits happily inside the hardware we have on hand.
Stages were also meant to have certain parts of the network start before others, e.g. have a ready core layer before starting the fabrics. I've tried removing them and replacing them with --max-workers N to reduce the CPU stress when starting the topology. It is not obvious to me why stages should not be used in the first place. Ultimately, if a device must start at the same time as all the other devices it is connected to, then inside our relatively complex topology that amounts to saying all devices must start at the same time, and by reducing concurrency we may hit the same issue again.
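For completeness, that replacement is just the deploy-time flag (the topology file name here is assumed):

sudo containerlab deploy -t fabric.clab.yml --max-workers 4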
As expected, after removing the stages section from all SRL nodes and deploying with --max-workers 4, I'm still hitting the issue:
admin@SPINE01:~$ ip netns exec srbase ip a ls dev e1-1
38: e1-1@if37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8986 qdisc noqueue state UP group default
link/ether 1a:83:1f:ff:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 5
inet6 fe80::1883:1fff:feff:1/64 scope link
valid_lft forever preferred_lft forever
admin@SPINE01:~$ ip netns exec srbase-default ip a ls dev e1-1.0
4: e1-1.0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8968 qdisc noqueue state UP group default qlen 1000
link/ether 1a:83:1f:ff:00:01 brd ff:ff:ff:ff:ff:ff link-netns srbase
inet6 fe80::1883:1fff:feff:1/64 scope link
valid_lft forever preferred_lft forever
Well, even without stages and --max-workers, the problem still randomly affects a few interfaces in a moderately sized topology. For example, one uplink on this leaf is affected while the other ones are not:
admin@LEAF01:~$ ip netns exec srbase ip a ls dev e1-51
756: e1-51@if755: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8986 qdisc noqueue state UP group default
link/ether 1a:03:07:ff:00:33 brd ff:ff:ff:ff:ff:ff link-netnsid 12
inet6 fe80::1803:7ff:feff:33/64 scope link
valid_lft forever preferred_lft forever
admin@LEAF01:~$ ip netns exec srbase-default ip a ls dev e1-51.0
4: e1-51.0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8968 qdisc noqueue state UP group default qlen 1000
link/ether 1a:03:07:ff:00:33 brd ff:ff:ff:ff:ff:ff link-netns srbase
inet6 fe80::1803:7ff:feff:33/64 scope link
valid_lft forever preferred_lft forever
admin@LEAF01:~$ ip netns exec srbase ip a ls dev e1-52
760: e1-52@if759: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8986 qdisc noqueue state UP group default
link/ether 1a:03:07:ff:00:34 brd ff:ff:ff:ff:ff:ff link-netnsid 6
inet6 fe80::a8c1:abff:fe1c:cc45/64 scope link
valid_lft forever preferred_lft forever
admin@LEAF01:~$ ip netns exec srbase-default ip a ls dev e1-52.0
5: e1-52.0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8968 qdisc noqueue state UP group default qlen 1000
link/ether 1a:03:07:ff:00:34 brd ff:ff:ff:ff:ff:ff link-netns srbase
inet6 fe80::1803:7ff:feff:34/64 scope link
valid_lft forever preferred_lft forever
So this all looks like a race condition when creating the containers and wiring them up. Do we ever need an LLA on the parent (srbase) interface? If not, maybe this can be avoided using the workaround described in my first post.
I've hit this issue while trying to test an unnumbered BGP session (IPv6 link-local addresses) between two SRL containers running 24.7.2 or 23.10. It can be reproduced with the following minimal setup consisting of a leaf (srl/ixrd2l) and a spine (srl/ixrd3l): port e1-51 of the leaf is connected to port e1-1 of the spine.
Configuration as in the attached files: spine.txt leaf.txt
The BGP session fails to establish. A packet capture between the two containers shows that the ICMPv6 RAs are correctly exchanged and that one side attempts to establish the session. However, the other side replies with both SYN/ACK and RST before any OPEN has been sent, which ends the session prematurely.
Additionally, ping on the peer's link-local address reports duplicates:
After further investigation, I believe this is because the parent interface inside the srbase container has a link-local address enabled, which happens to be the same as the one on the child interface inside srbase-default. For example:
Therefore, the SYN is received inside both containers: srbase rejects it with a RST (which causes the issue) while srbase-default accepts it.
To confirm, I ran the following commands in order to get rid of the link-local address on the parent interface:
After this, the BGP session establishes successfully and ping no longer shows duplicate packets, which confirms the above assumptions.
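A minimal sketch of that kind of change (the flush approach and the interface name are assumptions, not necessarily the exact commands used):

# remove the link-local address from the parent interface in the srbase namespace; may require root
ip netns exec srbase ip -6 addr flush dev e1-1 scope link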
As another workaround, using vlan-tagging on the p2p interfaces prevents the issue from happening.
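A sketch of what the vlan-tagging variant might look like on the leaf uplink (the VLAN ID and the trimmed-down subinterface are assumptions, not taken from the attached configs):

interface ethernet-1/51 {
    vlan-tagging true
    subinterface 0 {
        type routed
        vlan {
            encap {
                single-tagged {
                    vlan-id 10
                }
            }
        }
    }
}

The ipv6 / router-advertisement settings from the earlier configs would remain under the same subinterface.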