oxidecomputer / omicron

Omicron: Oxide control plane

trying bgp peer set with peer address already in use but different asn results in existing bgp peer disappearing #6472

Open elaine-oxide opened 3 months ago

elaine-oxide commented 3 months ago

I am running a4x2 with:

Initial state:

$ oxide system networking bgp show-status
switch0
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.30.1  65547      64502       Established    1day 1h 14m 40s 883ms
169.254.10.1  65547      64500       Established    1day 1h 14m 39s 826ms

switch1
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.40.1  65547      64502       Established    1day 41m 25s 496ms
169.254.20.1  65547      64500       Established    1day 59m 17s 875ms

$ oxide system networking switch-port-settings show
switch1/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.20.2/30  initial-infra  None
169.254.40.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.20.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None
169.254.40.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

switch0/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.10.2/30  initial-infra  None
169.254.30.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.10.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None
169.254.30.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

I created a fake peer at an address not yet used as a peer address, using the BGP config (as65547) whose ASN is already in use by the existing peers.

$ oxide system networking bgp peer set --rack 619c28eb-c688-4823-b846-a6f9ed25be12 --switch switch1 --port qsfp0 --addr 169.254.60.1 --bgp-config as65547

Resulting state after the above command (169.254.60.1 has appeared in the output of both commands):

$ oxide system networking bgp show-status
switch0
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.30.1  65547      64502       Established    1day 1h 18m 51s 221ms
169.254.10.1  65547      64500       Established    1day 1h 18m 50s 165ms

switch1
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.20.1  65547      64500       Established    1day 1h 3m 28s 194ms
169.254.40.1  65547      64502       Established    1day 45m 35s 814ms
169.254.60.1  65547      0           Connect        6s 699ms

$ oxide system networking switch-port-settings show
switch1/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.20.2/30  initial-infra  None
169.254.40.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.20.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None
169.254.40.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None
169.254.60.1  as65547  [no filtering]  [no filtering]  []           0              0           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

switch0/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.10.2/30  initial-infra  None
169.254.30.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.10.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None
169.254.30.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

Next, I tried to set the same fake peer (169.254.60.1) to a BGP config whose ASN is not in use by the existing peers (as65548).

$ oxide system networking bgp peer set --rack 619c28eb-c688-4823-b846-a6f9ed25be12 --switch switch1 --port qsfp0 --addr 169.254.60.1 --bgp-config as65548
Error Response: status: 409 Conflict; headers: {"content-type": "application/json", "x-request-id": "f559d782-5ae7-4055-9ada-e0375f13c8be", "content-length": "155", "date": "Thu, 29 Aug 2024 01:37:43 GMT"}; value: Error { error_code: Some("Conflict"), message: "a different asn is already configured on this switch", request_id: "f559d782-5ae7-4055-9ada-e0375f13c8be" }

Resulting state after the above command. Despite the 409 Conflict response, 169.254.60.1 has disappeared from the bgp show-status output, while its entry in the switch-port-settings show output now references as65548:

$ oxide system networking bgp show-status
switch0
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.30.1  65547      64502       Established    1day 1h 21m 27s 616ms
169.254.10.1  65547      64500       Established    1day 1h 21m 26s 559ms

switch1
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.20.1  65547      64500       Established    1day 1h 6m 4s 590ms
169.254.40.1  65547      64502       Established    1day 48m 12s 210ms

$ oxide system networking switch-port-settings show
switch1/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.20.2/30  initial-infra  None
169.254.40.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.20.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None
169.254.40.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None
169.254.60.1  as65548  [no filtering]  [no filtering]  []           0              0           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

switch0/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.10.2/30  initial-infra  None
169.254.30.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.10.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None
169.254.30.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference
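
For illustration, the before/after comparison above can be automated. The following is a minimal sketch, assuming the bgp show-status text layout shown in this issue (a switch name line, a row of = characters, a header row, then one peer per line). The function names parse_show_status and missing_peers are hypothetical and are not part of the oxide CLI.

```python
from typing import Dict, Set

# Sample text copied from the switch1 sections of the outputs in this issue.
BEFORE = """\
switch1
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.20.1  65547      64500       Established    1day 1h 3m 28s 194ms
169.254.40.1  65547      64502       Established    1day 45m 35s 814ms
169.254.60.1  65547      0           Connect        6s 699ms
"""

AFTER = """\
switch1
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.20.1  65547      64500       Established    1day 1h 6m 4s 590ms
169.254.40.1  65547      64502       Established    1day 48m 12s 210ms
"""


def parse_show_status(text: str) -> Dict[str, Set[str]]:
    """Map each switch name to the set of peer addresses listed under it.

    Assumption: switch sections start with a line beginning with "switch",
    as in the output shown in this issue.
    """
    peers: Dict[str, Set[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the "=======" underline, and the column header.
        if not line or set(line) == {"="} or line.startswith("Peer Address"):
            continue
        if line.startswith("switch"):
            current = line
            peers[current] = set()
        elif current is not None:
            # The first whitespace-separated column is the peer address.
            peers[current].add(line.split()[0])
    return peers


def missing_peers(before: Dict[str, Set[str]],
                  after: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per switch, the peers present before but absent after."""
    missing: Dict[str, Set[str]] = {}
    for switch, addrs in before.items():
        gone = addrs - after.get(switch, set())
        if gone:
            missing[switch] = gone
    return missing


if __name__ == "__main__":
    print(missing_peers(parse_show_status(BEFORE), parse_show_status(AFTER)))
```

Run against the two switch1 snapshots, this reports 169.254.60.1 as the only peer that dropped out of the status output.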

I performed the same set of operations on my real peer 169.254.40.1, and it had the same behavior.

Initial state:

$ oxide system networking bgp show-status
switch0
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.30.1  65547      64502       Established    1day 1h 41m 53s 255ms
169.254.10.1  65547      64500       Established    1day 1h 41m 52s 198ms

switch1
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.40.1  65547      64502       Established    1day 1h 8m 37s 881ms
169.254.20.1  65547      64500       Established    1day 1h 26m 30s 261ms

$ oxide system networking switch-port-settings show
switch1/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.20.2/30  initial-infra  None
169.254.40.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.20.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None
169.254.40.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None
169.254.60.1  as65548  [no filtering]  [no filtering]  []           0              0           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

switch0/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.10.2/30  initial-infra  None
169.254.30.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.10.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None
169.254.30.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

Next, I tried to set a peer with the same address as the existing real peer (169.254.40.1) but a different ASN (as65548).

$ oxide system networking bgp peer set --rack 619c28eb-c688-4823-b846-a6f9ed25be12 --switch switch1 --port qsfp0 --addr 169.254.40.1 --bgp-config as65548
Error Response: status: 409 Conflict; headers: {"content-type": "application/json", "x-request-id": "30ee7ed5-b4c0-4252-a732-f29c7207ca5e", "content-length": "155", "date": "Thu, 29 Aug 2024 01:58:57 GMT"}; value: Error { error_code: Some("Conflict"), message: "a different asn is already configured on this switch", request_id: "30ee7ed5-b4c0-4252-a732-f29c7207ca5e" }

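Resulting state after the above command (169.254.40.1 has disappeared from the bgp show-status output for switch1; its entry in the switch-port-settings show output now references as65548, and its Connect Retry, Delay Open, and Idle Hold Time have changed to 0):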

$ oxide system networking bgp show-status
switch0
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.10.1  65547      64500       Established    1day 1h 42m 26s 21ms
169.254.30.1  65547      64502       Established    1day 1h 42m 27s 78ms

switch1
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.20.1  65547      64500       Established    1day 1h 27m 4s 89ms

$ oxide system networking switch-port-settings show
switch1/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.20.2/30  initial-infra  None
169.254.40.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.20.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None
169.254.40.1  as65548  [no filtering]  [no filtering]  []           0              0           false             6          0               2          None        None      None     None  None        None
169.254.60.1  as65548  [no filtering]  [no filtering]  []           0              0           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

switch0/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.10.2/30  initial-infra  None
169.254.30.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.10.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None
169.254.30.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

Then I tried to set the peer at the same address (169.254.40.1) back to the real ASN that was previously in use (as65547).

$ oxide system networking bgp peer set --rack 619c28eb-c688-4823-b846-a6f9ed25be12 --switch switch1 --port qsfp0 --addr 169.254.40.1 --bgp-config as65547
Error Response: status: 409 Conflict; headers: {"content-type": "application/json", "x-request-id": "f0922504-8d98-4271-97e9-9523fe3da70c", "content-length": "155", "date": "Thu, 29 Aug 2024 02:19:52 GMT"}; value: Error { error_code: Some("Conflict"), message: "a different asn is already configured on this switch", request_id: "f0922504-8d98-4271-97e9-9523fe3da70c" }

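Resulting state (169.254.40.1 is back in the bgp show-status output with local ASN 65547, and its port settings entry references as65547 again, although its Connect Retry, Delay Open, and Idle Hold Time remain 0):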

$ oxide system networking bgp show-status
switch0
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.30.1  65547      64502       Established    1day 2h 23m 37s 638ms
169.254.10.1  65547      64500       Established    1day 2h 23m 36s 581ms

switch1
=======
Peer Address  Local ASN  Remote ASN  Session State  State Duration
169.254.40.1  65547      64502       Established    20m 11s 699ms
169.254.20.1  65547      64500       Established    1day 2h 8m 14s 710ms

$ oxide system networking switch-port-settings show
switch1/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.20.2/30  initial-infra  None
169.254.40.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.20.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          3               2          None        None      None     None  None        None
169.254.40.1  as65547  [no filtering]  [no filtering]  []           0              0           false             6          0               2          None        None      None     None  None        None
169.254.60.1  as65548  [no filtering]  [no filtering]  []           0              0           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

switch0/qsfp0
=============
Autoneg  Fec   Speed
false    None  Speed100G

Address          Lot            VLAN
169.254.10.2/30  initial-infra  None
169.254.30.2/30  initial-infra  None

BGP Peer      Config   Export          Import          Communities  Connect Retry  Delay Open  Enforce First AS  Hold Time  Idle Hold Time  Keepalive  Local Pref  Md5 Auth  Min TTL  MED   Remote ASN  VLAN
169.254.10.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None
169.254.30.1  as65547  [no filtering]  [no filtering]  []           3              3           false             6          0               2          None        None      None     None  None        None

Destination  Nexthop  Vlan  Preference

For reference, here is the ASN information:

$ oxide system networking bgp config list
[
  {
    "asn": 65551,
    "description": "hello5",
    "id": "1c013259-97ff-426f-8414-7cacf08dcae4",
    "name": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz12345678901",
    "time_created": "2024-08-29T00:07:02.774536Z",
    "time_modified": "2024-08-29T00:07:02.774536Z"
  }, {
    "asn": 65547,
    "description": "BGP config for AS 65547",
    "id": "a3b1a17e-7167-47a2-9dff-5ce8cb5ab38b",
    "name": "as65547",
    "time_created": "2024-08-28T00:16:10.327972Z",
    "time_modified": "2024-08-28T00:16:10.327972Z"
  }, {
    "asn": 65548,
    "description": "hello",
    "id": "5b278ef8-7da2-4c84-859e-e955bbb202ed",
    "name": "as65548",
    "time_created": "2024-08-28T23:28:31.655895Z",
    "time_modified": "2024-08-28T23:28:31.655895Z"
  }, {
    "asn": 65549,
    "description": "hello3",
    "id": "8773e937-dea8-422f-8dcf-5add050081eb",
    "name": "as65549",
    "time_created": "2024-08-28T23:37:09.307159Z",
    "time_modified": "2024-08-28T23:37:09.307159Z"
  }, {
    "asn": 65550,
    "description": "0",
    "id": "fef48d7f-97a3-4553-aade-42c1dd2386bb",
    "name": "as65550",
    "time_created": "2024-08-28T23:57:34.067397Z",
    "time_modified": "2024-08-28T23:57:34.067397Z"
  }, {
    "asn": 4294967295,
    "description": "hello6",
    "id": "26a80a18-a43c-4228-a62c-e4e6a9a54bb0",
    "name": "as65552",
    "time_created": "2024-08-29T00:14:51.310793Z",
    "time_modified": "2024-08-29T00:14:51.310793Z"
  }
]
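
As a rough aid for reading this list, the sketch below picks out the configs whose ASN differs from the router ASN (65547) that mgadm reports on both switches. It assumes, based only on the wording of the 409 error message, that these are the configs that would trigger the conflict; that is an inference and not confirmed behavior. The JSON is a trimmed copy of the output above (only name and asn are kept), and the function name is hypothetical.

```python
import json

# Trimmed copy of the bgp config list output in this issue (name and asn only).
CONFIGS_JSON = """
[
  {"asn": 65551, "name": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz12345678901"},
  {"asn": 65547, "name": "as65547"},
  {"asn": 65548, "name": "as65548"},
  {"asn": 65549, "name": "as65549"},
  {"asn": 65550, "name": "as65550"},
  {"asn": 4294967295, "name": "as65552"}
]
"""

# Router ASN shown by mgadm on both switches in this issue.
ROUTER_ASN = 65547


def configs_differing_from_router(configs, router_asn):
    """Return the names of configs whose ASN is not the router's ASN.

    Assumption: a peer set referencing one of these configs is what
    produces "a different asn is already configured on this switch".
    """
    return [c["name"] for c in configs if c["asn"] != router_asn]


if __name__ == "__main__":
    configs = json.loads(CONFIGS_JSON)
    for name in configs_differing_from_router(configs, ROUTER_ASN):
        print(name)
```

Note that a config's name does not have to match its ASN: as65552 carries ASN 4294967295, so any check like this has to compare the asn field and not the name.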

Resulting state:

On g3 (switch1):

root@oxz_switch:~# mgadm bgp c r l
[
    Router {
        asn: 65547,
        graceful_shutdown: false,
        id: 65547,
        listen: "[::]:179",
    },
]

And for reference, on g0 (switch0):

root@oxz_switch:~# mgadm bgp c r l
[
    Router {
        asn: 65547,
        graceful_shutdown: false,
        id: 65547,
        listen: "[::]:179",
    },
]

elaine-oxide commented 3 months ago

A similar issue occurs for bgp peer set with a peer address not already in use and an ASN different from the one already in use. That is, the peer address does not need to be in use already for this issue to appear during bgp peer set, as long as a different ASN is used.
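
To make the expectation concrete: a request that is rejected with 409 Conflict should leave both the configured and the running state untouched. The following is a toy in-memory model of that validate-before-commit behavior, not Omicron's implementation; SwitchModel, Conflict, and peer_set are hypothetical names used only for illustration.

```python
class Conflict(Exception):
    """Stands in for the 409 Conflict response."""


class SwitchModel:
    """Toy model of one switch: a router ASN plus a peer table."""

    def __init__(self, router_asn: int) -> None:
        self.router_asn = router_asn
        self.peers = {}  # peer address -> ASN of the referenced BGP config

    def peer_set(self, addr: str, config_asn: int) -> None:
        # Validate first, so a rejected request cannot change any state.
        if config_asn != self.router_asn:
            raise Conflict("a different asn is already configured on this switch")
        # Only reached when validation passed.
        self.peers[addr] = config_asn


def rejected_request_leaves_state_unchanged() -> bool:
    switch = SwitchModel(router_asn=65547)
    switch.peer_set("169.254.40.1", 65547)
    snapshot = dict(switch.peers)
    try:
        switch.peer_set("169.254.40.1", 65548)  # expected to be rejected
    except Conflict:
        pass
    return switch.peers == snapshot


if __name__ == "__main__":
    print(rejected_request_leaves_state_unchanged())
```

In the behavior reported here, the equivalent check fails: after the 409, the peer is gone from bgp show-status and the port settings reference the rejected config.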