oxidecomputer / maghemite

A routing stack written in Rust.
Mozilla Public License 2.0

Error adding system routes #57

Closed jgallagher closed 1 year ago

jgallagher commented 1 year ago

Within the switch zone on madrid, we observed that even though maghemite had found prefixes for peers ("unknown" here is actually the hostname of the peer):

root@oxz_switch:~# /opt/oxide/mg-ddm/bin/ddmadm get-prefixes
Destination              Next Hop                  Path
fdb0:a840:2504:355::/64  fe80::aa40:25ff:fe04:355  BRM44220001
fdb0:a840:2504:157::/64  fe80::aa40:25ff:fe04:157  unknown

there were no routes set up:

root@oxz_switch:~# /opt/oxide/dendrite/bin/swadm route list
Subnet                                      Port   Link Gateway
root@oxz_switch:~#

@rcgoodfellow pointed out this excerpt from the mg-ddm logs, which indicates the culprit:

Dec 28 01:39:52.442 INFO [53] nbr is fe80::aa40:25ff:fe04:157@unknown server
Dec 28 01:39:52.442 INFO [53] exchange: listening on [fe80::aa40:25ff:fe4f:3a90]:56797
Dec 28 01:39:52.442 INFO waiting for exchange server to start
Dec 28 01:39:54.444 WARN [53] exchange pull: hyper error: error trying to connect: tcp connect error: Connection refused (os error 146)
Dec 28 01:39:54.445 INFO sending 1 routes to dendrite
Dec 28 01:39:54.451 ERRO [53] add system route: expected tofino port number tfportrear10_0
Dec 28 01:39:54.451 INFO removing routes 0 from dendrite

rcgoodfellow commented 1 year ago

@jgallagher I believe the ddmd build from #58 should fix this.

jgallagher commented 1 year ago

@rcgoodfellow Using the ddmd build from #58 with the latest omicron main dendrite commit (963bcd784e4385cbb45ecfba548ac4b1abb6edc1, which per @internet-diglett matches 8067c39088482ada72b005cb394270815d181bee), I'm seeing a different error in the mg-ddm logs:

Dec 28 00:47:16.146 INFO [53] nbr is fe80::aa40:25ff:fe04:157@unknown server
Dec 28 00:47:16.147 INFO [53] exchange: listening on [fe80::aa40:25ff:feec:f2c6]:56797
Dec 28 00:47:16.147 INFO waiting for exchange server to start
Dec 28 00:47:18.149 WARN [53] exchange pull: hyper error: error trying to connect: tcp connect error: Connection refused (os error 146)
Dec 28 00:47:18.150 INFO sending 1 routes to dendrite
Dec 28 00:47:18.157 ERRO [53] add system route: doesn't match pattern "(^[qQ][sS][fF][pP](([0-9])|([1-2][0-9])|(3[0-1]))$)|(^[rR][eE][aA][rR](([0-9])|([1-2][0-9])|(3[0-1]))$)|(^[iI][nN][tT]0$)"
Dec 28 00:47:18.157 INFO removing routes 0 from dendrite
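
For reference, the pattern in that error accepts only qsfp0 through qsfp31, rear0 through rear31 (prefix in any letter case), or int0. Below is a minimal sketch of the same check in std-only Rust. The function name `matches_port_pattern` is hypothetical, and this hand-written matcher is an illustration of what the quoted regex accepts, not dendrite's actual validation code.

```rust
/// Hypothetical helper: returns true if `name` is accepted by the port-name
/// pattern quoted in the log above. This is an illustrative re-implementation,
/// not code taken from dendrite or maghemite.
fn matches_port_pattern(name: &str) -> bool {
    // Strip an ASCII-case-insensitive prefix from `name`, returning the rest.
    fn strip_prefix_ci<'a>(name: &'a str, prefix: &str) -> Option<&'a str> {
        let head = name.get(..prefix.len())?;
        if head.eq_ignore_ascii_case(prefix) {
            Some(&name[prefix.len()..])
        } else {
            None
        }
    }

    // The numeric alternatives are [0-9], [1-2][0-9], and 3[0-1]:
    // an index from 0 to 31 with no leading zero.
    fn is_index_0_to_31(s: &str) -> bool {
        match s.as_bytes() {
            [d] => d.is_ascii_digit(),
            [b'1' | b'2', d] => d.is_ascii_digit(),
            [b'3', b'0' | b'1'] => true,
            _ => false,
        }
    }

    if let Some(rest) = strip_prefix_ci(name, "qsfp") {
        return is_index_0_to_31(rest);
    }
    if let Some(rest) = strip_prefix_ci(name, "rear") {
        return is_index_0_to_31(rest);
    }
    // The last alternative is [iI][nN][tT]0, so only index 0 is allowed.
    strip_prefix_ci(name, "int") == Some("0")
}
```

With this sketch, `matches_port_pattern("rear10")` is true, while a name in the style of the earlier error, `matches_port_pattern("tfportrear10_0")`, is false because the pattern is anchored at both ends and allows neither the `tfport` prefix nor the `_0` suffix.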

rcgoodfellow commented 1 year ago

Ah, I see. Sorry, one moment ...

rcgoodfellow commented 1 year ago

@jgallagher I believe I've fixed this - if your lap times are long in trying these changes out, I can spin up my ddmd/dendrite development environment and run this through tests.