Closed ubedan closed 6 months ago
I wouldn't assume your issue is related to any other reported issue without debugging it, FWIW. Probably worth looking at the logs for the network/physical:default service and seeing what the contents of /etc/ipadm/ipadm.conf are when it's not working as you'd expect, etc.
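A minimal sketch of how that state could be gathered in one pass, assuming a stock SMF layout. `svcs -L` prints the path of the service's log file; the `run` wrapper and `collect_net_state` function are illustrative names, not part of any tool. By default the commands are only printed; set DRY_RUN=0 on the affected host to execute them.

```shell
#!/bin/sh
# Print each command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: gather the state asked for in the comment above.
collect_net_state() {
    run svcs -xv network/physical:default   # service state and any fault explanation
    run svcs -L network/physical:default    # path of the service's log file
    run cat /etc/ipadm/ipadm.conf           # persistent ipadm configuration
    run ipadm show-addr                     # live address objects
    run netstat -rn                         # live routing table
}

collect_net_state
```

Capturing this output both before and after applying the manual workaround makes it easier to see which part of the configuration is live-only.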
Thanks!
Nothing different in network/physical:default
Pertinent info from ipadm.conf:
_protocol=ipv4;forwarding=off;
_protocol=ipv6;forwarding=on;
_ifname=igb0;_ifclass=0;_families=2,26;
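The excerpt above contains an interface entry for igb0 but no address entry. A rough check for that condition could look like the sketch below. It assumes that persistent address objects are recorded with an `_aobjname=<ifname>/<label>` field, which is an assumption about the file format rather than something stated in this thread; `has_persistent_addr` is a hypothetical helper name.

```shell
#!/bin/sh
# has_persistent_addr FILE IFNAME
# Succeeds if FILE contains at least one address-object entry for IFNAME.
# ASSUMPTION: such entries carry a field of the form "_aobjname=<ifname>/<label>".
has_persistent_addr() {
    grep -q "_aobjname=$2/" "$1"
}

conf=/etc/ipadm/ipadm.conf
if [ -r "$conf" ]; then
    if has_persistent_addr "$conf" igb0; then
        echo "igb0 has a persistent address object"
    else
        echo "igb0 has no persistent address object"
    fi
fi
```

If the check reports no persistent address object while `ipadm show-addr` lists one, the address exists only in the live state.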
It may be worth waiting to debug this until after the upstream issue is resolved...
Which issue is that?
I haven't seen any official bug report yet (couldn't find one anyway)
This is the start of a long debugging escapade on the Triton discord:
goekesmi - 05/07/2024 11:02 AM I'm in the process of upgrading my fleet from 20240307T000552Z to 20240502T000615Z. I have rebooted my test CN to the new platform image, and find myself with a broken cn-agent...
From the messages, the Intel igb nic seems to be a factor.
New information on that other upstream issue suggests it's older than this.
Updating to Helios-2.0.22668 didn't change the issue... I still lose the IP address on boot/reboot.
Running ipadm create-addr and route add default from the console solves the issue.
Nothing in /etc/ipadm/ipadm seems to change from running ipadm create-addr.
dmesg entries:
May 17 04:48:53 helios genunix: [ID 936769 kern.info] timerfd0 is /pseudo/timerfd@0
May 17 04:48:53 helios mac: [ID 435574 kern.info] NOTICE: igb0 link up, 1000 Mbps, full duplex
May 17 04:48:58 helios mac: [ID 736570 kern.info] NOTICE: e1000g0 unregistered
May 17 04:49:01 helios pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
May 17 04:49:01 helios genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
May 17 04:49:01 helios login: [ID 574227 auth.alert] Solaris_audit adt_get_local_address failed, no Audit IP address available, faking loopback and error: Network is down
May 17 04:49:01 helios login: [ID 369739 auth.error] pam_unix_cred: cannot load ttyname: Network is down, continuing.
May 17 04:49:53 helios mac: [ID 469746 kern.info] NOTICE: e1000g0 registered
May 17 04:51:21 helios ipf: [ID 774698 kern.info] IP Filter: v4.1.9, running.
I attempted to track down the smf svcprop that holds ipadm config, but got lost between IPMGMT_CMD_AOBJNAME2ADDROBJ and svcprop -f ip-interface-management...
So, where is the interface configuration stored? Anything I can do to help? Thanks in advance.
Running ipadm create-addr and route add default from the console solves the issue.
Can you provide the exact commands you ran, and their output, preferably by just copying and pasting the whole unabridged transcript?
They don't come from a transcript...
ipadm create-addr -t -T static -a 10.0.0.100 igb0/v4
route add default 10.0.0.17
--
ADDROBJ TYPE STATE ADDR
lo0/v4 static ok 127.0.0.1/8
igb0/v4 static ok 10.0.0.100/8 <== Line missing after reboot, found after ipadm...
lo0/v6 static ok ::1/128
Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
default 10.0.0.17 UG 4 48866 <== Line missing before route add cmd
10.0.0.0 10.0.0.100 U 3 4 igb0
127.0.0.1 127.0.0.1 UH 2 0 lo0
Routing Table: IPv6
Destination/Mask Gateway Flags Ref Use If
::1 ::1 UH 2 0 lo0
One slightly strange bit is that route -p add default reports that the line already exists in the config file.
They don't come from a transcript...
To be clear, I mean when you log in on the console and perform the actions, make a transcript by copying and pasting the whole thing; e.g.,
gimlet-sn07 console login: root
Last login: Fri May 17 05:30:37 from 172.20.16.18
May 17 05:30:52 EVT22200007 login: ROOT LOGIN /dev/console
#####
## ##
## # ## ## ##
## # ## ## ## Oxide Computer Company
## # ## ### Engineering
## ## ## ##
##### ## ## Gimlet
gimlet-sn07 # ipadm show-addr
ADDROBJ TYPE STATE ADDR
lo0/v4 static ok 127.0.0.1/8
igb0/v4 dhcp ok 172.20.2.107/24
lo0/v6 static ok ::1/128
igb0/ll addrconf ok fe80::eaea:6aff:fe09:8690%igb0/10
cxgbe0/ll addrconf inaccessible fe80::aa40:25ff:fe01:114%cxgbe0/10
cxgbe1/ll addrconf inaccessible fe80::aa40:25ff:fe01:11c%cxgbe1/10
gimlet-sn07 # logout
gimlet-sn07 console login:
That's generally the best way to make sure you're giving complete context when reporting an issue. Looking at what you've been doing to make your system work:
ipadm create-addr -t -T static -a 10.0.0.100 igb0/v4
If you look at the ipadm(8) manual page, you'll see that the -t option is for the creation of temporary objects. Temporary objects are not persistent; they do not survive reboots.
route add default 10.0.0.17
The route(8) command is much older than ipadm, and by default it deals only in the live state of the system. As per the manual, it has a -p flag for managing a set of persistent routes that would then survive a reboot.
If you just need a single default gateway, the defaultrouter(5) file is probably the easiest way to make that persistent.
In summary, I think the issue you're having is probably just that you're not using the persistent modes of the tools. When you reboot, the configuration as specified is correctly discarded.
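As a sketch of the persistent forms described above, using the interface, address, and gateway from this thread: `make_persistent` and the `run` wrapper are illustrative names only, and by default the commands are printed rather than executed. The /8 prefix is written out to match the address shown in the earlier ipadm output.

```shell
#!/bin/sh
# Print each command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: make_persistent IFNAME ADDR/PREFIX GATEWAY
make_persistent() {
    ifname=$1
    addr=$2
    gateway=$3
    # No -t here, so the address object is persistent and survives a reboot.
    # If a temporary object with the same name already exists, it likely has
    # to be removed first (ipadm delete-addr).
    run ipadm create-addr -T static -a "$addr" "$ifname/v4"
    # -p also records the route in the set of persistent routes.
    run route -p add default "$gateway"
}

make_persistent igb0 10.0.0.100/8 10.0.0.17
```

For a single default gateway, the defaultrouter(5) file mentioned above is an alternative to the route -p step.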
That's it! Thanks!!!
Running the May 8th-ish Helios update disrupts network connectivity after booting.
igb0 is down on boot:
helios login: [ID 574227 auth.alert] Solaris_audit adt_get_local_address failed, no Audit IP address available, faking loopback and error: Network is down
helios login: [ID 369739 auth.error] pam_unix_cred: cannot load ttyname: Network is down, continuing.
ipadm has no address for igb0 after booting. Running ipadm create-addr and route add fixes it.
This may be related to an Illumos / Triton issue where iPXE is failing after upgrading, and my guess is it will be fixed upstream.