vmware / photon

Minimal Linux container host
https://vmware.github.io/photon

Missing e1000 network driver in Photon OS 4.0 Rev2 #1411

Open JonathanVQP opened 1 year ago

JonathanVQP commented 1 year ago

Describe the bug

I am running Photon OS 4.0 Rev2 in the latest VMware Workstation. After a system crash and a reboot of the VM, the e1000 driver appears to have gone missing: it does not show up in dmesg. I extracted the photon-4.0-c001795b8 ISO but could not locate any network driver in the RPMS directory or its subdirectories. Presumably the package is not named e1000driver or anything similarly obvious. Does anyone know what the e1000 driver is called in the Photon OS ISO for this version?

Reproduction steps

  1. Photon OS VM crash in VMware workstation
  2. Reboot
  3. Missing e1000 network driver ...

Expected behavior

I expect to have internet connection with the e1000 driver loaded. It is not loaded after my system crash.

Additional context

No response

sshedi commented 1 year ago

cc: @tapakund @keerthanakalyan

kashwindayan commented 1 year ago

Hi @JonathanVQP , I tried reproducing the problem on Fusion and workstation with photon-4.0-c001795b8 iso.

The photon-4.0-c001795b8 ISO corresponds to Linux kernel version 5.10.83-7. I tried with both vmxnet and e1000e on Fusion but was not able to reproduce the crash. Can you please let me know what changes led to the system crash, or which steps you tried?

Also please share:

Thanks, Ashwin Kamat

dcasota commented 1 year ago

Hi @JonathanVQP, e1000 typically refers to the Intel 82545EM Gigabit Ethernet NIC. A USBC-E1000 NIC adapter has the ASIX AX88179 chipset. For security reasons, the default Photon OS kernel includes far from all of the device drivers in the Linux kernel. See 4.0/SPECS/linux, e.g. in config_x86_64: ~"# CONFIG_AX88796B_PHY is not set".~ Edited:

CONFIG_USB_NET_AX8817X=m
CONFIG_USB_NET_AX88179_178A=m

Correction: the AX88179 should work. In theory (untested), check lsmod | grep ax88 and run sudo modprobe -v ax88179.

Can you describe the "system crash"? Is it related to VMware Workstation 17.x (USB arbitrator service) or to the vhw version? See the description from @kashwindayan. Hope this helps. -Daniel

JonathanVQP commented 1 year ago

Ashwin, files as requested.

The system crash I experienced was a total VMware shutdown. Thanks, Jonathan

Attachments: journalctlb.out.txt, dmesg.out.txt, vmware.log, journalctlsystemdnetworkd.txt

JonathanVQP commented 1 year ago

@kashwindayan, I was running Photon OS 4.0 Rev 2 in VMware and left it on all night. The next day I noticed the power to my PC was off (fsck ran and found no bad blocks or errors). When I restarted VMware Workstation and then Photon OS, it could not reach the internet. The logs indicate that the e1000 driver somehow got "lost". I was hoping to install the e1000 rpm from the Photon OS ISO if I could find it.

JonathanVQP commented 1 year ago

@kashwindayan I was running Photon OS 4.0 Rev 2 in VMware and left it on all night. The next day I noticed the power to my PC was off (fsck ran and found no bad blocks or errors). When I restarted VMware Workstation 17.0.1 and then Photon OS, it could not reach the internet. The logs indicate that the e1000 driver somehow got "lost". I was hoping to install the e1000 rpm from the Photon OS ISO if I could find it. I ran lsmod | grep ax88 and it returned nothing. I ran sudo modprobe -v ax88179 and it returned "modprobe: FATAL: Module ax88179 not found in directory /lib/modules/5.10.168-2.ph4-esx".
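The "not found in directory" message means modprobe found no module of that name for the running kernel. As a rough cross-check, one can look for the module file on disk. The sketch below is only an illustration: find_module_files is a hypothetical helper name, and it assumes module files are named after the module with a .ko suffix, optionally compressed.

```shell
# List on-disk files for a kernel module under a modules tree.
# find_module_files is a hypothetical helper, not a Photon OS tool.
find_module_files() {
    name=$1     # module name, e.g. e1000
    moddir=$2   # modules tree, e.g. /lib/modules/$(uname -r)
    # Match name.ko as well as compressed variants such as name.ko.xz.
    find "$moddir" -type f -name "${name}.ko*"
}
```

For instance, find_module_files e1000 "/lib/modules/$(uname -r)" prints nothing if no such file is installed for the running kernel.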

JonathanVQP commented 1 year ago

@kashwindayan I created another Photon OS 4.0 Rev2 VM and ran dmesg. As you can see in the attached output, the Intel Pro e1000 driver is loaded there, whereas it is missing in the other Photon OS VM.

Attachment: dmesg1.out.txt

dcasota commented 1 year ago

https://kb.vmware.com/s/article/2005315 contains information about Bluetooth behavior during snapshots. Can the VM be configured for wired-only Ethernet connectivity?

JonathanVQP commented 1 year ago

@dcasota Connectivity is with wired ethernet. I don't know how to configure wired only ethernet in Photon since eth0 is missing.

dcasota commented 1 year ago

@kashwindayan FYI, in journalctlb.out there are "link down" entries for eth0 (vmxnet3), and parsed routing policy rules about eth1 (?).

Edited March 21st, 2023: @JonathanVQP Sorry, my bad. Mentioning the USBC-E1000 NIC adapter was wrong. According to the vmware.log, the VMware Photon OS 4.0 Rev2 64-bit.vmx contains vhw20. Does the issue still occur with vhw10?

JonathanVQP commented 1 year ago

@kashwindayan the issue still occurs. I may have to restore from an old backup, since I cannot seem to resolve this and I still don't know which rpm on the ISO provides e1000.

dcasota commented 1 year ago

@JonathanVQP @kashwindayan,

Here are some findings.

First, there is no e1000 rpm. The Linux e1000 Ethernet functionality comes with the Linux kernel; see 4.0/SPECS/linux, e.g. in config_esx:

CONFIG_E1000=m
CONFIG_E1000E=m

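The entries are =m rather than =y, so the drivers are built as loadable modules rather than compiled into the kernel, and they must be loaded at runtime along with any modules they depend on.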
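To check how an option is set for a given kernel, one can read the kernel config file. This is a minimal sketch: kconfig_state is a hypothetical helper name, and the assumption that the config is installed as /boot/config-$(uname -r) may not hold on every system.

```shell
# Report how a kernel option is set in a kernel config file.
# Prints "y" (built in), "m" (loadable module), or "unset".
# kconfig_state is a hypothetical helper, not a Photon OS tool.
kconfig_state() {
    option=$1   # e.g. CONFIG_E1000
    file=$2     # e.g. /boot/config-$(uname -r) (assumed location)
    # Match the option name exactly, so CONFIG_E1000 does not match CONFIG_E1000E.
    value=$(sed -n "s/^${option}=\([ym]\)\$/\1/p" "$file" | head -n 1)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf 'unset\n'
    fi
}
```

With the config lines quoted above, kconfig_state CONFIG_E1000 <config file> would print m.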

According to journalctlsystemdnetworkd.txt, eth0 was present and gained a DHCPv4 address on Mar 7, but not on Mar 16. The second Photon OS 4.0 Rev2 VM works flawlessly with the current VMware Workstation vnet configuration.

Hence there was a configuration change between those dates. tdnf-automatic is configured, according to journalctlb.out.txt on Mar 20 16:10:27. In theory, some package changes could have occurred during that time frame.

The open-vm-tools package would be a candidate to investigate, because on VM suspend, open-vm-tools triggers the script /etc/vmware-tools/scripts/vmware/network. The script can bring interfaces up or down and can remove addresses from an interface. The vmware.log is from 2023-03-20T16:11 and contains "vmx Guest: toolbox: Version: 12.1.5.39265", not the latest version, 12.2.0. tdnf-automatic should have triggered an installation if there was an active eth0.

The linux-esx-5.10.168-2.ph4.x86_64.rpm has a date stamp of March 2nd. The kernel cmdline contains "net.ifnames=0". This disables the Predictable Network Interface Names mechanism. So far, I haven't found a clue as to why this came in.
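To confirm what the running kernel was booted with, the parameter can be read from the kernel command line. A minimal sketch follows; cmdline_param is a hypothetical helper name, and it assumes the command line is a simple space-separated list of key=value words without quoting or wildcard characters.

```shell
# Print the value of a key=value parameter from a kernel command line string.
# Returns non-zero if the key is not present.
# cmdline_param is a hypothetical helper, not a Photon OS tool.
cmdline_param() {
    key=$1
    cmdline=$2
    # Rely on word splitting to walk the space-separated parameters.
    for word in $cmdline; do
        case $word in
            "$key"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}
```

On a live system, cmdline_param net.ifnames "$(cat /proc/cmdline)" prints the configured value, if any.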

JonathanVQP commented 1 year ago

@kashwindayan Thanks for the info! With your added information and more digging on the net, I found out that the first Photon OS VM (the one without the Intel Pro e1000 driver) did not have the modules loaded. I ran modprobe e1000 and modprobe e1000e, and verified via dmesg that they were loaded. I also checked /lib/modules/5.10.168-4.ph4/modules.builtin and saw that both the e1000 and e1000e modules will be loaded. The issue now is that these modules are not saved into the configuration, so after a reboot I have to repeat the entire process.

kashwindayan commented 1 year ago

Hi @JonathanVQP ,

Did modprobe e1000/e1000e fix your issue?

This will make sure the modules are loaded post reboot as well.
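A common way to load modules at every boot on a systemd-based system is a file under /etc/modules-load.d with one module name per line. The following is a minimal sketch under that assumption, not necessarily the exact steps meant here; the helper name write_modules_conf and the file name e1000.conf are hypothetical.

```shell
# Write a modules-load.d style file: one module name per line.
# systemd-modules-load reads *.conf files from /etc/modules-load.d at boot.
# write_modules_conf is a hypothetical helper, not a Photon OS tool.
write_modules_conf() {
    conf=$1     # target file, e.g. /etc/modules-load.d/e1000.conf (hypothetical name)
    shift
    # Lines starting with '#' are treated as comments.
    printf '# Load these kernel modules at boot\n' > "$conf"
    for mod in "$@"; do
        printf '%s\n' "$mod" >> "$conf"
    done
}
```

Run as root, write_modules_conf /etc/modules-load.d/e1000.conf e1000 e1000e would list both modules for loading at the next boot.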

Also, we don't yet know the reason for the system crash. The backtrace is not available; the logs you provided are post-reboot logs. We need the logs that contain the backtrace in order to determine the reason for the system crash.

Please provide the journalctl output or vmware-"x".log that shows the reason for the system crash. You might need to check the older logs for this. Please attach the proper crash logs.
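If the journal is stored persistently, messages from earlier boots can be retrieved with journalctl. The sketch below wraps that in a small function; prev_boot_kernel_log is a hypothetical helper name, and with volatile journal storage only the current boot will be available.

```shell
# Show kernel messages from an earlier boot (default: the previous one).
# Requires a persistent journal; otherwise only the current boot exists.
# prev_boot_kernel_log is a hypothetical helper, not a Photon OS tool.
prev_boot_kernel_log() {
    offset=${1:--1}   # -1 = previous boot, -2 = the one before, and so on
    journalctl --no-pager -k -b "$offset"
}
```

journalctl --list-boots shows which boots the journal still holds, which helps in picking the offset for the boot that ended in the crash.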

JonathanVQP commented 1 year ago

@kashwindayan Please close this issue. This became too tedious to figure out so I am restoring from an old backup. Hopefully it will work! Thanks for all your assistance!!