Closed flumm closed 2 months ago
Hi Flumm, I am testing a Flex 140 as well: PVE 8.2 with kernel 6.5.3-13-pve and a Windows 10 VM with the latest driver. The Heaven Unigine benchmark runs OK. I had stability issues until I updated the firmware on the Flex 140. If you are like me and never used the GPU in a stand-alone setup, the firmware may still be the original from the factory. IIRC Intel GPUs are updated only when the user drivers are installed.
How did you update the firmware exactly? Does that only work with windows drivers? Is there any way to see which firmware is on there?
Any help appreciated, since the docs i could find are rather sparse ;)
Yes, they are... Get the IGSC utility from https://github.com/intel/igsc (I built it under PVE). Then get the latest Windows driver and unpack it under Windows (i.e. start the install...). In your temp directory, find the firmware and opcode folders and the firmware files in them. Transfer those to your Flex system and update with the igsc utility there. There is an additional file for the opcode-data, but that is not required. It can only be sourced from Intel or your OEM vendor.
forgot to mention that the igsc utility does allow you to check what version you have and to downgrade, but does not allow you to back up your current firmware...
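The update step described above could be sketched roughly like this. It is only a sketch under assumptions: the `fw update ... --image` syntax is how I recall the igsc README (check `igsc --help` on your build), and the device nodes and image file name are placeholders. By default it only prints the commands.

```shell
# Sketch: flash the same firmware image to each GPU of a Flex 140.
# Assumptions (not from the thread): the "fw update ... --image" syntax,
# the device nodes passed in, and the image file name are placeholders.
IGSC="${IGSC:-igsc}"   # path to the igsc binary built from the repo

# update_fw IMAGE DEVICE...
# Prints the commands by default; set APPLY=1 to actually run them.
update_fw() (
    image="$1"; shift
    for dev in "$@"; do
        if [ "${APPLY:-0}" = "1" ]; then
            "$IGSC" fw update --device "$dev" --image "$image" || exit 1
            "$IGSC" fw version --device "$dev"
        else
            echo "$IGSC fw update --device $dev --image $image"
        fi
    done
)
```

Hypothetical usage: `update_fw ./firmware.bin /dev/mei1 /dev/mei2` for a dry run, then the same call with `APPLY=1` once the printed commands look right.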
Of course, you can 'simply' install Windows bare-metal on the system that has the Flex; installing the driver will then take care of everything for you. I believe the same is included in the supported Linux binary drivers from the Intel repos, but not in the backport. I have not done either of the above... Note that you have to update both GPUs separately on the Flex 140, as there are two... My Flex 140 has the following:
root@epyc:/usr/src/igsc/src# ./igsc fw version --device /dev/mei1
Device: FW Version: DG02_2.2353
root@epyc:/usr/src/igsc/src# ./igsc fw-data version --device /dev/mei1
Device: Fw Data Version: Major Version: 101, OEM Manufacturing Data Version: 291, Major VCN: 1
root@epyc:/usr/src/igsc/src# ./igsc oprom-code version --device /dev/mei1
OPROM CODE Version: 14 00 2C 04 00 00 00 00
root@epyc:/usr/src/igsc/src# ./igsc oprom-data version --device /dev/mei1
OPROM DATA Version: 14 00 24 04 00 00 00 00
Maybe someone from Intel could comment if this is the latest...
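Since the Flex 140 carries two GPUs, the four version queries shown above can be wrapped in a small loop. This is a sketch only: the thread shows just /dev/mei1, so the node of the second GPU is an assumption you have to fill in for your system.

```shell
# Sketch: run the four version queries from the thread against each GPU.
# Assumption: the MEI node of the second GPU is not given in the thread;
# pass whatever nodes exist on your system.
IGSC="${IGSC:-./igsc}"

# show_versions DEVICE...
show_versions() (
    for dev in "$@"; do
        echo "== $dev =="
        for component in fw fw-data oprom-code oprom-data; do
            "$IGSC" "$component" version --device "$dev"
        done
    done
)
```

Hypothetical usage: `show_versions /dev/mei1 /dev/mei2`.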
just to update in the meantime: it seems what i had was not a firmware issue, but a thermal one
i tried passing the card through to a windows vm (i thought maybe the driver could upgrade the firmware this way, but no) and i saw that the cards were in the 90 degree Celsius range (idling), so i increased the fan speed, and since then it has run stable
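A host-side check for this kind of overheating might look like the sketch below. It assumes the GPU driver exposes a standard hwmon `temp*_input` file (millidegrees Celsius), which the thread does not confirm for the Flex 140; the 85 degree default is an arbitrary example, not an Intel specification.

```shell
# Sketch: warn when a hwmon temperature reading exceeds a limit.
# hwmon temp*_input files report millidegrees Celsius. Whether the GPU
# driver exposes such a file for the Flex 140 is an assumption; the
# 85 degree default is an arbitrary example value.

# check_temp FILE [LIMIT_C]
check_temp() (
    file="$1"
    limit="${2:-85}"
    milli=$(cat "$file") || exit 2
    celsius=$((milli / 1000))
    if [ "$celsius" -ge "$limit" ]; then
        echo "HOT: ${celsius}C (limit ${limit}C)"
        exit 1
    fi
    echo "ok: ${celsius}C (limit ${limit}C)"
)
```

Hypothetical usage: `check_temp /sys/class/hwmon/hwmon3/temp1_input 85`, where `hwmon3` is a placeholder for whichever hwmon node belongs to the card.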
i'll eventually come around to updating the firmware, but it seems it's not necessary for me at the moment
for the record, my firmware version currently is: Device: FW Version: DG02_2.2273
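To compare two reported versions such as DG02_2.2273 and DG02_2.2353, something like the following could work. It rests on an assumption the thread does not confirm: that the number after the last dot is a build number that grows with newer releases.

```shell
# Sketch: compare two version strings as printed by "igsc fw version".
# Assumption: in "DG02_2.2353" the number after the last dot is a build
# number that increases with newer releases.

# fw_is_older A B
# Returns 0 if A is older than B, 1 if not, 2 if the prefixes differ.
fw_is_older() {
    [ "${1%.*}" = "${2%.*}" ] || return 2   # different prefix: not comparable here
    [ "${1##*.}" -lt "${2##*.}" ]
}
```

Hypothetical usage: `fw_is_older DG02_2.2273 DG02_2.2353 && echo "update available"`.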
Any action needed from us?
while my issues seemed to disappear with proper cooling, could you check the logs i posted if that's intended and normal behaviour in that case? normally i'd expect hardware to either work slowly or crash outright when not cooled properly but the weird hangs/reset seemed off
if that is the intended/normal behavior, you can close the issue ofc
also it would be nice if there would be another official way to obtain firmware upgrades besides installing the windows driver (and checking if there is newer firmware altogether) but this is only tangentially related to this issue (is there a better place to request/report that?)
thanks
@flumm the igsc tool is a cross-platform tool, so the same tool can be used on Linux. Please check the documentation linked in the repo.
thanks for responding.
yes the tool to flash the firmware is clear and that seems to work. My issue was how to get the updated firmware? I did not find any intel site that would mention that, so the only way currently is to start the windows install and extract the files from the temporary dir there? (or am i missing something here?)
Ubuntu Package Installation
The kernel and xpu-smi packages can be installed on a bare metal system. Installation on the host is sufficient for hardware management and support of the runtimes in containers and bare metal.
sudo apt install -y \
    linux-headers-$(uname -r) \
    linux-modules-extra-$(uname -r) \
    flex bison \
    intel-fw-gpu intel-i915-dkms xpu-smi
sudo reboot
Compute and Full instructions: https://dgpu-docs.intel.com/driver/installation.html
thanks for the answer, but I'm not sure how that relates to my question. I wanted to know where can i get the firmware files, besides extracting them from the windows driver? or is it enough to load the latest one from https://github.com/intel-gpu/intel-gpu-firmware ?
Take Ubuntu as the example. intel-fw-gpu contains the latest FWs.
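One way to see what the package actually ships is to list its files after installation. This is a sketch: `dpkg -L` is the standard way to list a package's files, but filtering on `*.bin` is an assumption about the package layout, and the thread does not say whether these files are the images igsc flashes.

```shell
# Sketch: list the .bin files an installed Debian/Ubuntu package ships.
# Assumption: that intel-fw-gpu delivers its firmware as *.bin files;
# the thread does not describe the package layout.
DPKG="${DPKG:-dpkg}"

# list_fw_files PACKAGE
list_fw_files() {
    "$DPKG" -L "$1" | grep '\.bin$'
}
```

Hypothetical usage: `list_fw_files intel-fw-gpu`.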
Hi,
i have a weird stability issue and wanted to ask whether it looks like a software or hardware issue, and how/if we can fix it.
I started a VM with QEMU/KVM with a VF of a Flex 140 with Windows 11. That alone worked fine, drivers in the guest installed ok, device manager + task manager reported everything ok.
I could start the Heaven Unigine benchmark, which showed ~140 FPS on low settings (1280x720). After some time though, it dropped to 1 FPS, but the task manager still showed 100% utilization.
I played around with rebooting, disabling/enabling the device in device manager, but i got the following logs on the host dmesg:
on trying to remove the virtual functions via sysfs and unloading the driver i got:
Any idea what could cause that?
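For reference, removing the virtual functions via sysfs, as mentioned above, usually comes down to writing 0 to the device's `sriov_numvfs` attribute. A minimal sketch, assuming the standard PCI SR-IOV sysfs layout; the PCI address in the usage note is a placeholder, not taken from this report:

```shell
# Sketch: remove all SR-IOV virtual functions of a PCI device via sysfs.
# sriov_numvfs is the standard PCI SR-IOV attribute; the device address
# used in the usage note is a placeholder.

# remove_vfs DEVICE_DIR, e.g. /sys/bus/pci/devices/0000:01:00.0
remove_vfs() (
    attr="$1/sriov_numvfs"
    [ -w "$attr" ] || { echo "no writable $attr" >&2; exit 1; }
    echo "VFs before: $(cat "$attr")"
    echo 0 > "$attr"
    echo "VFs after: $(cat "$attr")"
)
```

Hypothetical usage (as root, with the VMs that use the VFs shut down first): `remove_vfs /sys/bus/pci/devices/0000:01:00.0`.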