Closed zhewang1-intc closed 9 months ago
Please build the zello_sysman tool to check which Sysman API returns unexpected values or errors that XPUM doesn't handle properly.
wget https://raw.githubusercontent.com/intel/compute-runtime/releases/23.48/level_zero/tools/test/black_box_tests/zello_sysman.cpp
g++ -O2 -Wall -o zello_sysman zello_sysman.cpp -lze_loader -locloc
sudo ./zello_sysman --memory
sudo ./zello_sysman --temperature
sudo ./zello_sysman --frequency
The zello_sysman output shows that the driver returns 0 for the memory maximum bandwidth. That is the root cause of the crash, and we will fix it in the next release. Once XPUM is started with root privileges, the frequency can be changed successfully.
---- Memory tests ----
Memory Type = ZES_MEM_TYPE_DDR
On Subdevice = 0
Subdevice Id = 0
Memory Size = 0
Number of channels = 2
Memory Health = ZES_MEM_HEALTH_UNKNOWN
The total allocatable memory in bytes = 17079205888
The free memory in bytes = 17010581504
Memory Read Counter = 17389969126208
Memory Write Counter = 342828039872
Memory Maximum Bandwidth = 0
Memory Timestamp = 18503820255
---- Temperature tests ----
For subDevice 0 temperature current state for ZES_TEMP_SENSORS_GLOBAL is: 50
For subDevice 0 temperature current state for ZES_TEMP_SENSORS_GPU is: 50
---- Frequency tests ----
freqProperties.type = 0
freqProperties.canControl = 1
freqProperties.isThrottleEventSupported = 0
freqProperties.min = 300
freqProperties.max = 2400
freqState.currentVoltage = -1
freqState.request = 2400
freqState.tdp = 0
freqState.efficient = 2100
freqState.actual = 2400
freqState.throttleReasons = 0
freqRange.min = 300
freqRange.max = 2400
frequency = 300
frequency = 350
frequency = 400
frequency = 450
frequency = 500
frequency = 550
frequency = 600
frequency = 650
frequency = 700
frequency = 750
frequency = 800
frequency = 850
frequency = 900
frequency = 950
frequency = 1000
frequency = 1050
frequency = 1100
frequency = 1150
frequency = 1200
frequency = 1250
frequency = 1300
frequency = 1350
frequency = 1400
frequency = 1450
frequency = 1500
frequency = 1550
frequency = 1600
frequency = 1650
frequency = 1700
frequency = 1750
frequency = 1800
frequency = 1850
frequency = 1900
frequency = 1950
frequency = 2000
frequency = 2050
frequency = 2100
frequency = 2150
frequency = 2200
frequency = 2250
frequency = 2300
frequency = 2350
frequency = 2400
Setting Frequency Range . min 300
Setting Frequency Range . max 300
After Setting Getting Frequency Range . min 300
After Setting Getting Frequency Range . max 300
Setting Frequency Range . min 300
Setting Frequency Range . max 2400
After Setting Getting Frequency Range . min 300
After Setting Getting Frequency Range . max 2400
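For anyone hitting the same thing: the memory output above reports Memory Maximum Bandwidth = 0. The thread does not say exactly how XPUM trips over that value, so the following is only a sketch of one plausible failure mode and a guard against it, assuming the value is used as a divisor when computing bandwidth utilization from two samples. The struct is a local stand-in that mirrors the field names of zes_mem_bandwidth_t so the snippet builds without the Level Zero headers. The function name is hypothetical, and the units (timestamp in microseconds, maxBandwidth in bytes per second) are my reading of the Sysman documentation rather than something confirmed here.

```cpp
#include <cstdint>

// Local stand-in for zes_mem_bandwidth_t (same field names), so this sketch
// compiles without the Level Zero headers. Units are assumed, see lead-in.
struct MemBandwidthSample {
    uint64_t readCounter;   // total bytes read
    uint64_t writeCounter;  // total bytes written
    uint64_t maxBandwidth;  // bytes per second; the driver reported 0 in this thread
    uint64_t timestamp;     // microseconds
};

// Hypothetical helper: computes utilization in percent between two samples.
// Returns false (and leaves `percent` untouched) when the inputs cannot
// produce a meaningful value, instead of dividing by zero.
bool bandwidthUtilizationPercent(const MemBandwidthSample& s1,
                                 const MemBandwidthSample& s2,
                                 double& percent) {
    if (s2.maxBandwidth == 0) {
        return false;  // driver did not report a usable maximum bandwidth
    }
    if (s2.timestamp <= s1.timestamp) {
        return false;  // no elapsed time between the samples
    }
    if (s2.readCounter < s1.readCounter || s2.writeCounter < s1.writeCounter) {
        return false;  // counters went backwards (reset or wraparound)
    }
    const double bytes = static_cast<double>(s2.readCounter - s1.readCounter) +
                         static_cast<double>(s2.writeCounter - s1.writeCounter);
    const double seconds = static_cast<double>(s2.timestamp - s1.timestamp) / 1e6;
    percent = 100.0 * bytes / (static_cast<double>(s2.maxBandwidth) * seconds);
    return true;
}
```

A caller would report the metric as unavailable when the function returns false, rather than treating a zero maximum as valid input.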
Thanks Intel XPU team!
Hi, I am trying to use this tool to limit the GPU's frequency to a range I specify. I installed the v1.2.29 deb package (xpumanager_1.2.29_20240201.035533.2b2f658d.u22.04_amd64.deb) on my machine. After I execute
xpumcli discovery
I get an error:
Error: XPUM Service Status Error.
Then I checked the state of my xpum service. If I run xpumd directly, the service is not killed, but it prints some warnings and errors.
By the way, if I execute xpumd with sudo, the service crashes.
Anyway, after I run xpumd, xpumcli seems to give me some useful messages. But if I give it a frequency range, xpumcli throws an error without any hint.
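Since the request here is to pin the GPU to a chosen frequency range, it can help to check the requested range against the clocks the device actually exposes before handing it to any tool. The sketch below is not xpumcli's or XPUM's logic. It is a hypothetical, self-contained helper (all names are made up for illustration) that clamps a requested range to the hardware limits and snaps it inward to the nearest available clock steps, assuming the available clocks are known and sorted in ascending order, as in a zello_sysman --frequency listing.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical type for illustration; values are in MHz.
struct FreqRange {
    double min;
    double max;
};

// Builds an ascending list of clocks, e.g. makeClocks(300, 2400, 50).
std::vector<double> makeClocks(double first, double last, double step) {
    std::vector<double> clocks;
    for (double f = first; f <= last; f += step) {
        clocks.push_back(f);
    }
    return clocks;
}

// Clamps `requested` to the hardware limits, then snaps it inward to the
// nearest available clocks. Returns false if the request is inverted or if
// no available clock lies inside it. `clocks` must be sorted ascending.
bool snapToAvailableClocks(const std::vector<double>& clocks,
                           FreqRange requested, FreqRange& snapped) {
    if (clocks.empty() || requested.min > requested.max) {
        return false;
    }
    const double lo = std::max(requested.min, clocks.front());
    const double hi = std::min(requested.max, clocks.back());
    if (lo > hi) {
        return false;  // request lies entirely outside the hardware limits
    }
    // Smallest available clock that is >= lo.
    std::vector<double>::const_iterator loIt =
        std::lower_bound(clocks.begin(), clocks.end(), lo);
    // Largest available clock that is <= hi.
    std::vector<double>::const_iterator hiIt =
        std::upper_bound(clocks.begin(), clocks.end(), hi);
    --hiIt;  // safe: hi >= clocks.front(), so hiIt was past begin()
    if (*loIt > *hiIt) {
        return false;  // range falls between two adjacent clock steps
    }
    snapped.min = *loIt;
    snapped.max = *hiIt;
    return true;
}
```

With clocks from 300 to 2400 in steps of 50, a request of 420 to 1980 snaps to 450 to 1950, while a request of 310 to 340 is rejected because no step lies inside it.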