rbonghi / jetson_stats

📊 Simple package for monitoring and control your NVIDIA Jetson [Orin, Xavier, Nano, TX] series
https://rnext.it/jetson_stats
GNU Affero General Public License v3.0

"Error connection" on NVIDIA Orin Jetson-Small Developer Kit with jetpack 5.0 Pre-DP (factory setup) jetpack 4.1.5 #381

Closed pseyfert-sevensense closed 1 year ago

pseyfert-sevensense commented 1 year ago

Describe the bug I can't start jtop on an Orin devkit (JETSON_MODEL=NVIDIA Orin Jetson-Small Developer Kit). The devkit is still close to the factory setup, with the preinstalled OS, JetPack, and Python from NVIDIA. When I start jtop on the command line, all it prints is "Error connection", and it exits with status code 0.

Let me know if there's anything I can provide to help debug, or if I should try a different jtop version. The single line "Error connection" is very unspecific.

To Reproduce Steps to reproduce the behavior:

  1. Get an Orin Jetson small devkit.
  2. sudo pip install jetson-stats
  3. sudo systemctl restart jtop.service (just to be sure)
  4. log out and back in (just to be sure /etc/profile.d/jtop_env.sh is freshly applied)
  5. start jtop, see "Error connection" printed

[Screenshot: 2023-03-01-131325_1381x677_scrot]

(in the terminal on the right I entered a few line feeds before executing the commands on the left to distinguish the previously existing journal entries from those that came from my commands).

Expected behavior jtop starts like shown on the website and one can see cpu usage, gpu usage, etc.

Additional context

pseyfert@orin-1:~$ env | grep JET
JETSON_CUDA_ARCH_BIN=8.7
JETSON_MODULE=NVIDIA Jetson AGX Orin
JETSON_L4T=34.0.1
JETSON_CODENAME=Concord
JETSON_MODEL=NVIDIA Orin Jetson-Small Developer Kit
JETSON_SERIAL_NUMBER=<snap>
JETSON_SOC=tegra23x
JETSON_P_NUMBER=p3701-0000
JETSON_JETPACK=5.0 PRE-DP

pseyfert@orin-1:~$ jetson_release -v
Software part of jetson-stats 4.1.5 - (c) 2023, Raffaello Bonghi
Model: NVIDIA Orin Jetson-Small Developer Kit - Jetpack 5.0 PRE-DP [L4T 34.0.1]
NV Power Mode: MAXN - Type: 0
Hardware:
 - 699-level Part Number: 699-13701-0000-500 M.0
 - P-Number: p3701-0000
 - Module: NVIDIA Jetson AGX Orin
 - SoC: tegra23x
 - CUDA Arch BIN: 8.7
 - Codename: Concord
 - Serial Number: <snap>
Platform:
 - Machine: aarch64
 - System: Linux
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.65-tegra
 - Python: 3.8.10
jtop:
 - Version: 4.1.5
 - Service: Active
Libraries:
 - CUDA not installed!
 - cuDNN: Not installed
 - TensorRT: Not installed
 - VPI: Not installed
 - Vulkan: 1.3.203
 - OpenCV: 4.2.0 - with CUDA: NO
pseyfert@orin-1:~$ 

rbonghi commented 1 year ago

Hi @pseyfert-sevensense ,

Thank you for your issue! Yes, this bug is weird. I'm still working on a new release that is more stable (and has more transparent debugging, see #378).

It looks like the jtop service is crashing because of an error coming from tegrastats or jetson_clocks.

Can you share the output from:

sudo jetson_clocks --show
sudo tegrastats

Thank you in advance

pseyfert-sevensense commented 1 year ago

Hi,

Thanks for looking into this. Here is the output of the two commands:

pseyfert@orin-1:~$ sudo jetson_clocks --show
[sudo] password for pseyfert: 
SOC family:tegra234  Machine:NVIDIA Orin Jetson-Small Developer Kit
Online CPUs: 0-11
cpu0: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=2112000 IdleStates: WFI=1 c7=1 
cpu1: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=1 c7=1 
cpu10: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=1 c7=1 
cpu11: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=1190400 IdleStates: WFI=1 c7=1 
cpu2: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=2112000 IdleStates: WFI=1 c7=1 
cpu3: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=1 c7=1 
cpu4: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=1344000 IdleStates: WFI=1 c7=1 
cpu5: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=1344000 IdleStates: WFI=1 c7=1 
cpu6: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=1804800 IdleStates: WFI=1 c7=1 
cpu7: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=1420800 IdleStates: WFI=1 c7=1 
cpu8: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=115200 IdleStates: WFI=1 c7=1 
cpu9: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2201600 CurrentFreq=1574400 IdleStates: WFI=1 c7=1 
GPU MinFreq=114750000 MaxFreq=1300500000 CurrentFreq=114750000
EMC MinFreq=204000000 MaxFreq=3199000000 CurrentFreq=665600000 FreqOverride=0
DLA0_CORE MinFreq=0 MaxFreq=1536000000 CurrentFreq=1536000000
DLA0_FALCON MinFreq=0 MaxFreq=832000000 CurrentFreq=832000000
DLA1_CORE MinFreq=0 MaxFreq=1536000000 CurrentFreq=1536000000
DLA1_FALCON MinFreq=0 MaxFreq=832000000 CurrentFreq=832000000
PVA0_VPS0 MinFreq=0 MaxFreq=1152000000 CurrentFreq=1152000000
PVA0_AXI MinFreq=0 MaxFreq=832000000 CurrentFreq=832000000
FAN Dynamic Speed control=active hwmon3_pwm=64
NV Power Mode: MAXN
pseyfert@orin-1:~$ sudo tegrastats 
03-01-2023 17:08:44 RAM 909/30654MB (lfb 7240x4MB) SWAP 0/15327MB (cached 0MB) CPU [4%@191,11%@268,0%@1188,0%@294,0%@115,0%@115,0%@114,0%@115,0%@115,0%@115,0%@115,0%@115] EMC_FREQ 0%@204 GR3D_FREQ 0%@114 GR3D2_FREQ 0%@114 VIC_LOAD @0 APE 174 CV0@-256C CPU@48.562C Tdiode@38C SOC2@47.125C SOC0@47.937C CV1@-256C GPU@47C SOC1@49.312C CV2@-256C VDD_GPU_SOC 6778mW/6778mW VDD_CPU_CV 1196mW/1196mW VIN_SYS_5V0 5434mW/5434mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1004mW/1004mW NC 0mW/0mW
03-01-2023 17:08:45 RAM 909/30654MB (lfb 7240x4MB) SWAP 0/15327MB (cached 0MB) CPU [5%@268,13%@457,0%@2188,0%@2189,0%@115,0%@115,0%@1806,0%@1803,0%@115,0%@115,0%@115,0%@115] EMC_FREQ 0%@204 GR3D_FREQ 0%@114 GR3D2_FREQ 0%@114 VIC_LOAD @0 APE 174 CV0@-256C CPU@48.656C Tdiode@38C SOC2@47C SOC0@47.906C CV1@-256C GPU@47.125C SOC1@49.25C CV2@-256C VDD_GPU_SOC 6778mW/6778mW VDD_CPU_CV 1196mW/1196mW VIN_SYS_5V0 5434mW/5434mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1004mW/1004mW NC 0mW/0mW
03-01-2023 17:08:46 RAM 909/30654MB (lfb 7240x4MB) SWAP 0/15327MB (cached 0MB) CPU [3%@2192,12%@331,0%@345,0%@344,0%@114,0%@115,0%@115,0%@115,0%@115,1%@114,0%@115,0%@115] EMC_FREQ 0%@2133 GR3D_FREQ 0%@114 GR3D2_FREQ 0%@114 VIC_LOAD @0 APE 174 CV0@-256C CPU@48.406C Tdiode@38C SOC2@47.031C SOC0@47.906C CV1@-256C GPU@47.187C SOC1@49.125C CV2@-256C VDD_GPU_SOC 6778mW/6778mW VDD_CPU_CV 1196mW/1196mW VIN_SYS_5V0 5434mW/5434mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1004mW/1004mW NC 0mW/0mW
03-01-2023 17:08:47 RAM 909/30654MB (lfb 7240x4MB) SWAP 0/15327MB (cached 0MB) CPU [5%@2193,10%@992,0%@1943,0%@1359,0%@114,0%@115,0%@115,0%@191,1%@115,0%@115,0%@115,0%@115] EMC_FREQ 0%@2133 GR3D_FREQ 0%@114 GR3D2_FREQ 0%@114 VIC_LOAD @0 APE 174 CV0@-256C CPU@48.718C Tdiode@38C SOC2@46.968C SOC0@47.906C CV1@-256C GPU@47.218C SOC1@49.187C CV2@-256C VDD_GPU_SOC 6778mW/6778mW VDD_CPU_CV 1196mW/1196mW VIN_SYS_5V0 5535mW/5459mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1004mW/1004mW NC 0mW/0mW
03-01-2023 17:08:48 RAM 909/30654MB (lfb 7240x4MB) SWAP 0/15327MB (cached 0MB) CPU [4%@832,6%@2190,0%@1836,0%@1652,0%@114,1%@115,0%@115,0%@115,0%@115,0%@115,0%@172,0%@1881] EMC_FREQ 0%@2133 GR3D_FREQ 0%@114 GR3D2_FREQ 0%@114 VIC_LOAD @0 APE 174 CV0@-256C CPU@48.625C Tdiode@38C SOC2@47C SOC0@47.843C CV1@-256C GPU@47.062C SOC1@49.125C CV2@-256C VDD_GPU_SOC 6778mW/6778mW VDD_CPU_CV 1196mW/1196mW VIN_SYS_5V0 5535mW/5474mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1004mW/1004mW NC 0mW/0mW
03-01-2023 17:08:49 RAM 909/30654MB (lfb 7240x4MB) SWAP 0/15327MB (cached 0MB) CPU [4%@268,9%@268,0%@268,0%@267,0%@115,0%@116,0%@117,0%@115,0%@115,0%@115,0%@115,0%@114] EMC_FREQ 0%@204 GR3D_FREQ 0%@114 GR3D2_FREQ 0%@114 VIC_LOAD @0 APE 174 CV0@-256C CPU@48.437C Tdiode@37.75C SOC2@46.937C SOC0@47.843C CV1@-256C GPU@47.25C SOC1@49.125C CV2@-256C VDD_GPU_SOC 6778mW/6778mW VDD_CPU_CV 1196mW/1196mW VIN_SYS_5V0 5434mW/5467mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1004mW/1004mW NC 0mW/0mW

rbonghi commented 1 year ago

Thank you. Just out of curiosity, did you reboot your board?

pseyfert-sevensense commented 1 year ago

Yes, I tried rebooting at some point in between.

rbonghi commented 1 year ago

I have an idea

try:

jtop --restore

pseyfert-sevensense commented 1 year ago

Doesn't seem to have an effect

pseyfert@orin-1:~$ jtop 
Error connection
pseyfert@orin-1:~$ jtop --restore
Error connection
pseyfert@orin-1:~$ jtop
Error connection
pseyfert@orin-1:~$ 

and in the journal during that time:

Mär 02 23:59:14 orin-1 jtop[17045]: [INFO] jtop.service - tegrastats started 500ms
Mär 02 23:59:17 orin-1 jtop[17045]: [INFO] jtop.service - tegrastats close
Mär 02 23:59:17 orin-1 jtop[17045]: [INFO] jtop.service - jetson_clocks show closed
Mär 02 23:59:18 orin-1 jtop[17045]: [INFO] jtop.service - tegrastats started 500ms
Mär 02 23:59:21 orin-1 jtop[17045]: [INFO] jtop.service - tegrastats close
Mär 02 23:59:21 orin-1 jtop[17045]: [INFO] jtop.service - jetson_clocks show closed
Mär 02 23:59:22 orin-1 jtop[17045]: [INFO] jtop.service - tegrastats started 500ms
Mär 02 23:59:25 orin-1 jtop[17045]: [INFO] jtop.service - tegrastats close
Mär 02 23:59:25 orin-1 jtop[17045]: [INFO] jtop.service - jetson_clocks show closed

PS: I'm SSHing from home into the office, and the "Mär" reveals that my LC_… settings are showing up here. So I set all LC_… variables to C in /etc/systemd/system/jtop.service, reloaded systemd, and restarted the service. I also exported LC_ALL=C in my shell and tried again. That didn't help either, but I thought it was worth ruling out that some localization was breaking something.

rbonghi commented 1 year ago

I think I fixed this issue in my next release. It still has a few bugs, but if you want to try it in advance, you can check whether it works:

sudo pip3 install jetson-stats==4.2.0rc0

Documented bugs in this release candidate: https://github.com/rbonghi/jetson_stats/issues/383

pseyfert-sevensense commented 1 year ago

Thanks,

It still doesn't work here, but now I have a different error:

Mär 06 11:40:00 orin-1 jtop[2992]: Process JtopServer-1:
Mär 06 11:40:00 orin-1 jtop[2992]: Traceback (most recent call last):
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
Mär 06 11:40:00 orin-1 jtop[2992]:     self.run()
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/service.py", line 308, in run
Mär 06 11:40:00 orin-1 jtop[2992]:     data = self.jtop_decode()
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/service.py", line 577, in jtop_decode
Mär 06 11:40:00 orin-1 jtop[2992]:     data['temperature'] = self.temperature.get_status()
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/core/temperature.py", line 138, in get_status
Mär 06 11:40:00 orin-1 jtop[2992]:     values = read_temperature(sensor)
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/core/temperature.py", line 32, in read_temperature
Mär 06 11:40:00 orin-1 jtop[2992]:     value = float(cat(path)) / 1000.0
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/core/common.py", line 106, in cat
Mär 06 11:40:00 orin-1 jtop[2992]:     return f.readline().rstrip('\x00')
Mär 06 11:40:00 orin-1 jtop[2992]: OSError: [Errno 22] Invalid argument
Mär 06 11:40:01 orin-1 jtop[2973]: [INFO] jtop.service - Service closed
Mär 06 11:40:01 orin-1 systemd[1]: jtop.service: Succeeded.

I traced that down to /sys/devices/virtual/thermal/thermal_zone8/temp, which can't be read:

pseyfert@orin-1:~$ cat /sys/devices/virtual/thermal/thermal_zone8/temp
cat: /sys/devices/virtual/thermal/thermal_zone8/temp: Invalid argument

(I just posted that to https://forums.developer.nvidia.com/t/sys-devices-virtual-thermal-thermal-zone8-temp-invalid-argument/245035 to get a better idea of what's behind that)
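To check whether thermal_zone8 is the only affected zone, the manual `cat` above can be repeated for every zone. The following is a minimal sketch, not part of jtop: the function name `scan_thermal_zones` is hypothetical, and it assumes the same sysfs layout and the millidegree scaling that the traceback's `float(cat(path)) / 1000.0` implies.

```python
import glob
import os


def scan_thermal_zones(base="/sys/devices/virtual/thermal"):
    """Try to read every thermal_zone*/temp file under `base`.

    Returns a dict mapping the zone directory name to either the
    temperature in degrees Celsius (float) or a string describing
    why the read failed.
    """
    results = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        name = os.path.basename(zone)
        try:
            with open(os.path.join(zone, "temp")) as f:
                # sysfs reports millidegrees Celsius.
                results[name] = float(f.read().strip()) / 1000.0
        except (OSError, ValueError) as err:
            # OSError covers "[Errno 22] Invalid argument" as seen above.
            results[name] = "unreadable: {}".format(err)
    return results
```

Calling `scan_thermal_zones()` on the board and printing the result should show which zones return a number and which ones raise, without stopping at the first failure.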

rbonghi commented 1 year ago

Thank you! I just checked, and now the error is much clearer.

rbonghi commented 1 year ago

I'm adding an extra check for these variables. There are so many boards that it is difficult to verify that everything works on all of them. This kind of feedback is really important.

rbonghi commented 1 year ago

Sorry for my third message, @pseyfert-sevensense. I read in the other post that jtop breaks your board. What does this mean: did you have to reinstall from scratch, or was it only jtop that was not running?

pseyfert-sevensense commented 1 year ago

Hi, sorry that was unclear. I meant that I can't start jtop. Everything else is fine. I reinstalled jtop from scratch; the rest of the system is still in its first installation. I just edited the post in the NVIDIA forum to make it clearer.

By the way, do you have a time estimate for the extra check of these variables?

rbonghi commented 1 year ago

No worries! :-) As you wrote on the forum, this project is open source. I work on it as much as I can in my spare time :-)

I will add this check tonight and write you a message later today.

Can you share the output from this command, just to be sure the rest of the service works fine?

journalctl -u jtop.service -n 100 --no-pager

Thank you in advance
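When the journal output is long, the line that matters is usually the last line of the last traceback. The sketch below pulls that line out of text produced by the journalctl command above. It is only an illustration under stated assumptions: the helper name `last_traceback_error` is hypothetical, and it assumes each journal line has the form `<date> <host> <unit>[<pid>]: <message>` and that traceback lines are not interleaved with other messages.

```python
def last_traceback_error(journal_text):
    """Return the final line of the last Python traceback in journal output.

    Everything after the first "]: " on a line is treated as the message.
    Returns None when no traceback is found.
    """
    in_traceback = False
    error = None
    for line in journal_text.splitlines():
        _, sep, message = line.partition("]: ")
        if not sep:
            # Not a "<unit>[<pid>]: ..." line, e.g. the "-- Logs begin" header.
            continue
        if message.startswith("Traceback (most recent call last):"):
            in_traceback = True
        elif in_traceback and message and not message.startswith(" "):
            # The first unindented line after the frames is the exception.
            error = message
            in_traceback = False
    return error
```

The text can come from a saved file or from piping the journalctl output into a small script that reads standard input.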

pseyfert-sevensense commented 1 year ago

Thanks for the ETA. Here is the journal output:

pseyfert@orin-1:~$ journalctl -u jtop.service --no-pager -n 100
-- Logs begin at Thu 2022-09-08 11:58:15 CEST, end at Mon 2023-03-06 14:44:10 CET. --
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.temperature - Found thermal "SOC0" in thermal_zone5
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.temperature - Found thermal "CV1" in thermal_zone3
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.temperature - Found thermal "GPU" in thermal_zone1
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.temperature - Found thermal "tj" in thermal_zone8
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.temperature - Found thermal "SOC1" in thermal_zone6
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.temperature - Found thermal "CV2" in thermal_zone4
Mär 06 11:40:00 orin-1 jtop[2973]: [WARNING] jtop.core.power - Skipped NC /sys/bus/i2c/devices/1-0041/hwmon/hwmon2/in1_label
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.power - Alarms VDDQ_VDD2_1V8AO - {'crit_alarm': 0, 'max_alarm': 0}
Mär 06 11:40:00 orin-1 jtop[2973]: [WARNING] jtop.core.power - Skipped NC /sys/bus/i2c/devices/1-0041/hwmon/hwmon2/in3_label
Mär 06 11:40:00 orin-1 jtop[2973]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0041/hwmon/hwmon2/in7_label
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.power - Alarms VDD_GPU_SOC - {'crit_alarm': 0, 'max_alarm': 0}
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.power - Alarms VDD_CPU_CV - {'crit_alarm': 0, 'max_alarm': 0}
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.power - Alarms VIN_SYS_5V0 - {'crit_alarm': 0, 'max_alarm': 0}
Mär 06 11:40:00 orin-1 jtop[2973]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0040/hwmon/hwmon1/in7_label
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.power - Found I2C power monitor
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.power - Found name=1-00081 type=USB model=
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.power - Found name=1-00082 type=USB model=
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.power - Found SYSTEM power monitor
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.fan - Fan pwmfan(1) found in /sys/class/hwmon/hwmon3
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.fan - RPM pwm_tach found in /sys/class/hwmon/hwmon0
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.fan - Found nvfancontrol.service
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.jetson_clocks - jetson_clocks found in /usr/bin/jetson_clocks
Mär 06 11:40:00 orin-1 jtop[2973]: [INFO] jtop.core.nvpmodel - nvpmodel running in [0]MAXN - Default: 0
Mär 06 11:40:00 orin-1 jtop[2992]: [INFO] jtop.service - Initialization service
Mär 06 11:40:00 orin-1 jtop[2992]: Process JtopServer-1:
Mär 06 11:40:00 orin-1 jtop[2992]: Traceback (most recent call last):
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
Mär 06 11:40:00 orin-1 jtop[2992]:     self.run()
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/service.py", line 308, in run
Mär 06 11:40:00 orin-1 jtop[2992]:     data = self.jtop_decode()
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/service.py", line 577, in jtop_decode
Mär 06 11:40:00 orin-1 jtop[2992]:     data['temperature'] = self.temperature.get_status()
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/core/temperature.py", line 138, in get_status
Mär 06 11:40:00 orin-1 jtop[2992]:     values = read_temperature(sensor)
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/core/temperature.py", line 32, in read_temperature
Mär 06 11:40:00 orin-1 jtop[2992]:     value = float(cat(path)) / 1000.0
Mär 06 11:40:00 orin-1 jtop[2992]:   File "/usr/local/lib/python3.8/dist-packages/jtop/core/common.py", line 106, in cat
Mär 06 11:40:00 orin-1 jtop[2992]:     return f.readline().rstrip('\x00')
Mär 06 11:40:00 orin-1 jtop[2992]: OSError: [Errno 22] Invalid argument
Mär 06 11:40:01 orin-1 jtop[2973]: [INFO] jtop.service - Service closed
Mär 06 11:40:01 orin-1 systemd[1]: jtop.service: Succeeded.
Mär 06 14:44:09 orin-1 systemd[1]: Started jtop service.
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.service - jetson_stats 4.2.0rc0 - server loaded
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.hardware - Hardware detected aarch64
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.hardware - NVIDIA Jetson detected L4T=34.0.1
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.service - Running on Python: 3.8.10
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.cpu - Found 12 CPU
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.gpu - GPU "ga10b" status in /sys/devices/platform/17000000.ga10b
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.gpu - GPU "ga10b" frq in /sys/devices/platform/17000000.ga10b/devfreq/17000000.ga10b
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.processes - Process service started
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.memory - Found EMC!
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.memory - Memory service started
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.engine - Special Engine group found: [dlaX]
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.engine - Special Engine group found: [pvaX]
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.engine - Engines found: [APE DLA0 DLA1 NVDEC NVENC NVJPG PVA0 SE VIC]
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "CV0" in thermal_zone2
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "CPU" in thermal_zone0
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "Tdiode" in thermal_zone9
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "SOC2" in thermal_zone7
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "SOC0" in thermal_zone5
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "CV1" in thermal_zone3
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "GPU" in thermal_zone1
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "tj" in thermal_zone8
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "SOC1" in thermal_zone6
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.temperature - Found thermal "CV2" in thermal_zone4
Mär 06 14:44:09 orin-1 jtop[3768]: [WARNING] jtop.core.power - Skipped NC /sys/bus/i2c/devices/1-0041/hwmon/hwmon2/in1_label
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.power - Alarms VDDQ_VDD2_1V8AO - {'crit_alarm': 0, 'max_alarm': 0}
Mär 06 14:44:09 orin-1 jtop[3768]: [WARNING] jtop.core.power - Skipped NC /sys/bus/i2c/devices/1-0041/hwmon/hwmon2/in3_label
Mär 06 14:44:09 orin-1 jtop[3768]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0041/hwmon/hwmon2/in7_label
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.power - Alarms VDD_GPU_SOC - {'crit_alarm': 0, 'max_alarm': 0}
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.power - Alarms VDD_CPU_CV - {'crit_alarm': 0, 'max_alarm': 0}
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.power - Alarms VIN_SYS_5V0 - {'crit_alarm': 0, 'max_alarm': 0}
Mär 06 14:44:09 orin-1 jtop[3768]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0040/hwmon/hwmon1/in7_label
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.power - Found I2C power monitor
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.power - Found name=1-00081 type=USB model=
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.power - Found name=1-00082 type=USB model=
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.power - Found SYSTEM power monitor
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.fan - Fan pwmfan(1) found in /sys/class/hwmon/hwmon3
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.fan - RPM pwm_tach found in /sys/class/hwmon/hwmon0
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.fan - Found nvfancontrol.service
Mär 06 14:44:09 orin-1 jtop[3768]: [INFO] jtop.core.jetson_clocks - jetson_clocks found in /usr/bin/jetson_clocks
Mär 06 14:44:10 orin-1 jtop[3768]: [INFO] jtop.core.nvpmodel - nvpmodel running in [0]MAXN - Default: 0
Mär 06 14:44:10 orin-1 jtop[3787]: [INFO] jtop.service - Initialization service
Mär 06 14:44:10 orin-1 jtop[3787]: Process JtopServer-1:
Mär 06 14:44:10 orin-1 jtop[3787]: Traceback (most recent call last):
Mär 06 14:44:10 orin-1 jtop[3787]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
Mär 06 14:44:10 orin-1 jtop[3787]:     self.run()
Mär 06 14:44:10 orin-1 jtop[3787]:   File "/usr/local/lib/python3.8/dist-packages/jtop/service.py", line 308, in run
Mär 06 14:44:10 orin-1 jtop[3787]:     data = self.jtop_decode()
Mär 06 14:44:10 orin-1 jtop[3787]:   File "/usr/local/lib/python3.8/dist-packages/jtop/service.py", line 577, in jtop_decode
Mär 06 14:44:10 orin-1 jtop[3787]:     data['temperature'] = self.temperature.get_status()
Mär 06 14:44:10 orin-1 jtop[3787]:   File "/usr/local/lib/python3.8/dist-packages/jtop/core/temperature.py", line 138, in get_status
Mär 06 14:44:10 orin-1 jtop[3787]:     values = read_temperature(sensor)
Mär 06 14:44:10 orin-1 jtop[3787]:   File "/usr/local/lib/python3.8/dist-packages/jtop/core/temperature.py", line 32, in read_temperature
Mär 06 14:44:10 orin-1 jtop[3787]:     value = float(cat(path)) / 1000.0
Mär 06 14:44:10 orin-1 jtop[3787]:   File "/usr/local/lib/python3.8/dist-packages/jtop/core/common.py", line 106, in cat
Mär 06 14:44:10 orin-1 jtop[3787]:     return f.readline().rstrip('\x00')
Mär 06 14:44:10 orin-1 jtop[3787]: OSError: [Errno 22] Invalid argument
Mär 06 14:44:10 orin-1 jtop[3768]: [INFO] jtop.service - Service closed
Mär 06 14:44:10 orin-1 systemd[1]: jtop.service: Succeeded.

rbonghi commented 1 year ago

It looks like everything is detected! This is great! Tonight I will add an extra check for all devices that verifies whether all files are accessible.

This should fix this issue and your previous bug.

Thank you again :-)

rbonghi commented 1 year ago

Luckily (I hope), I had the same bug on my device and found the reason for this weird error.

The sensor should be present on your device as well: iwlwifi, the WiFi temperature sensor. If that device is disabled, its sensor becomes inaccessible too. [EDIT] Or, in your case, the tj_therm sensor.

Now, before reading all sensors, I check their status and availability; if a sensor is not available, I mark it as offline.
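The idea of checking a sensor before trusting it can be sketched as follows. This is not the code shipped in 4.2.0rc1; the names `read_sensor` and `get_status` and the shape of the returned dict are assumptions made for illustration. The point is that a failed read marks one sensor as offline instead of raising out of the whole service loop.

```python
def read_sensor(path):
    """Read a sysfs temperature file without letting a bad sensor raise.

    Returns {"online": True, "temp": <degrees C>} on success and
    {"online": False, "temp": None} when the file is missing,
    unreadable, or does not contain a number.
    """
    try:
        with open(path) as f:
            raw = f.readline().strip().rstrip("\x00")
        # sysfs reports millidegrees Celsius.
        return {"online": True, "temp": float(raw) / 1000.0}
    except (OSError, ValueError):
        return {"online": False, "temp": None}


def get_status(sensors):
    """Map sensor name -> reading for a {name: path} dict."""
    return {name: read_sensor(path) for name, path in sensors.items()}
```

With this shape, an unreadable zone such as thermal_zone8 would show up as `{"online": False, "temp": None}` while the other sensors keep reporting values.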

New release candidate version 4.2.0rc1 (available in a few minutes):

sudo pip3 install jetson-stats==4.2.0rc1

Let me know if this fixes the bug.

pseyfert-sevensense commented 1 year ago

Looks pretty good. (no crash, no error in the log)

[Screenshot: 2023-03-07-091610_1265x667_scrot]

Thanks a lot for the quick fix!

rbonghi commented 1 year ago

Cool! Let me know if you like the new user interface. I added a lot of information that was hidden before and improved the GPU page; I think it is more readable now. (You can activate/deactivate the 3D scaling.) Also, the Info page now hides the serial number by default, which makes it quicker to take and share screenshots.

In the meantime, I'm closing this issue.