rbonghi / jetson_stats

📊 Simple package for monitoring and controlling your NVIDIA Jetson [Orin, Xavier, Nano, TX] series
https://rnext.it/jetson_stats
GNU Affero General Public License v3.0

Jtop 4.2.3 crashes when navigating to the Jtop GPU tab on an Nvidia Xavier AGX running Jetpack 5.0.2 GA. #443

Closed (dmbuck32 closed 10 months ago)

dmbuck32 commented 1 year ago

Describe the bug

Jtop 4.2.3 crashes when navigating to the Jtop GPU tab on an Nvidia Xavier AGX running Jetpack 5.0.2 GA.

To Reproduce

Steps to reproduce the behaviour:

  1. Install or update Jtop: sudo -H pip install -U jetson-stats
  2. Restart Jtop service: sudo systemctl restart jtop.service
  3. Run Jtop: jtop
  4. Navigate to the GPU tab: Press 2
  5. Jtop crashes with the traceback shown under Screenshots.

Screenshots

Traceback (most recent call last):
  File "/usr/local/bin/jtop", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/jtop/__main__.py", line 159, in main
    curses.wrapper(JTOPGUI, jetson, pages, init_page=args.page,
  File "/usr/lib/python3.8/curses/__init__.py", line 105, in wrapper
    return func(stdscr, *args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/jtop/gui/jtopgui.py", line 100, in __init__
    self.run(loop, seconds)
  File "/usr/local/lib/python3.8/dist-packages/jtop/gui/jtopgui.py", line 129, in run
    self.draw()
  File "/usr/local/lib/python3.8/dist-packages/jtop/gui/jtopgui.py", line 143, in draw
    page.draw(self.key, self.mouse)
  File "/usr/local/lib/python3.8/dist-packages/jtop/gui/pgpu.py", line 209, in draw
    self.process_table.draw(first + 2 + gpu_height, 0, width, height_table, key, mouse)
  File "/usr/local/lib/python3.8/dist-packages/jtop/gui/lib/process_table.py", line 85, in draw
    return nprocess
UnboundLocalError: local variable 'nprocess' referenced before assignment

Expected behavior

The GPU tab should display a graph together with the list of active GPU processes.

Board

Output from jetson_release -v:

Software part of jetson-stats 4.2.3 - (c) 2023, Raffaello Bonghi
Model: Jetson-AGX - Jetpack 5.0.2 GA [L4T 35.1.0]
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - 699-level Part Number: 699-82888-0004-400 P.0
 - P-Number: p2888-0004
 - Module: NVIDIA Jetson AGX Xavier (32 GB ram)
 - SoC: tegra194
 - CUDA Arch BIN: 7.2
Platform:
 - Machine: aarch64
 - System: Linux
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.104-tegra
 - Python: 3.8.10
jtop:
 - Version: 4.2.3
 - Service: Active
Libraries:
 - CUDA: 11.4.239
 - cuDNN: 8.4.1.50
 - TensorRT: 5.0.2
 - VPI: 2.1.6
 - OpenCV: Not installed

Log from jtop.service

Attach here the output from: journalctl -u jtop.service -n 100 --no-pager

-- Logs begin at Mon 2023-03-27 17:54:06 UTC, end at Thu 2023-08-17 05:10:05 UTC. --
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [INFO] jtop.core.power - Alarms VDDRQ - {'crit_alarm': 0, 'max_alarm': 0}
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [INFO] jtop.core.power - Alarms SYS5V - {'crit_alarm': 0, 'max_alarm': 0}
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0041/hwmon/hwmon5/in7_label
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [INFO] jtop.core.power - Alarms GPU - {'crit_alarm': 0, 'max_alarm': 0}
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [INFO] jtop.core.power - Alarms CPU - {'crit_alarm': 0, 'max_alarm': 0}
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [INFO] jtop.core.power - Alarms SOC - {'crit_alarm': 0, 'max_alarm': 0}
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0040/hwmon/hwmon4/in7_label
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [INFO] jtop.core.power - Found I2C power monitor
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [WARNING] jtop.core.power - Skipped usb-charger type=USB in=usb-charger
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [INFO] jtop.core.fan - Fan pwmfan(1) found in /sys/class/hwmon/hwmon3
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [INFO] jtop.core.fan - RPM pwm_tach found in /sys/class/hwmon/hwmon1
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [INFO] jtop.core.jetson_clocks - jetson_clocks found in /usr/bin/jetson_clocks
Aug 17 04:35:44 ghrunner2 jtop[1253304]: [WARNING] jtop.core.nvpmodel - nvpmodel not available
Aug 17 04:35:44 ghrunner2 jtop[1253325]: [INFO] jtop.service - Initialization service
Aug 17 04:35:45 ghrunner2 jtop[1253325]: [INFO] jtop.core.jetson_clocks - Store jetson_clocks configuration in /usr/local/jtop/l4t_dfs.conf
Aug 17 04:35:45 ghrunner2 jtop[1253325]: [INFO] jtop.service - service ready
Aug 17 04:35:45 ghrunner2 jtop[1253325]: [INFO] jtop.service - jtop timer thread started 1000ms
Aug 17 04:35:55 ghrunner2 jtop[1253325]: [INFO] jtop.service - jtop timer thread close

Log from jetson-stats installation

Attach here the output from: sudo -H pip3 install --no-cache-dir -U jetson-stats

Requirement already up-to-date: jetson-stats in /usr/local/lib/python3.8/dist-packages (4.2.3)
Requirement already satisfied, skipping upgrade: distro in /usr/lib/python3/dist-packages (from jetson-stats) (1.4.0)
Requirement already satisfied, skipping upgrade: smbus2 in /usr/local/lib/python3.8/dist-packages (from jetson-stats) (0.4.2)

n4s commented 1 year ago

This occurs when there are no GPU processes. The for loop at line 72 of process_table.py then never runs its body, so nprocess is never assigned and the return nprocess at line 85 raises UnboundLocalError. Adding nprocess = 0 just before the for loop fixes it.

(Workaround: edit your file at /usr/local/lib/python3.8/dist-packages/jtop/gui/lib/process_table.py and add nprocess = 0 before the for loop at line 72.)
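
For anyone unsure why an empty process list triggers this, here is a minimal sketch of the failure mode and the workaround. It is not the actual jtop source: the function names and the use of enumerate are assumptions made for illustration, and only the nprocess variable name comes from the traceback.

```python
# Hypothetical stand-ins for the drawing loop in process_table.py.
# Only the variable name `nprocess` is taken from the traceback.

def count_rows_buggy(processes):
    # If `processes` is empty, the loop body never runs and
    # `nprocess` is never bound.
    for nprocess, process in enumerate(processes):
        pass  # the real code would draw one table row here
    return nprocess  # raises UnboundLocalError for an empty list


def count_rows_fixed(processes):
    nprocess = 0  # the workaround: bind the name before the loop
    for nprocess, process in enumerate(processes):
        pass
    return nprocess


if __name__ == "__main__":
    print(count_rows_fixed([]))  # safe even with no GPU processes
    try:
        count_rows_buggy([])
    except UnboundLocalError as exc:
        print("buggy version failed:", exc)
```

The same pattern applies to any loop variable that is read after the loop: give it a default value first if the iterable can be empty.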

rbonghi commented 10 months ago

This issue is fixed in jetson-stats 4.2.4, the latest release. Please update:

sudo pip3 install -U jetson-stats
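
To confirm that the update took effect, something like the following sketch may help. It assumes Python 3.8 or newer (for importlib.metadata) and that the package is registered under the name jetson-stats, as in the pip output earlier in this thread. The helper names are made up for illustration, and the version parsing is deliberately simple.

```python
from importlib import metadata  # available from Python 3.8

FIXED_IN = (4, 2, 4)  # release named in the comment above


def parse_version(text):
    """Turn '4.2.3' into (4, 2, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(version_text):
    """True if the given version is at least the release with the fix."""
    return parse_version(version_text) >= FIXED_IN


def installed_version(package="jetson-stats"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("jetson-stats is not installed for this interpreter")
    elif has_fix(found):
        print("jetson-stats", found, "includes the fix")
    else:
        print("jetson-stats", found, "is older than 4.2.4, please update")
```

Run it with the same interpreter that jtop uses (python3 on the board in this report), since the system and user site-packages can hold different versions.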