rbonghi / jetson_stats

📊 Simple package for monitoring and control your NVIDIA Jetson [Orin, Xavier, Nano, TX] series
https://rnext.it/jetson_stats
GNU Affero General Public License v3.0
2.17k stars 264 forks

Support request for NVIDIA Clara AGX #291

Open engrjislam opened 2 years ago

engrjislam commented 2 years ago

A quick request to support jtop on the NVIDIA Clara AGX Dev Kit. Some information is already detected and displayed correctly in jtop, but some is not (jtop does not seem to find it where it expects to, e.g., CUDA). It would be nice to have support for the NVIDIA Clara AGX Dev Kit in a future release.

rbonghi commented 1 year ago

Hi @engrjislam ,

Good question. I have never tried a Clara AGX, but I will let you know :-) Out of curiosity, if you can try jetson-stats, could you share the output? It would give me an idea of the effort needed to add this hardware.

Best, Raffaello

engrjislam commented 1 year ago

Hello @rbonghi, I am getting the following console warning for sudo jtop:

[WARN] jetson-stats not supported for [L4T 34.1.2]
Please, try: sudo pip3 install -U jetson-stats or
open a Github issue (press CTRL + Click)

Anyway, jetson_stats installed successfully on the system. Some of the jtop outputs are shown below:

[images: jtop screenshots 1 to 6]

"Jetpack NOT DETECTED" seems quite reasonable (Clara is an AGX system, not a Jetson system). As a result, the jetson_stats package is not fully compatible with the Clara AGX.

[SORRY for the delay ... ]
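
For reference, the `L4T 34.1.2` in the warning above can be reproduced from the release line in /etc/nv_tegra_release (the sample line used here is the one posted later in this thread). This is only a minimal sketch and not necessarily how jtop does it; the function name `l4t_version` is made up for illustration.

```python
import re


def l4t_version(release_line):
    """Build an L4T version string such as '34.1.2' from a release line.

    Assumes the line looks like '# R34 (release), REVISION: 1.2, ...',
    which is the format shown in this thread.
    """
    match = re.search(r"R(\d+) \(release\), REVISION: ([\d.]+)", release_line)
    if match is None:
        raise ValueError("unrecognized release line: {!r}".format(release_line))
    return "{}.{}".format(match.group(1), match.group(2))


# Release line copied from the Clara AGX output in this thread
line = ("# R34 (release), REVISION: 1.2, GCID: 32090851, BOARD: t186ref, "
        "EABI: aarch64, DATE: Thu Dec  8 18:51:28 UTC 2022")
print(l4t_version(line))  # prints 34.1.2
```

On the board itself, the same function could be fed the first line of /etc/nv_tegra_release.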

rbonghi commented 10 months ago

Thank you for your post! I was thinking of not adding this feature on JTOP :-)

Do you know if Clara AGX also has tegrastats?

To figure out how a Clara AGX works, can you also send me the output of these commands:

sudo pip3 install --no-cache-dir -U jetson-stats
journalctl -u jtop.service -n 100 --no-pager
jetson_release -v

And, if you can, attach the output from

jtop --error-log

engrjislam commented 10 months ago

Thanks for your consideration. Here are the outputs I get on my Clara AGX.

Do you know if Clara AGX also has tegrastats?

[image: tegrastats]

journalctl -u jtop.service -n 100 --no-pager

-- Logs begin at Fri 2023-12-08 09:14:31 PST, end at Sun 2024-01-07 15:48:55 PST. --
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.service - Running on Python: 3.8.10
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.cpu - Found 8 CPU
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.gpu - NVIDIA SMI exist!
Jan 07 15:44:41 ubuntu jtop[2972181]: [WARNING] jtop.core.gpu - No NVIDIA GPU available
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.processes - Process service started
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.memory - Found EMC!
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.memory - Memory service started
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.engine - Special Engine group found: [dlaX]
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.engine - Special Engine group found: [pvaX]
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.engine - Engines found: [APE CVNAS DLA0 DLA1 NVDEC NVENC NVJPG PVA0 PVA1 SE VIC]
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.temperature - Found thermal "AUX" in thermal_zone2
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.temperature - Found thermal "CPU" in thermal_zone0
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.temperature - Found thermal "mlx5" in thermal_zone9
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.temperature - Found thermal "PCIe" in thermal_zone7
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.temperature - Found thermal "Tboard" in thermal_zone5
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.temperature - Found thermal "AO" in thermal_zone3
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.temperature - Found thermal "GPU" in thermal_zone1
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.temperature - Found thermal "thermal" in thermal_zone8
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.temperature - Found thermal "Tdiode" in thermal_zone6
Jan 07 15:44:41 ubuntu jtop[2972181]: [WARNING] jtop.core.temperature - Skipped PMIC
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.power - Alarms CV - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.power - Alarms VDDRQ - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.power - Alarms SYS5V - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:44:41 ubuntu jtop[2972181]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0041/hwmon/hwmon4/in7_label
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.power - Alarms GPU - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.power - Alarms CPU - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.power - Alarms SOC - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:44:41 ubuntu jtop[2972181]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0040/hwmon/hwmon3/in7_label
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.power - Found I2C power monitor
Jan 07 15:44:41 ubuntu jtop[2972181]: [WARNING] jtop.core.power - Skipped usb-charger type=USB in=usb-charger
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.fan - Fan pwmfan(1) found in /sys/class/hwmon/hwmon6
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.fan - Fan pwmfan(1) found in /sys/class/hwmon/hwmon5
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.fan - RPM pwm_tach found in /sys/class/hwmon/hwmon2
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.fan - Found nvfancontrol.service
Jan 07 15:44:41 ubuntu jtop[2972181]: [INFO] jtop.core.jetson_clocks - jetson_clocks found in /usr/bin/jetson_clocks
Jan 07 15:44:41 ubuntu jtop[2972181]: [WARNING] jtop.core.nvpmodel - nvpmodel not available
Jan 07 15:44:41 ubuntu jtop[2972198]: [INFO] jtop.service - Initialization service
Jan 07 15:44:41 ubuntu jtop[2972198]: [INFO] jtop.core.fan - Initialization pwmfan
Jan 07 15:44:41 ubuntu jtop[2972198]: [WARNING] jtop.core.fan - Fan pwmfan profile manual already active
Jan 07 15:44:42 ubuntu jtop[2972198]: [INFO] jtop.service - service ready
Jan 07 15:44:49 ubuntu jtop[2972198]: [INFO] jtop.service - jtop timer thread started 1000ms
Jan 07 15:44:53 ubuntu jtop[2972198]: [INFO] jtop.service - jtop timer thread close
Jan 07 15:45:30 ubuntu systemd[1]: Stopping jtop service...
Jan 07 15:45:30 ubuntu jtop[2972198]: [INFO] jtop.__main__ - Close service by signal 15
Jan 07 15:45:30 ubuntu jtop[2972198]: [WARNING] jtop.service - KeyboardInterrupt, SystemExit interrupt
Jan 07 15:45:30 ubuntu jtop[2972198]: [INFO] jtop.service - FORCE jtop timer thread close
Jan 07 15:45:30 ubuntu jtop[2972190]: [INFO] jtop.__main__ - Close service by signal 15
Jan 07 15:45:30 ubuntu jtop[2972181]: [INFO] jtop.__main__ - Close service by signal 15
Jan 07 15:45:30 ubuntu jtop[2972181]: [INFO] jtop.service - Service closed
Jan 07 15:45:30 ubuntu systemd[1]: jtop.service: Succeeded.
Jan 07 15:45:30 ubuntu systemd[1]: Stopped jtop service.
Jan 07 15:45:31 ubuntu systemd[1]: Started jtop service.
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.service - jetson_stats 4.2.4 - server loaded
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.config - Load config from /usr/local/jtop/config.json
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.hardware - Hardware detected aarch64
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.hardware - NVIDIA Jetson 699-level Part Number=699-82888-0004-400 M.0
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.hardware - NVIDIA Jetson Module=NVIDIA Jetson AGX Xavier (32 GB ram)
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.hardware - NVIDIA Jetson detected L4T=34.1.2
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.service - Running on Python: 3.8.10
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.cpu - Found 8 CPU
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.gpu - NVIDIA SMI exist!
Jan 07 15:45:31 ubuntu jtop[2972418]: [WARNING] jtop.core.gpu - No NVIDIA GPU available
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.processes - Process service started
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.memory - Found EMC!
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.memory - Memory service started
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.engine - Special Engine group found: [dlaX]
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.engine - Special Engine group found: [pvaX]
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.engine - Engines found: [APE CVNAS DLA0 DLA1 NVDEC NVENC NVJPG PVA0 PVA1 SE VIC]
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.temperature - Found thermal "AUX" in thermal_zone2
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.temperature - Found thermal "CPU" in thermal_zone0
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.temperature - Found thermal "mlx5" in thermal_zone9
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.temperature - Found thermal "PCIe" in thermal_zone7
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.temperature - Found thermal "Tboard" in thermal_zone5
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.temperature - Found thermal "AO" in thermal_zone3
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.temperature - Found thermal "GPU" in thermal_zone1
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.temperature - Found thermal "thermal" in thermal_zone8
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.temperature - Found thermal "Tdiode" in thermal_zone6
Jan 07 15:45:31 ubuntu jtop[2972418]: [WARNING] jtop.core.temperature - Skipped PMIC
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.power - Alarms CV - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.power - Alarms VDDRQ - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.power - Alarms SYS5V - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:45:31 ubuntu jtop[2972418]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0041/hwmon/hwmon4/in7_label
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.power - Alarms GPU - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.power - Alarms CPU - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.power - Alarms SOC - {'crit_alarm': 0, 'max_alarm': 0}
Jan 07 15:45:31 ubuntu jtop[2972418]: [WARNING] jtop.core.power - Skipped "sum of shunt voltages" /sys/bus/i2c/devices/1-0040/hwmon/hwmon3/in7_label
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.power - Found I2C power monitor
Jan 07 15:45:31 ubuntu jtop[2972418]: [WARNING] jtop.core.power - Skipped usb-charger type=USB in=usb-charger
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.fan - Fan pwmfan(1) found in /sys/class/hwmon/hwmon6
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.fan - Fan pwmfan(1) found in /sys/class/hwmon/hwmon5
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.fan - RPM pwm_tach found in /sys/class/hwmon/hwmon2
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.fan - Found nvfancontrol.service
Jan 07 15:45:31 ubuntu jtop[2972418]: [INFO] jtop.core.jetson_clocks - jetson_clocks found in /usr/bin/jetson_clocks
Jan 07 15:45:31 ubuntu jtop[2972418]: [WARNING] jtop.core.nvpmodel - nvpmodel not available
Jan 07 15:45:31 ubuntu jtop[2972435]: [INFO] jtop.service - Initialization service
Jan 07 15:45:31 ubuntu jtop[2972435]: [INFO] jtop.core.fan - Initialization pwmfan
Jan 07 15:45:31 ubuntu jtop[2972435]: [WARNING] jtop.core.fan - Fan pwmfan profile manual already active
Jan 07 15:45:32 ubuntu jtop[2972435]: [INFO] jtop.service - service ready
Jan 07 15:45:36 ubuntu jtop[2972435]: [INFO] jtop.service - jtop timer thread started 1000ms
Jan 07 15:47:55 ubuntu jtop[2972435]: [INFO] jtop.service - jtop timer thread close

jetson_release -v

[image: jetson_release]

jtop --error-log

--------------------- PLATFORM -------------------------
Machine: aarch64
System: Linux
Distribution: Ubuntu 20.04 focal
Release: 5.10.65-tegra
Python: 3.8.10
-------------------- RAW OUTPUT ------------------------
------------------
/etc/nv_tegra_release:
# R34 (release), REVISION: 1.2, GCID: 32090851, BOARD: t186ref, EABI: aarch64, DATE: Thu Dec  8 18:51:28 UTC 2022
------------------
/sys/firmware/devicetree/base/model:
Clara-AGX
------------------
/proc/device-tree/nvidia,boardids:
No such file or directory
------------------
/proc/device-tree/compatible:
nvidia,e3900-0000+p2888-0004nvidia,galennvidia,jetson-xaviernvidia,tegra194
------------------
/proc/device-tree/nvidia,dtsfilename:
/dvs/git/dirty/git-master_linux/kernel/kernel-5.10/arch/arm64/boot/dts/../../../../../../hardware/nvidia/platform/t19x/mccoy/kernel-dts/tegra194-p2888-0004-e3900-0000.dts
------------------
I2C-0-0x50:
01 00 FF 00 48 0B 04 00 04 4D 00 00 00 00 00 00    ..ÿ.H....M......
00 00 00 00 36 39 39 2D 38 32 38 38 38 2D 30 30    ....699-82888-00
30 34 2D 34 30 30 20 4D 2E 30 00 00 00 00 00 00    04-400 M.0......
00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF    ..ÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
FF FF FF FF AB C7 4D 2D B0 48 31 34 32 32 34 32    ÿÿÿÿ«ÇM-°H142242
31 30 31 39 33 32 31 00 00 00 00 00 00 00 00 00    1019321.........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 4E 56 43 42 1C 00 4D 31 00 00    ......NVCB..M1..
FF FF FF FF FF FF FF FF FF FF FF FF AB C7 4D 2D    ÿÿÿÿÿÿÿÿÿÿÿÿ«ÇM-
B0 48 00 00 00 00 00 00 00 00 00 00 00 00 00 00    °H..............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 D5    ...............Õ

------------------
I2C-0-0x56:
01 00 FF 00 3C 0F 00 00 01 42 00 00 00 00 00 00    ..ÿ.<....B......
00 00 00 00 36 39 39 2D 31 33 39 30 30 2D 30 30    ....699-13900-00
30 30 2D 31 30 31 20 42 2E 30 00 00 00 00 00 00    00-101 B.0......
00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF    ..ÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
FF FF FF FF FF FF FF FF FF FF 31 36 31 30 34 32    ÿÿÿÿÿÿÿÿÿÿ161042
32 36 31 30 30 31 32 00 00 00 00 00 00 00 00 00    2610012.........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 46 46 46 46 FF FF 46 46 FF FF    ......FFFFÿÿFFÿÿ
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ÿÿ..............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 33    ...............3

------------------
I2C-1:
FAIL
------------------
I2C-2:
FAIL
------------------
I2C-7:
FAIL
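
Side note on the two EEPROM dumps above: in both of them the 699-level part number is a NUL-terminated ASCII string that starts at byte offset 20. Below is a minimal sketch of pulling it out, assuming that offset also holds on other boards (unverified). `part_number_from_eeprom` is a hypothetical helper, not a jtop function.

```python
def part_number_from_eeprom(dump, offset=20):
    """Return the NUL-terminated ASCII string starting at `offset`.

    The default offset of 20 is read off the dumps in this thread;
    it is an assumption, not a documented EEPROM layout.
    """
    end = dump.find(b"\x00", offset)
    if end == -1:
        end = len(dump)
    return dump[offset:end].decode("ascii", errors="replace")


# First three rows of the I2C-0-0x50 dump from this thread
rows = """
01 00 FF 00 48 0B 04 00 04 4D 00 00 00 00 00 00
00 00 00 00 36 39 39 2D 38 32 38 38 38 2D 30 30
30 34 2D 34 30 30 20 4D 2E 30 00 00 00 00 00 00
"""
eeprom = bytes.fromhex("".join(rows.split()))
print(part_number_from_eeprom(eeprom))  # prints 699-82888-0004-400 M.0
```

The result matches the "699-level Part Number" line in the journalctl output earlier in this comment.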

Log from jtop 4.2.4

rbonghi commented 10 months ago

Excellent, thank you! I would like to figure out where the GPU is located. Can you share the output from this script? It will (I hope) help me understand the board and fix jtop :-)

import os
igpu_path = "/sys/class/devfreq/"

for item in os.listdir(igpu_path):
    item_path = os.path.join(igpu_path, item)
    if os.path.isfile(item_path) or os.path.islink(item_path):
        # Check name device
        name_path = "{item}/device/of_node/name".format(item=item_path)
        if os.path.isfile(name_path):
            # Decode name
            with open(name_path, 'r') as f:
                name = f.readline().rstrip('\x00')
            # path and file
            print("Path: {}".format(name_path))
            print("{}".format(name))

Thank you in advance
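
For anyone who wants to try the same scan away from the board, here is a variant wrapped in a function that takes the directory as a parameter and returns a dict instead of printing. It is a sketch only: the name `devfreq_names` is made up, and unlike the script above it does not check whether each entry is a file or a symlink. It only looks for `device/of_node/name` underneath each entry.

```python
import os


def devfreq_names(root="/sys/class/devfreq"):
    """Map each devfreq entry under `root` to its device-tree node name.

    Returns an empty dict when `root` does not exist, e.g. when this
    is run on a machine without devfreq support.
    """
    names = {}
    if not os.path.isdir(root):
        return names
    for item in sorted(os.listdir(root)):
        name_path = os.path.join(root, item, "device", "of_node", "name")
        if os.path.isfile(name_path):
            with open(name_path, "r") as f:
                # Device-tree strings end with a NUL byte; strip it
                names[item] = f.read().rstrip("\x00\n")
    return names


if __name__ == "__main__":
    for item, name in devfreq_names().items():
        print("{}: {}".format(item, name))
```

Passing a temporary directory as `root` makes it possible to test the logic against a mock tree.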

engrjislam commented 10 months ago

Hi @rbonghi,

Sorry for the delay! I found the following output for the above code:

[image: GPU path]