rbonghi / jetson_stats

📊 Simple package for monitoring and control your NVIDIA Jetson [Orin, Xavier, Nano, TX] series
https://rnext.it/jetson_stats
GNU Affero General Public License v3.0

[FEA] Add Support to DriveOS from the hardware NVIDIA Drive AGX Series. #396

Open ZhenshengLee opened 1 year ago

ZhenshengLee commented 1 year ago

This issue is similar to #291: it asks for support for another Tegra-based platform from NVIDIA.

NVIDIA Jetson AGX is a Tegra-based system with JetPack software components and a collection of hardware engines to support robotics and intelligent machines. Its latest production series include Xavier and Orin.

NVIDIA Drive AGX is also a Tegra-based system, with DriveOS software components and a collection of hardware engines to support AV-domain applications, typically ADAS. Its latest production series also include Xavier and Orin.

So adding support for DriveOS should be straightforward, because the architecture is the same.

The DriveOS SDK documentation is here: https://developer.nvidia.com/docs/drive/drive-os/6.0.6/public/drive-os-linux-sdk/common/topics/archi/PlatformSoftwareStacks1.html

Thanks.

rbonghi commented 1 year ago

Hi @ZhenshengLee ,

Have you tried installing jetson-stats on your Drive AGX? I don't have a Drive AGX, but with your help I can fix the bugs and make jetson-stats work on your device as well.

When you install, please share the output from the following:

sudo pip3 install --no-cache-dir -U jetson-stats
journalctl -u jtop.service -n 100 --no-pager
jetson_release -v

Also, please share any errors that appear when you run

jtop

And, if you can, attach the output from

jtop --error-log

ZhenshengLee commented 1 year ago

sudo -H python3 -m pip install jetson-stats --trusted-host pypi.tuna.tsinghua.edu.cn
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/connectionpool.py:999: InsecureRequestWarning: Unverified HTTPS request is being made to host 'pypi.tuna.tsinghua.edu.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Collecting jetson-stats
/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/connectionpool.py:999: InsecureRequestWarning: Unverified HTTPS request is being made to host 'pypi.tuna.tsinghua.edu.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/aa/07/098bfb6f864b44c12957be8798c34af4faabac33ffde2eaf1ef861f901e5/jetson-stats-4.2.1.tar.gz (115 kB)
     |████████████████████████████████| 115 kB 1.1 MB/s 
/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/connectionpool.py:999: InsecureRequestWarning: Unverified HTTPS request is being made to host 'pypi.tuna.tsinghua.edu.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Collecting distro
/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/connectionpool.py:999: InsecureRequestWarning: Unverified HTTPS request is being made to host 'pypi.tuna.tsinghua.edu.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f4/2c/c90a3adaf0ddb70afe193f5ebfb539612af57cffe677c3126be533df3098/distro-1.8.0-py3-none-any.whl (20 kB)
/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/connectionpool.py:999: InsecureRequestWarning: Unverified HTTPS request is being made to host 'pypi.tuna.tsinghua.edu.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Collecting smbus2
/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/connectionpool.py:999: InsecureRequestWarning: Unverified HTTPS request is being made to host 'pypi.tuna.tsinghua.edu.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/71/2f/73aad66cdee8d4b94068bbc80164aa6a24b3f83541de7b04974438fd70e6/smbus2-0.4.2-py2.py3-none-any.whl (11 kB)
Building wheels for collected packages: jetson-stats
  Building wheel for jetson-stats (setup.py) ... done
  Created wheel for jetson-stats: filename=jetson_stats-4.2.1-py3-none-any.whl size=154248 sha256=dc59367ccb9b72f151becdbdf09b98ac82f5d8b74acb6005db09b2afe129694f
  Stored in directory: /root/.cache/pip/wheels/34/94/f9/f793ca28f9fbf59de7db1adf3e325e74c1705fda83cce398af
Successfully built jetson-stats
Installing collected packages: distro, smbus2, jetson-stats
Successfully installed distro-1.8.0 jetson-stats-4.2.1 smbus2-0.4.2
journalctl -u jtop.service -n 100 --no-pager
-- Logs begin at Sun 2023-01-29 08:42:46 UTC, end at Sun 2023-02-12 17:23:13 UTC. --
Feb 12 17:22:08 tegra-ubuntu systemd[1]: Started jtop service.
Feb 12 17:22:08 tegra-ubuntu systemd[23443]: jtop.service: Failed to execute command: No such file or directory
Feb 12 17:22:08 tegra-ubuntu systemd[23443]: jtop.service: Failed at step EXEC spawning /usr/local/bin/jtop: No such file or directory
Feb 12 17:22:08 tegra-ubuntu systemd[1]: jtop.service: Main process exited, code=exited, status=203/EXEC
Feb 12 17:22:08 tegra-ubuntu systemd[1]: jtop.service: Failed with result 'exit-code'.
Feb 12 17:22:18 tegra-ubuntu systemd[1]: jtop.service: Scheduled restart job, restart counter is at 1.
Feb 12 17:22:18 tegra-ubuntu systemd[1]: Stopped jtop service.
Feb 12 17:22:18 tegra-ubuntu systemd[1]: Started jtop service.
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.service - jetson_stats 4.2.1 - server loaded
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.config - Build service folder in /usr/local/jtop
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.hardware - Hardware detected aarch64
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.service - Running on Python: 3.8.10
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.cpu - Found 12 CPU
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [WARNING] jtop.core.gpu - No NVIDIA GPU available
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.processes - Process service started
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.memory - Found EMC!
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.memory - Memory service started
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.engine - Special Engine group found: [dlaX]
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.engine - Special Engine group found: [pvaX]
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.engine - Engines found: [APE DLA0 DLA1 NVDEC NVENC NVJPG PVA0 SE VIC]
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "CV0" in thermal_zone2
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "CPU" in thermal_zone0
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "EXT0" in thermal_zone9
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "EXT1" in thermal_zone12
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "SOC2" in thermal_zone7
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "EXT00" in thermal_zone10
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "SOC0" in thermal_zone5
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "CV1" in thermal_zone3
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "GPU" in thermal_zone1
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "tj" in thermal_zone8
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "EXT10" in thermal_zone11
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "SOC1" in thermal_zone6
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [INFO] jtop.core.temperature - Found thermal "CV2" in thermal_zone4
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [WARNING] jtop.core.power - Power sensors not found!
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [WARNING] jtop.core.fan - No fan found
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [WARNING] jtop.core.jetson_clocks - jetson_clocks not available
Feb 12 17:22:18 tegra-ubuntu jtop[23459]: [WARNING] jtop.core.nvpmodel - nvpmodel not available
Feb 12 17:22:18 tegra-ubuntu jtop[23478]: [INFO] jtop.service - Initialization service
Feb 12 17:22:18 tegra-ubuntu jtop[23478]: [INFO] jtop.service - service ready
jetson_release -v
Software part of jetson-stats 4.2.1 - (c) 2023, Raffaello Bonghi
Jetpack missing!
 - Model: p3710-0010
 - L4T: 
Traceback (most recent call last):
  File "/usr/local/bin/jetson_release", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/jtop/jetson_release.py", line 62, in main
    nvpmodel_now = nvpmodel_query()
  File "/usr/local/lib/python3.8/dist-packages/jtop/core/nvpmodel.py", line 74, in nvpmodel_query
    lines = nvpmodel_p(timeout=COMMAND_TIMEOUT)
  File "/usr/local/lib/python3.8/dist-packages/jtop/core/command.py", line 115, in __call__
    raise Command.CommandException('Error process:', self.process.returncode)
jtop.core.command.CommandException: [errno:234] Error process:
jtop
I can't access jtop.service.
Please logout or reboot this board.

jtop-error.log

--------------------- PLATFORM -------------------------
Machine: aarch64
System: Linux
Distribution: Ubuntu 20.04 Focal Fossa
Release: 5.10.120-rt70-tegra
Python: 3.8.10
-------------------- RAW OUTPUT ------------------------
------------------
/etc/nv_tegra_release:
No such file or directory
------------------
/sys/firmware/devicetree/base/model:
p3710-0010
------------------
/proc/device-tree/nvidia,boardids:
No such file or directory
------------------
/proc/device-tree/compatible:
nvidia,tegra234
------------------
/proc/device-tree/nvidia,dtsfilename:
arch/arm64/boot/dts/../../../../../../hardware/nvidia/platform/t23x/automotive/kernel-dts/p3710/tegra234-p3710-0010-a01-linux-driveav-gos.dts
------------------
I2C-0:
FAIL
------------------
I2C-1:
FAIL
------------------
I2C-2:
FAIL
------------------
I2C-7:
FAIL

Log from jtop 4.2.1

@rbonghi this was run on a Drive AGX Orin kit with DriveOS 6.0.6.

Thanks.
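
The error log above shows that /etc/nv_tegra_release is missing on DriveOS, while /proc/device-tree/compatible and /proc/device-tree/nvidia,dtsfilename are readable and the DTS path contains "automotive" and "driveav". As a rough sketch only (this is not jtop's code), a tool could use those files to tell a Drive board from a Jetson. The helper names `read_dt_string` and `guess_platform`, the `root` parameter, and the rule that a present /etc/nv_tegra_release means "Jetson" are all assumptions made for illustration.

```python
import os


def read_dt_string(path):
    """Return the text of a device-tree file, or None if it cannot be read."""
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except OSError:
        return None
    # Device-tree strings are NUL-terminated and may hold several entries.
    parts = [p.decode("utf-8", "replace") for p in raw.split(b"\x00") if p]
    return " ".join(parts).strip()


def guess_platform(root="/"):
    """Hypothetical helper: return 'jetson', 'drive', 'unknown-tegra' or 'unknown'."""
    release = os.path.join(root, "etc", "nv_tegra_release")
    dts = read_dt_string(os.path.join(root, "proc", "device-tree", "nvidia,dtsfilename"))
    compatible = read_dt_string(os.path.join(root, "proc", "device-tree", "compatible"))
    # Assumption: the L4T release file is only present on Jetson images.
    if os.path.isfile(release):
        return "jetson"
    # Assumption: Drive device trees live under an "automotive" or "driveav" path.
    if dts and ("/automotive/" in dts or "driveav" in dts):
        return "drive"
    if compatible and "tegra" in compatible:
        return "unknown-tegra"
    return "unknown"


if __name__ == "__main__":
    print(guess_platform())
```

The `root` argument exists only so the check can be tried against a copy of the files from another board; on the device itself the default is enough.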

ZhenshengLee commented 1 year ago

friendly ping @rbonghi for updates.

rbonghi commented 8 months ago

Hi @ZhenshengLee

Sorry for the really late reply! I am still reviewing your posts and trying to figure out how to fix the issue. Although I don't have a Clara AGX, I can release a beta version for you to test if you are available. Alternatively, you can try a branch that includes all the requested features. Let me know.

Raffaello

ZhenshengLee commented 8 months ago

Although I don't have a Clara AGX

To clarify, this issue is about jetson_stats on the Drive AGX.

I am still reviewing your posts and trying to figure out how to fix the issue

You can refer to the DriveOS documentation, linked at the top of this issue, for information about the hardware engines of the Drive AGX.

I can release a beta version for you to test if you are available. Alternatively, you can try a branch that includes all the requested features.

After reading https://rnext.it/jetson_stats/contributing.html, I think you could provide a feature branch that adds support for the Drive AGX Tegra; perhaps it is just a matter of adding a new L4T version? (Of course, that is up to you as the code owner.) I'd be glad to test the branch on the Drive AGX machine.

Or, as a last resort, I could fork jetson_stats into a new tool, drive_states?

rbonghi commented 8 months ago

To clarify, this issue is about jetson_stats on the Drive AGX.

I apologize, I made a mistake; I meant to write Drive AGX.

After reading https://rnext.it/jetson_stats/contributing.html, I think you could provide a feature branch that adds support for the Drive AGX Tegra; perhaps it is just a matter of adding a new L4T version? (Of course, that is up to you as the code owner.) I'd be glad to test the branch on the Drive AGX machine.

Or, as a last resort, I could fork jetson_stats into a new tool, drive_states?

I would like to integrate this feature into jtop to support the Drive AGX officially; you don't need to make a new forked version of jtop.

ZhenshengLee commented 8 months ago

I would like to integrate this feature into jtop to support the Drive AGX officially; you don't need to make a new forked version of jtop.

Understood! Waiting for your feature branch.

rbonghi commented 8 months ago

Try installing jetson-stats by following this page: https://rnext.it/jetson_stats/contributing.html

and switch to the develop branch:

git clone https://github.com/rbonghi/jetson_stats.git
cd jetson_stats
git checkout develop
sudo pip3 install -v -e .

I think I resolved the first issue, the one with jetson_release -v.

rbonghi commented 8 months ago

If you can, please pull the latest commit on the develop branch and run:

jtop --error-log

Sharing the resulting jtop-error.log file would be really helpful for figuring out what kind of fix jtop requires.

rbonghi commented 8 months ago

If you have trouble updating your local version, it is just as useful to me if you run this Python script:

import os
igpu_path = "/sys/class/devfreq/"

for item in os.listdir(igpu_path):
    item_path = os.path.join(igpu_path, item)
    if os.path.isfile(item_path) or os.path.islink(item_path):
        # Check name device
        name_path = "{item}/device/of_node/name".format(item=item_path)
        if os.path.isfile(name_path):
            # Decode name
            with open(name_path, 'r') as f:
                name = f.readline().rstrip('\x00')
            # path and file
            print("Path: {}".format(name_path))
            print("{}".format(name))

This script will help me figure out where the GPU device is located on a Drive AGX.
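
For anyone who wants to run the same check on another board, here is a slightly more defensive variant of the script above. It is only a sketch: the function name `list_devfreq_names` and the `devfreq_root` parameter are made up for illustration, and the only behavioural differences are that it returns the results instead of printing them directly and that it does not fail when /sys/class/devfreq is absent.

```python
import os


def list_devfreq_names(devfreq_root="/sys/class/devfreq"):
    """Return (name_path, name) pairs for every devfreq device that exposes
    a device-tree node name. Returns an empty list if the folder is missing."""
    found = []
    if not os.path.isdir(devfreq_root):
        return found
    for item in sorted(os.listdir(devfreq_root)):
        # Same relative path as in the original script.
        name_path = os.path.join(devfreq_root, item, "device", "of_node", "name")
        if not os.path.isfile(name_path):
            continue
        with open(name_path, "r") as f:
            # Device-tree strings end with a NUL byte; strip it.
            name = f.read().rstrip("\x00\n")
        found.append((name_path, name))
    return found


if __name__ == "__main__":
    for name_path, name in list_devfreq_names():
        print("Path: {}".format(name_path))
        print(name)
```

Run without arguments, it prints the same "Path:" and name lines as the original script, so the output can be pasted into the issue in the same form.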

ZhenshengLee commented 7 months ago

@rbonghi Thanks for your support!

git clone https://github.com/rbonghi/jetson_stats.git
cd jetson_stats
git checkout develop
sudo pip3 install -v -e .

I've read the code on the develop branch. According to the information available on the Drive AGX, the following modules conflict with the Jetson series:

Could you add more exception handling, like what you did earlier in 79874bdbbed42cdb3c274ab94a0ba20829dece59, so that the jetson_stats app does not exit or crash and just prints an error tag in the GUI?
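
To make the request concrete, here is a minimal sketch of the kind of handling meant here, assuming each feature is read through a small probe function. It is not taken from the jetson_stats code base; `safe_probe`, `render_status`, the "[N/A]" tag and the two example probes are hypothetical names chosen for illustration.

```python
import logging

logger = logging.getLogger("probe_sketch")


def safe_probe(name, probe, default=None):
    """Run probe(); on any failure log a warning and return (False, default)
    instead of letting the exception end the program."""
    try:
        return True, probe()
    except Exception as exc:  # broad on purpose: one bad feature must not crash the UI
        logger.warning("%s not available: %s", name, exc)
        return False, default


def render_status(results):
    """Build one status line per feature, tagging the unavailable ones."""
    lines = []
    for name, (ok, value) in results.items():
        lines.append("{}: {}".format(name, value if ok else "[N/A]"))
    return lines


def missing_nvpmodel():
    # Stand-in for a probe that fails because the tool is not installed.
    raise FileNotFoundError("nvpmodel")


if __name__ == "__main__":
    results = {
        "nvpmodel": safe_probe("nvpmodel", missing_nvpmodel),
        "cpu": safe_probe("cpu", lambda: 12),
    }
    print("\n".join(render_status(results)))
```

With this shape, a feature that is absent on DriveOS shows up as a tagged line while the remaining readings are still displayed.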

rbonghi commented 7 months ago

jtop can automatically detect all engines and fans, and it checks whether jetson_clocks and nvpmodel are available. It is not a problem if some features are missing. As for tegrastats, jtop no longer uses its data and instead reads the status directly from the hardware.
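
As an illustration of what reading the status directly from the hardware can look like, the sketch below reads temperatures from the standard Linux thermal sysfs interface, where each thermal_zone folder has a `type` file and a `temp` file in millidegrees Celsius. This is an assumption-laden example, not jtop's actual implementation; `read_thermal_zones` and its `thermal_root` parameter are hypothetical.

```python
import glob
import os


def read_thermal_zones(thermal_root="/sys/class/thermal"):
    """Return {zone_type: temperature_in_celsius} for every readable zone."""
    temps = {}
    pattern = os.path.join(thermal_root, "thermal_zone*")
    for zone in sorted(glob.glob(pattern)):
        try:
            with open(os.path.join(zone, "type")) as f:
                zone_type = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            # Skip zones that are missing files or report unreadable values.
            continue
        temps[zone_type] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for zone_type, celsius in read_thermal_zones().items():
        print("{}: {:.1f} C".format(zone_type, celsius))
```

Zones that cannot be read are skipped rather than treated as errors, which matches the idea that a missing feature should not be a problem.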

As for https://github.com/rbonghi/jetson_stats/commit/79874bdbbed42cdb3c274ab94a0ba20829dece59, I will add extra checks to my code!

Thank you for all your support

reymor commented 3 months ago

Hello @rbonghi

What is the status of this implementation? Is there anything I could help with?

Thanks in advance

ZhenshengLee commented 3 months ago

What is the status of this implementation? Is there anything I could help with?

Unfortunately, we had to deprioritize this effort. It is currently unclear when we will pick up speed again.