microsoft / superbenchmark

A validation and profiling tool for AI infrastructure
https://aka.ms/superbench
MIT License

Monitor - Upgrade pyrsmi to amdsmi python library. #601

Closed guoshzhao closed 8 months ago

guoshzhao commented 8 months ago

Description: Upgrade to the amdsmi Python library, since pyrsmi will be retired, as suggested by AMD:

AMD SMI Python Library: https://github.com/ROCm/amdsmi/tree/develop/py-interface
pyrsmi: https://github.com/RadeonOpenCompute/pyrsmi

codecov[bot] commented 8 months ago

Codecov Report

Attention: 47 lines in your changes are missing coverage. Please review.

Comparison is base (6e50f02) 86.12% compared to head (754dcea) 85.78%.

| Files | Patch % | Lines |
|---|---|---|
| superbench/common/utils/device_manager.py | 2.08% | 47 Missing :warning: |
Additional details and impacted files

```diff
@@               Coverage Diff                @@
##           release/0.10     #601      +/-   ##
================================================
- Coverage         86.12%   85.78%   -0.35%
================================================
  Files                97       97
  Lines              6878     6902      +24
================================================
- Hits               5924     5921       -3
- Misses              954      981      +27
```

| [Flag](https://app.codecov.io/gh/microsoft/superbenchmark/pull/601/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | Coverage Δ | |
|---|---|---|
| [cpu-python3.6-unit-test](https://app.codecov.io/gh/microsoft/superbenchmark/pull/601/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | `71.59% <0.00%> (-0.26%)` | :arrow_down: |
| [cpu-python3.7-unit-test](https://app.codecov.io/gh/microsoft/superbenchmark/pull/601/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | `71.59% <0.00%> (-0.26%)` | :arrow_down: |
| [cpu-python3.8-unit-test](https://app.codecov.io/gh/microsoft/superbenchmark/pull/601/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | `72.01% <0.00%> (-0.26%)` | :arrow_down: |
| [cuda-unit-test](https://app.codecov.io/gh/microsoft/superbenchmark/pull/601/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | `83.86% <0.00%> (-0.30%)` | :arrow_down: |
| [directx-unit-test](https://app.codecov.io/gh/microsoft/superbenchmark/pull/601/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | `34.57% <2.08%> (-0.72%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft#carryforward-flags-in-the-pull-request-comment) to find out more.


yukirora commented 8 months ago

Hi @guoshzhao, please check these error messages from the MI300 image.

guoshzhao commented 8 months ago

> Hi @guoshzhao, please check these error messages from the MI300 image.

Thanks. I just checked: the GPU utilization and temperature APIs work on MI250 but appear to be unsupported on MI300. I have fixed the GPU memory API. The ECC API errors are expected, so I have changed their log level to 'info'. I have also changed all other log levels from 'error' to 'warning' to avoid confusion when an incompatibility occurs.
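
For illustration, the log-level policy described above could look roughly like the sketch below. It is not the code in this PR: `query_metric`, the logger name, and the stand-in query functions are hypothetical, and the real queries would be amdsmi calls in `device_manager.py`.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("device_manager_sketch")


def query_metric(name, query_fn, level=logging.WARNING):
    """Run query_fn(); if it raises, log at `level` and return None.

    Unsupported metrics are reported at 'warning' by default, while
    failures that are expected (such as ECC here) can be passed
    level=logging.INFO so they do not look like real errors.
    """
    try:
        return query_fn()
    except Exception as err:  # broad on purpose: any library failure is non-fatal
        logger.log(level, "Get %s failed: %s", name, err)
        return None


def _unsupported():
    """Stand-in for a library call that is not supported on this GPU."""
    raise RuntimeError("not supported on this GPU")


# A query that works, one whose failure is expected, and one that is unsupported.
memory_used = query_metric("memory", lambda: 1024)
ecc_errors = query_metric("ECC errors", _unsupported, level=logging.INFO)
utilization = query_metric("utilization", _unsupported)
```

Returning `None` instead of raising lets the monitor keep collecting the metrics that do work on a given GPU generation.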

yukirora commented 8 months ago

Can we change the warning so it is output only once per benchmark? There are too many warnings in the log because of this.
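
One way to get the "warn once per benchmark" behavior requested above is a `logging.Filter` that drops repeated warnings. This is only a sketch under assumptions: `WarnOnceFilter`, the logger name, and the message text are hypothetical and are not part of SuperBench.

```python
import logging


class WarnOnceFilter(logging.Filter):
    """Let each distinct WARNING message template through only once."""

    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        if record.levelno != logging.WARNING:
            return True  # other levels are never suppressed
        # Key on the unformatted template so per-device arguments collapse.
        key = (record.name, str(record.msg))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def reset(self):
        """Forget seen warnings, e.g. when the next benchmark starts."""
        self._seen.clear()


class _Collect(logging.Handler):
    """Handler that stores formatted messages so the demo can inspect them."""

    def __init__(self, sink):
        super().__init__()
        self._sink = sink

    def emit(self, record):
        self._sink.append(record.getMessage())


emitted = []
logger = logging.getLogger("monitor_sketch")  # hypothetical logger name
logger.propagate = False
once = WarnOnceFilter()
logger.addFilter(once)
logger.addHandler(_Collect(emitted))

# The same warning fires for four devices; only the first one is emitted.
for gpu in range(4):
    logger.warning("Get GPU utilization failed on device %d", gpu)
```

Calling `once.reset()` at the start of each benchmark would allow the warning to appear once again for that run. Keying on the message template rather than the formatted text is a design choice: it collapses per-device repeats, at the cost of hiding which other devices were affected.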