Closed tkatila closed 2 weeks ago
@eero-t & @uniemimu can you please review?
This probably requires some restructuring after the GPU CDI support has been merged (assuming it will happen first).
(The failed check is due to an internal certificate issue; I do not suspect it to be a real problem.)
What are the differences from the earlier version? (You force-pushed, so GitHub does not show them.)
Roughly the middle third is my changes; the top and bottom thirds are baseline changes. To summarize: 1) added some logging to the C code, 2) added a temperature read from the device and its handling in the plugin.
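As a rough illustration of point 2, here is a minimal sketch of how a temperature reading could feed a health decision. It is written in Python only for brevity and is not the code in this PR; `DeviceReading`, `is_healthy`, and the `TEMP_LIMIT_C` value are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limit in degrees Celsius; not a value taken from the PR.
TEMP_LIMIT_C = 100.0


@dataclass
class DeviceReading:
    """One health sample for a GPU, as a sidecar might report it."""
    name: str
    temperature_c: Optional[float]  # None when the read failed


def is_healthy(reading: DeviceReading, limit_c: float = TEMP_LIMIT_C) -> bool:
    """Return False only when a valid temperature exceeds the limit.

    A failed read (None) is treated as "no information" and leaves the
    device healthy; that policy is an assumption of this sketch.
    """
    if reading.temperature_c is None:
        return True
    return reading.temperature_c <= limit_c


if __name__ == "__main__":
    samples = (
        DeviceReading("card0", 65.0),
        DeviceReading("card1", 105.0),
        DeviceReading("card2", None),
    )
    for r in samples:
        print(r.name, "healthy" if is_healthy(r) else "unhealthy")
```

Whether a failed read should count as healthy or unhealthy is a policy choice; the sketch picks the lenient option so that a missing sensor does not take a device out of service.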
Rebased on top of the CDI changes. Also added an e2e test for the levelzero deployment.
Should it be mentioned somewhere that these new features are not available to operator users?
Sure, I'll add a note. This PR is large enough as it is so I didn't want to touch the operator use case. Once this is merged, I'll tackle the operator support.
Rebased the content. Added C fixes, fixed the compile flags for the build, and decided to change the levelzero enabling to a boolean (instead of a unix socket path).
Since I broke WSL in the previous commits, I verified that it works on the current HEAD.
Squashed commits, no code changes.
Add GPU Levelzero sidecar to allow fetching health data for the GPU devices from the Levelzero API.
As a bonus, this also adds support for detecting Intel GPUs within Windows Subsystem for Linux (WSL).