intel / intel-device-plugins-for-kubernetes

Collection of Intel device plugins for Kubernetes
Apache License 2.0
35 stars 203 forks

Gpu levelzero sidecar #1803

Closed tkatila closed 2 weeks ago

tkatila commented 1 month ago

Add GPU Levelzero sidecar to allow fetching health data for the GPU devices from the Levelzero API.

As a bonus, this also adds support for detecting Intel GPUs within Windows Subsystem for Linux (WSL).
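
For readers unfamiliar with WSL detection, one common heuristic is sketched below. It is not taken from this PR. It assumes that WSL kernels report the string "microsoft" in `/proc/version`, and the function names `isWSLKernel` and `runningUnderWSL` are hypothetical. The plugin may well use a different signal.

```go
package main

import (
	"os"
	"strings"
)

// isWSLKernel reports whether a kernel version string (the contents of
// /proc/version) looks like it came from a WSL kernel.
// Assumption: WSL kernels include "microsoft" in that string.
func isWSLKernel(procVersion string) bool {
	return strings.Contains(strings.ToLower(procVersion), "microsoft")
}

// runningUnderWSL reads /proc/version and applies the heuristic above.
// A read error is treated as "not WSL".
func runningUnderWSL() bool {
	data, err := os.ReadFile("/proc/version")
	if err != nil {
		return false
	}
	return isWSLKernel(string(data))
}
```

Keeping the string check separate from the file read makes the heuristic easy to unit test without a real WSL host.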

tkatila commented 4 weeks ago

@eero-t & @uniemimu can you please review?

This probably requires some restructuring after the GPU CDI support has been merged (assuming it will happen first).

(The failed check is due to an internal certificate issue; I do not suspect it to be a real problem.)

eero-t commented 4 weeks ago

> @eero-t & @uniemimu can you please review?

What are the differences from the earlier version? (You force-pushed, so GitHub does not show them.)

tkatila commented 4 weeks ago

> > @eero-t & @uniemimu can you please review?
>
> What are the differences from the earlier version? (You force-pushed, so GitHub does not show them.)

https://github.com/intel/intel-device-plugins-for-kubernetes/compare/82a387760e4c2a9a12cc366405e19153f67e7efd..76b1e5a04abc600ec1992a1adf0474d42ffad1b9

Roughly the middle third is my changes; the top and bottom thirds are baseline changes. To summarize: 1) added some logging to the C code, 2) added a temperature read from the device and its handling in the plugin.
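
As a rough illustration of what "handling in the plugin" could mean for a temperature reading, here is a minimal sketch. It assumes the plugin compares the reported temperature against a configurable limit and marks the device unhealthy above it. The type `deviceHealth`, the function `evaluateTemperature`, and the limit semantics are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// deviceHealth is a hypothetical result type for this sketch.
type deviceHealth struct {
	Healthy bool
	Reason  string // empty when the device is healthy
}

// evaluateTemperature marks a device unhealthy when its reported
// temperature (in degrees Celsius) is strictly above limitC.
// Assumption: a reading equal to the limit still counts as healthy.
func evaluateTemperature(tempC, limitC float64) deviceHealth {
	if tempC > limitC {
		return deviceHealth{
			Healthy: false,
			Reason:  fmt.Sprintf("temperature %.1fC exceeds limit %.1fC", tempC, limitC),
		}
	}
	return deviceHealth{Healthy: true}
}
```

Returning a reason string alongside the boolean gives the plugin something to log when it stops advertising a device.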

tkatila commented 3 weeks ago

Rebased on top of the CDI changes. Also added an e2e test for the levelzero deployment.

mythi commented 3 weeks ago

Should it be mentioned somewhere that these new features are not available to operator users?

tkatila commented 3 weeks ago

> Should it be mentioned somewhere that these new features are not available to operator users?

Sure, I'll add a note. This PR is large enough as it is so I didn't want to touch the operator use case. Once this is merged, I'll tackle the operator support.

tkatila commented 3 weeks ago

Rebased the content. Added C fixes, fixed the compile flags for the build, and decided to change the levelzero enabling to a boolean (instead of a unix socket path).
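
The switch from a socket-path option to a boolean could look roughly like the sketch below, which uses Go's standard `flag` package. The flag name `enable-levelzero`, the `options` type, and the fixed `defaultSocketPath` are all hypothetical and chosen only for illustration; the actual flag and path used by the plugin may differ.

```go
package main

import (
	"flag"
	"io"
)

// defaultSocketPath is a hypothetical fixed location the plugin and the
// sidecar would agree on once the path is no longer user-configurable.
const defaultSocketPath = "/var/lib/levelzero/server.sock"

// options holds the parsed settings for this sketch.
type options struct {
	LevelzeroEnabled bool
	SocketPath       string // set only when LevelzeroEnabled is true
}

// parseOptions parses command-line arguments with a boolean switch
// instead of a user-supplied socket path.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("gpu-plugin", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the caller's output
	enabled := fs.Bool("enable-levelzero", false, "use the levelzero sidecar (hypothetical flag)")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	opts := options{LevelzeroEnabled: *enabled}
	if opts.LevelzeroEnabled {
		opts.SocketPath = defaultSocketPath
	}
	return opts, nil
}
```

With a boolean, users cannot point the plugin at a path the sidecar is not actually serving, at the cost of losing the ability to relocate the socket.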

tkatila commented 2 weeks ago

Since I broke WSL in the previous commits, I verified that it works on the current HEAD.

tkatila commented 2 weeks ago

Squashed commits, no code changes.