intel / xpumanager

MIT License

xpu-smi and PyTorch on GPUs #71

Open sramakintel opened 7 months ago

sramakintel commented 7 months ago

On NVIDIA GPUs, there is a relation between nvidia-smi and PyTorch. nvidia-smi, which is similar to xpu-smi, is used to detect and monitor GPU telemetry, and the absence of nvidia-smi on the host makes torch.cuda.is_available() return False. For Intel GPUs, however, there seems to be no such relation between PyTorch GPU support and xpu-smi: PyTorch detects the XPU as available (via ipex.xpu.is_available() returning True) even when xpu-smi is not installed.

Is this integrated or am I missing something?
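To illustrate the asymmetry being described, here is a minimal sketch (using only the standard library) that probes whether an SMI tool is on PATH, independently of what the framework-level availability check reports. The tool names come from the discussion above; nothing here calls into torch or ipex.

```python
import shutil

def smi_tool_present(tool: str) -> bool:
    """Return True if the given SMI utility is found on PATH.

    Presence of the CLI tool is a separate question from whether the
    framework (torch.cuda / ipex.xpu) reports a usable device; this
    only probes the shell environment.
    """
    return shutil.which(tool) is not None

# Neither result changes what torch.cuda.is_available() or
# ipex.xpu.is_available() would return on its own.
for tool in ("nvidia-smi", "xpu-smi"):
    print(tool, smi_tool_present(tool))
```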

fmiao2372 commented 6 months ago

To my knowledge, the function ipex.xpu.is_available() does not currently check for the existence of xpu-smi. However, we can submit this requirement to the IPEX team for consistency with NVIDIA if necessary.

fmiao2372 commented 6 months ago

Reference: https://pytorch.org/docs/stable/_modules/torch/cuda.html#is_available

def is_available() -> bool:
    r"""Return a bool indicating if CUDA is currently available."""
    if not _is_compiled():
        return False
    if _nvml_based_avail():
        # The user has set an env variable to request this availability check that attempts to avoid fork poisoning by
        # using NVML at the cost of a weaker CUDA availability assessment. Note that if NVML discovery/initialization
        # fails, this assessment falls back to the default CUDA Runtime API assessment (`cudaGetDeviceCount`)
        return device_count() > 0
    else:
        # The default availability inspection never throws and returns 0 if the driver is missing or can't
        # be initialized. This uses the CUDA Runtime API `cudaGetDeviceCount` which in turn initializes the CUDA Driver
        # API via `cuInit`
        return torch._C._cuda_getDeviceCount() > 0
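If a caller wanted the stricter behavior discussed above today, without waiting for a change in IPEX, one could gate the existing ipex.xpu.is_available() call on the presence of xpu-smi. This is a hypothetical sketch, not part of xpu-smi or IPEX; the function name xpu_available_strict is my own, and it assumes intel_extension_for_pytorch is importable when installed.

```python
import importlib.util
import shutil

def xpu_available_strict() -> bool:
    """Hypothetical stricter availability check, mirroring the
    NVML-style gate from torch.cuda: require both xpu-smi on PATH
    and a positive answer from the IPEX runtime check.
    """
    # Gate 1: the monitoring tool must be installed (as with nvidia-smi/NVML).
    if shutil.which("xpu-smi") is None:
        return False
    # Gate 2: the IPEX package must be importable at all.
    if importlib.util.find_spec("intel_extension_for_pytorch") is None:
        return False
    import intel_extension_for_pytorch as ipex
    # Gate 3: the runtime itself must report a usable XPU device.
    return bool(ipex.xpu.is_available())
```

Note the trade-off this inherits from the CUDA code quoted above: a tool-presence check is a weaker assessment than actually initializing the device runtime, so it should only ever narrow, never widen, the result of the runtime check.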