intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

Error when retrieving `zesDeviceGetProperties` on Windows MTL iGPU device #713

Open sgwhat opened 3 months ago

sgwhat commented 3 months ago

Hi all, I encountered an issue where I could not retrieve the device modelName using zesDeviceGetProperties on Windows MTL iGPU (but it works well on Linux Arc770).

Platform: Core Ultra5 iGPU (Arc Graphics)
OS: Windows 11
iGPU Driver: 31.0.101.5333

Normally, by initializing the gpu driver with zesInit, I can successfully discover all the driver instances and get the properties of the driver instance (props.modelName).

For example, this code can run correctly on Linux Arc770, where I can get the following output:

discovered 1 Level-Zero drivers
discovered 1 Level-Zero devices
[0] oneAPI device name: Intel(R) Arc(TM) A770 Graphics
[0] oneAPI brand: unknown
[0] oneAPI vendor: Intel(R) Corporation
[0] oneAPI S/N: unknown
[0] oneAPI board number: unknown
discovered 1 Level-Zero memory modules

However, on Windows MTL iGPU, props.modelName comes back empty, and no Level-Zero memory modules are found:

discovered 1 Level-Zero drivers
discovered 1 Level-Zero devices
[0] oneAPI device name:
[0] oneAPI brand:
[0] oneAPI vendor:
[0] oneAPI S/N: unknown
[0] oneAPI board number: unknown
discovered 0 Level-Zero memory modules
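
With output like the above, a blank field is easy to misread in a log. The following is a minimal sketch of a helper that makes the empty case explicit when printing. The name `or_fallback` and the `<empty>` marker are hypothetical and not part of Level Zero:

```c
#include <assert.h>
#include <stddef.h>

/* Returns s unless it is NULL or an empty string, in which case it
 * returns fallback. Hypothetical helper, not a Level Zero API. */
static const char *or_fallback(const char *s, const char *fallback) {
  assert(fallback != NULL);
  return (s != NULL && s[0] != '\0') ? s : fallback;
}
```

A log call could then pass `or_fallback(props.modelName, "<empty>")` instead of `props.modelName`, so an empty string returned by the driver is visible in the output.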

For more details about my implementation, see my comment below.

sgwhat commented 3 months ago

Below is my implementation for initializing the GPU driver:

  // Initialize the gpu driver
  resp->oh.handle = LoadLibrary("C:\\Windows\\System32\\...\\ze_loader.dll");

  for (i = 0; l[i].s != NULL; i++) {
    *l[i].p = GetProcAddress(resp->oh.handle, l[i].s);
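    if (!*l[i].p) {  // check the resolved address, not the slot pointer
      // buf and buflen are assumed to be declared earlier (not shown)
      char *msg = LOAD_ERR();
      snprintf(buf, buflen, "symbol lookup for %s failed: %s", l[i].s, msg);
      free(msg);
      UNLOAD_LIBRARY(resp->oh.handle);
      resp->oh.handle = NULL;  // clear the handle only after unloading
      resp->err = strdup(buf);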
      return;
    }
  }

  ret = (*resp->oh.zesInit)(0);
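
The symbol-loading loop above can be sketched in a portable, self-contained form. This is only an illustration under stated assumptions: `symbol_entry_t`, `resolver_fn`, `load_symbols`, and `mock_resolve` are hypothetical names, and the mock resolver stands in for `GetProcAddress` so the check on the resolved address can be exercised without the real loader:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One entry per required symbol: its name and where to store its address. */
typedef struct {
  const char *name;
  void **slot;
} symbol_entry_t;

/* Hypothetical resolver type standing in for GetProcAddress or dlsym. */
typedef void *(*resolver_fn)(const char *name);

/* Resolves every entry of a NULL-terminated table.
 * Returns 0 on success, or -1 with the missing symbol named in err. */
static int load_symbols(const symbol_entry_t *table, resolver_fn resolve,
                        char *err, size_t errlen) {
  assert(table != NULL && resolve != NULL);
  for (size_t i = 0; table[i].name != NULL; i++) {
    *table[i].slot = resolve(table[i].name);
    if (*table[i].slot == NULL) { /* test the loaded address itself */
      snprintf(err, errlen, "symbol lookup for %s failed", table[i].name);
      return -1;
    }
  }
  return 0;
}

/* Mock resolver for demonstration: it only knows "zesInit". */
static int mock_target;
static void *mock_resolve(const char *name) {
  return strcmp(name, "zesInit") == 0 ? (void *)&mock_target : NULL;
}
```

Testing `*table[i].slot` rather than `table[i].slot` matters because the slot pointer is never NULL, so a check on it would never detect a missing export.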

Below is my implementation for discovering the GPU driver and reading the device properties:

  // Discover the gpu driver instance
  void oneapi_check_vram(oneapi_handle_t h, mem_info_t *resp) {
      ze_result_t ret;
      resp->err = NULL;
      resp->igpu_index = -1;
      uint64_t totalMem = 0;
      uint64_t usedMem = 0;
      const int buflen = 256;
      char buf[buflen + 1];
      int i, d, m;
      uint32_t driversCount = 0;
      ret = (*h.zesDriverGet)(&driversCount, NULL);

      zes_driver_handle_t *allDrivers =
          malloc(driversCount * sizeof(zes_driver_handle_t));
      (*h.zesDriverGet)(&driversCount, allDrivers);

      resp->total = 0;
      resp->free = 0;

      for (d = 0; d < driversCount; d++) {
        uint32_t deviceCount = 0;
        ret = (*h.zesDeviceGet)(allDrivers[d], &deviceCount, NULL);

        zes_device_handle_t *devices =
            malloc(deviceCount * sizeof(zes_device_handle_t));
        (*h.zesDeviceGet)(allDrivers[d], &deviceCount, devices);

        for (i = 0; i < deviceCount; ++i) {
          uint32_t globalDeviceIndex = resp->count;
          resp->count++;

          zes_device_ext_properties_t ext_props;
          ext_props.stype = ZES_STRUCTURE_TYPE_DEVICE_EXT_PROPERTIES;
          ext_props.pNext = NULL;

          zes_device_properties_t props;
          props.stype = ZES_STRUCTURE_TYPE_DEVICE_PROPERTIES;
          props.pNext = &ext_props;

          ret = (*h.zesDeviceGetProperties)(devices[i], &props);
          if (ret != ZE_RESULT_SUCCESS) {
            snprintf(buf, buflen, "unable to get device properties: %d", ret);
            resp->err = strdup(buf);
            free(allDrivers);
            free(devices);
            return;
          }

          if (h.verbose) {
            LOG(h.verbose, "[%d] oneAPI device name: %s\n", globalDeviceIndex,
                props.modelName);
            LOG(h.verbose, "[%d] oneAPI brand: %s\n", globalDeviceIndex,
                props.brandName);
            LOG(h.verbose, "[%d] oneAPI vendor: %s\n", globalDeviceIndex,
                props.vendorName);
            LOG(h.verbose, "[%d] oneAPI S/N: %s\n", globalDeviceIndex,
                props.serialNumber);
            LOG(h.verbose, "[%d] oneAPI board number: %s\n", globalDeviceIndex,
                props.boardNumber);
          }

          uint32_t memCount = 0;
          ret = (*h.zesDeviceEnumMemoryModules)(devices[i], &memCount, NULL);
          LOG(h.verbose, "discovered %d Level-Zero memory modules\n", memCount);
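
The driver, device, and memory-module queries above all follow the same two-call pattern: ask for the count with a NULL array, allocate, then call again to fill the array. A minimal sketch of that pattern with every result checked is shown here. The types `demo_handle` and `enum_fn` and the functions `enumerate_all` and `mock_get` are hypothetical stand-ins, not the Level Zero signatures:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t demo_handle;                               /* hypothetical handle type */
typedef int (*enum_fn)(uint32_t *count, demo_handle *out);  /* returns 0 on success */

/* Two-call enumeration: query the count, allocate, then fill.
 * On success the caller owns *out and must free it. */
static int enumerate_all(enum_fn get, demo_handle **out, uint32_t *count) {
  assert(get != NULL && out != NULL && count != NULL);
  *out = NULL;
  *count = 0;
  int ret = get(count, NULL);  /* first call: count only */
  if (ret != 0) return ret;
  if (*count == 0) return 0;   /* nothing to allocate */
  demo_handle *buf = malloc(*count * sizeof *buf);
  if (buf == NULL) return -1;  /* allocation failure */
  ret = get(count, buf);       /* second call: fill the array */
  if (ret != 0) {
    free(buf);
    *count = 0;
    return ret;
  }
  *out = buf;
  return 0;
}

/* Mock enumerator for demonstration: reports two handles, 100 and 101. */
static int mock_get(uint32_t *count, demo_handle *out) {
  if (out == NULL) {
    *count = 2;
    return 0;
  }
  if (*count > 2) *count = 2;
  for (uint32_t i = 0; i < *count; i++) out[i] = 100 + i;
  return 0;
}
```

Checking the first call separately helps tell "the query failed" apart from "the query succeeded and reported zero items", which is the distinction that matters for the memory-module count here.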

eero-t commented 3 months ago

What's your MTL model number, and which version of compute-runtime is included with your Windows driver package? (The latest compute-runtime versions start with 24.)

PS. I'm a compute-runtime Linux user, not one of its developers, but grepping the model ID from the sources is trivial, and one can check whether a given release tag contains that commit.

saik-intel commented 3 months ago

We will check internally whether we can support the model name on the MTL platform and come back to you.