tkestack / vcuda-controller

Other
488 stars 156 forks source link

compact with the function select by cuGetProcAddress #32

Open VincentLeeMax opened 1 year ago

VincentLeeMax commented 1 year ago

when cuda==11.3, running whith https://github.com/NVIDIA/cuda-samples/tree/v11.3/Samples/reduction, cudaMalloc will meet cudaErrorDeviceUninitialized error. Update the corresponding cuda_library_entry function to the function returned by cuGetProcAddress will fix it.

image
mYmNeo commented 1 year ago

The problem may be the missing entry cuGetProcAddress of cuda_entry_enum_t in cuda-helper.h. The load_necessary_data didn't load real cuGetProcAddress, so caused the panic in #20.

VincentLeeMax commented 1 year ago

The problem may be the missing entry cuGetProcAddress of cuda_entry_enum_t in cuda-helper.h. The load_necessary_data didn't load real cuGetProcAddress, so caused the panic in #20.

Thanks for replaying.

I think it's not the same problem with #20. Calling cudaMalloc still got cudaErrorDeviceUninitialized after initialization, see reduction.log(commit 72e0115d5884f22469de857271c002c84c0d0543).

After adding some log to cuGetProcAddress, I found that cuGetProcAddress return a different cuda version(3020) cudaMalloc comparing with the one stored in cuda_library_entry(2000).

CUresult cuGetProcAddress(const char *symbol, void **pfn, int cudaVersion,
                          cuuint64_t flags) {
  CUresult ret;
  int i;

  load_necessary_data();
  if (!is_custom_config_path()) {
    pthread_once(&g_register_set, register_to_remote);
  }
  pthread_once(&g_init_set, initialization);
  if (!strcmp(symbol, "cuMemAlloc")) {
      LOGGER(1, "%s call version: %d.", symbol, cudaVersion);
      ret = CUDA_ENTRY_CALL(cuda_library_entry, cuGetProcAddress, symbol, pfn,
                            3020, flags);
      LOGGER(1, "cudaVersion 3020, cudaMalloc function ptr: %d.", *pfn);
      ret = CUDA_ENTRY_CALL(cuda_library_entry, cuGetProcAddress, symbol, pfn,
                            2000, flags);
      LOGGER(1, "cudaVersion 2000, cudaMalloc function ptr: %d.", *pfn);
      entry_t entry = cuda_library_entry[CUDA_ENTRY_ENUM(cuMemAlloc)];
      LOGGER(1, "in cuda_library_entry, %s function ptr: %d.", entry.name, entry.fn_ptr);
  }
  ret = CUDA_ENTRY_CALL(cuda_library_entry, cuGetProcAddress, symbol, pfn,
                        cudaVersion, flags);
  if (ret == CUDA_SUCCESS) {
    for (i = 0; i < cuda_hook_nums; i++) {
      if (!strcmp(symbol, cuda_hooks_entry[i].name)) {
        LOGGER(5, "Match hook %s", symbol);
        LOGGER(1, "%s call version: %d.", symbol, cudaVersion);
        *pfn = cuda_hooks_entry[i].fn_ptr;
        break;
      }
    }
  }

  return ret;
}
image

Since we redirect the function in cuda_hooks_entry and cuGetProcAddress may request a function of different cuda version, eg cuLaunchKernel, we should update the cuda_library_entry for the next call in the redirect function.

image
hzliangbin commented 1 year ago

@VincentLeeMax @mYmNeo According to nvdia docs,

  • The base name of the driver API function to look for. As an example, for the driver API cuMemAlloc_v2, symbol would be cuMemAlloc and cudaVersion would be the ABI compatible CUDA version for the _v2 variant.

cuGetProcAddress should add some logic to deal with version, if version comes with 3020, it should selecte cuMemAlloc_v2 instead cuMemAlloc, although symbol is still the cuMemAlloc.