gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

Does gpgpu-sim support CUDA driver api? #288

Closed BHbean closed 9 months ago

BHbean commented 11 months ago

Hello guys, we have recently been trying to use gpgpu-sim to simulate some CUDA programs. However, these programs call the CUDA driver API instead of the CUDA runtime API. The test program, vec-add.cu, is listed below:

#include <iostream>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void vectorAdd(int* a, int* c, int size) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) {
        c[tid] = a[tid] * 2;
    }
}

int main() {
    int size = 5;
    int* a, * c;
    int* d_a, * d_c;

    // Allocate memory on the host
    a = new int[size];
    c = new int[size];

    // Initialize input vectors
    for (int i = 0; i < size; i++) {
        a[i] = i;
    }

    // Allocate memory on the device
    cudaMalloc((void**)&d_a, size * sizeof(int));
    cudaMalloc((void**)&d_c, size * sizeof(int));

    // Copy input vectors from host to device
    cudaMemcpy(d_a, a, size * sizeof(int), cudaMemcpyHostToDevice);

    // Launch kernel on the device
    vectorAdd<<<1, 16>>>(d_a, d_c, size);

    // Copy result vector from device to host (using driver api)
    cuMemcpyDtoH(c, reinterpret_cast<CUdeviceptr>(d_c), 1 * sizeof(int));
    std::cout << "c[0]: " << c[0] << std::endl;

    // Copy result vector from device to host (using runtime api)
    cudaMemcpy(c + 1, d_c + 1, 1 * sizeof(int), cudaMemcpyDeviceToHost);
    std::cout << "c[1]: " << c[1] << std::endl;

    // Free memory on the device
    cudaFree(d_a);
    cudaFree(d_c);

    // Free memory on the host
    delete[] a;
    delete[] c;

    return 0;
}

The test code calls cuMemcpyDtoH (a CUDA driver API function) to copy the first element from device to host, and calls cudaMemcpy (a CUDA runtime API function) to copy the second element from device to host. We use the following command to compile it into an executable:

$ nvcc -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -I/usr/local/cuda/include ./vecadd.cu -o vecAdd -lcuda -lcudart

When running on gpgpu-sim, it turns out the driver function was not executed at all:

...

----------------------------END-of-Interconnect-DETAILS-------------------------

gpgpu_simulation_time = 0 days, 0 hrs, 0 min, 1 sec (1 sec)
gpgpu_simulation_rate = 195 (inst/sec)
gpgpu_simulation_rate = 447 (cycle/sec)
gpgpu_silicon_slowdown = 2684563x
c[0]: -98807986               // <== random value here
c[1]: 2
GPGPU-Sim: *** exit detected ***

In contrast, the executable runs correctly on a real GPU.

However, in the file libcuda/cuda_api.h, we can indeed see some interfaces that begin with the prefix cu, which suggests that they belong to the CUDA driver API.

So does gpgpu-sim support the CUDA driver API? Or does it just ignore the driver API calls it detects?

quadpixels commented 9 months ago

I believe your question is twofold:

  1. How to build gpgpusim's libcuda.so.1
  2. Whether Driver APIs are supported by gpgpusim
    • (the answer seems to be no)

To do 1, you need to add the following lines to libcuda/Makefile:

Before

#--- Make rules ---
lib$(PROG).a: $(OBJS)
    echo $(OBJS)
    ar rcs $(OUTPUT_DIR)/lib$(PROG).a $(OBJS)

After

#--- Make rules ---
all: lib$(PROG).a lib$(PROG).so

lib$(PROG).a: $(OBJS)
    echo $(OBJS)
    ar rcs $(OUTPUT_DIR)/lib$(PROG).a $(OBJS)

lib$(PROG).so: $(OBJS)
    $(CPP) --shared -o $(OUTPUT_DIR)/lib$(PROG).so $(OBJS)

Then create a symbolic link named libcuda.so.1 that points to the gpgpusim-generated libcuda.so.


When running your above program, you will see the following outputs:

WARNING: this function has not been implemented yet

This message is printed from cuda_runtime_api.cc, which means the driver APIs are not supported yet.
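
To illustrate why c[0] ends up holding a random value, here is a minimal plain C++ sketch that needs neither CUDA nor gpgpu-sim. The names stub_memcpy_dtoh, working_memcpy, and copy_took_effect are hypothetical and are not part of gpgpu-sim. The sketch assumes the unimplemented driver call only prints the warning and returns without writing to the destination; what the real stub returns has not been confirmed here. Since the test program allocates c with new int[size] and never initializes it, a copy that does nothing leaves whatever was already in memory. Pre-filling the host buffer with a sentinel makes a skipped copy easy to detect:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an unimplemented driver API entry point.
// Assumption: it warns and returns without touching the destination.
int stub_memcpy_dtoh(void* dst, const void* src, std::size_t bytes) {
    (void)dst; (void)src; (void)bytes;
    std::printf("WARNING: this function has not been implemented yet\n");
    return 0;  // modeled as "success"; the real return value is an assumption
}

// Reference copy that actually moves the bytes, for comparison.
int working_memcpy(void* dst, const void* src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
    return 0;
}

constexpr int kSentinel = -1;

// Returns true if copy_fn really wrote the first element into the host buffer.
bool copy_took_effect(int (*copy_fn)(void*, const void*, std::size_t)) {
    // Stand-in for the kernel output a[i] * 2 with a[i] = i.
    std::vector<int> device_side = {0, 2, 4, 6, 8};
    assert(device_side[0] != kSentinel);  // the sentinel must differ from real data

    // Pre-fill the host buffer so an untouched element is recognizable.
    std::vector<int> host(device_side.size(), kSentinel);
    copy_fn(host.data(), device_side.data(), sizeof(int));
    return host[0] != kSentinel;
}
```

With these definitions, copy_took_effect(stub_memcpy_dtoh) returns false and copy_took_effect(working_memcpy) returns true. Applying the same sentinel trick to c in the test program would make the skipped cuMemcpyDtoH show up as the sentinel value instead of an arbitrary number.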

BHbean commented 9 months ago

That is a really detailed explanation! Thanks for your reply!

I will try that later and close this issue. Thanks again!