yalue / onnxruntime_go

A Go (golang) library wrapping microsoft/onnxruntime.
MIT License

CUDA not used #70

Open · webfrank opened this issue 1 week ago

webfrank commented 1 week ago

Hi, great work, flawless integration with Go.

I was trying to move inference onto a CUDA device. This is the code used to initialize the runtime:

func InitYolo8Session(input []float32) (ModelSession, error) {
    lib := getSharedLibPath()
    log.Printf("Loading ONNX runtime %s\n", lib)

    ort.SetSharedLibraryPath(lib)
    err := ort.InitializeEnvironment()
    if err != nil {
        return ModelSession{}, err
    }

    inputShape := ort.NewShape(1, 3, 640, 640)
    inputTensor, err := ort.NewTensor(inputShape, input)
    if err != nil {
        return ModelSession{}, err
    }

    outputShape := ort.NewShape(1, int64(len(yolo_classes)+4), 8400)
    outputTensor, err := ort.NewEmptyTensor[float32](outputShape)
    if err != nil {
        return ModelSession{}, err
    }

    options, err := ort.NewSessionOptions()
    if err != nil {
        return ModelSession{}, err
    }
    // Destroy the options exactly once, after the session has been created.
    defer options.Destroy()

    if UseCoreML { // If CoreML is enabled, append the CoreML execution provider
        err = options.AppendExecutionProviderCoreML(0)
        if err != nil {
            return ModelSession{}, err
        }
    }

    if UseCUDA { // If CUDA is enabled, append the CUDA execution provider
        cudaOptions, err := ort.NewCUDAProviderOptions()
        if err != nil {
            return ModelSession{}, err
        }
        defer cudaOptions.Destroy()

        err = cudaOptions.Update(map[string]string{"device_id": "0"})
        if err != nil {
            return ModelSession{}, err
        }

        err = options.AppendExecutionProviderCUDA(cudaOptions)
        if err != nil {
            return ModelSession{}, err
        }
    }

    session, err := ort.NewAdvancedSession(
        ModelPath,
        []string{"images"},
        []string{"output0"},
        []ort.ArbitraryTensor{inputTensor},
        []ort.ArbitraryTensor{outputTensor},
        options,
    )

    if err != nil {
        return ModelSession{}, err
    }

    modelSes := ModelSession{
        Session: session,
        Input:   inputTensor,
        Output:  outputTensor,
    }

    log.Printf("ONNX runtime (%s) initialized [%v]", ort.GetVersion(), ort.IsInitialized())
    return modelSes, nil
}
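
For reference, a minimal usage sketch of the returned ModelSession (not from the issue; it assumes Session is an *ort.AdvancedSession and Input/Output are *ort.Tensor[float32], matching how they are created above) might look like:

// Hypothetical helper: run one inference pass using the tensors that
// were bound to the session in InitYolo8Session.
func runOnce(ms ModelSession, pixels []float32) ([]float32, error) {
    // Overwrite the bound input tensor's backing slice in place.
    copy(ms.Input.GetData(), pixels)
    if err := ms.Session.Run(); err != nil {
        return nil, err
    }
    return ms.Output.GetData(), nil
}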

The library is the latest runtime (1.19.2) from the official repo, the GPU variant.

Inference is working, but with timing similar to the CPU. This is the output from nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P0              32W /  70W |   1841MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
yalue commented 4 days ago

Thanks, and I'm glad it's working, at least somewhat!

I would expect any of the initialization functions to return an error if CUDA was not actually initialized correctly... odd. Have you verified that the library runs correctly by running go test -v -bench=. from its source directory? You'd need to set the ONNXRUNTIME_SHARED_LIBRARY_PATH environment variable to point to your GPU-enabled copy of onnxruntime.so in order to run the test, but this should give you good information on whether CUDA is enabled and working properly. (You'd specifically want to look at the BenchmarkCUDASession output and make sure it's faster than the BenchmarkOpMultiThreaded output.)
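
For example, assuming the GPU-enabled library sits at /path/to/libonnxruntime.so (a placeholder path), the invocation would look something like:

ONNXRUNTIME_SHARED_LIBRARY_PATH=/path/to/libonnxruntime.so go test -v -bench=.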

Depending on the size of the yolov8 network, it's possible that it's just not large enough to see a significant benefit from CUDA, especially given CUDA's higher overheads. However, it is indeed puzzling that nvidia-smi isn't showing anything. I've seen the current version of onnxruntime_go interact correctly with CUDA on several different systems, so I wonder if you're somehow loading the wrong copy of the library? Let me know if the tests pass.

And sorry for the slow update, I haven't had much time to look at this project recently.

webfrank commented 2 days ago

Hi, sorry for the late reply. I managed to get CUDA working by upgrading the bindings and the library to the latest versions. Inference time is about 10 ms on an AWS Tesla T4, but nvidia-smi shows no GPU-bound processes. If I disable the CUDA provider on the same hardware, I get about 120 ms inference time, so I suppose it is using the GPU, even though there's no direct evidence.