
CUDA failure 100: no CUDA-capable device is detected; error when inferencing on a GPU VM #11561

Status: Open. abilashravi-ta opened this issue 2 years ago

abilashravi-ta commented 2 years ago

Description

I built onnxruntime (v1.8.2) from source following these instructions, and used the following command to build and install:

./build.sh --cuda_home /usr/local/cuda-11.7 --cudnn_home /usr/lib/x86_64-linux-gnu/ --use_cuda --config RelWithDebInfo --build_shared_lib --build_wheel --skip_tests --parallel 6
cd build/Linux/RelWithDebInfo
make install
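
As a sanity check on the build itself (not part of the original report), the C API can list the execution providers compiled into the installed library; if CUDAExecutionProvider is not in that list, the problem is the build rather than the machine. A minimal sketch, assuming the same include path and -lonnxruntime link flag as the repro below; the file name check_providers.cpp is chosen here for illustration:

// check_providers.cpp -- hypothetical helper, not from the original report
// Assumed build: g++ -o check_providers check_providers.cpp -I/usr/local/include/onnxruntime/core/session/ -lonnxruntime
#include <onnxruntime_cxx_api.h>

#include <iostream>

int main()
{
        const OrtApi& api = Ort::GetApi();

        char** providers = nullptr;
        int count = 0;
        // Lists every execution provider compiled into this libonnxruntime
        Ort::ThrowOnError(api.GetAvailableProviders(&providers, &count));

        for (int i = 0; i < count; ++i)
                std::cout << providers[i] << std::endl;   // a CUDA build should print "CUDAExecutionProvider"

        Ort::ThrowOnError(api.ReleaseAvailableProviders(providers, count));
        return 0;
}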

I'm using the linear-regression.onnx model, downloaded from here, to run inference in C++ on a GPU.

System information

To Reproduce

I compile the following piece of code with g++ -o run1 inference.cpp -I/usr/local/include/onnxruntime/core/session/ -lonnxruntime:

// inference.cpp
#include<onnxruntime_cxx_api.h>

#include<iostream>
#include<string>
#include<vector>
#include<numeric>

using namespace std;

template<class T>
T vectorProduct(vector<T>& v)
{
        return accumulate(v.begin(), v.end(), 1, multiplies<T>());
}

int main()
{
        string modelFilepath{"linear-regression.onnx"};
        Ort::Env env;
        Ort::SessionOptions sessionOptions;

        // Enable the CUDA execution provider on GPU device 0
        OrtCUDAProviderOptions cuda_options{0};
        sessionOptions.AppendExecutionProvider_CUDA(cuda_options);

        Ort::Session session(env, modelFilepath.c_str(), sessionOptions);
        Ort::AllocatorWithDefaultOptions allocator;

        vector<float> inpVec{1, 2, 3};

        // Model input/output names from the session; shapes are hard-coded for linear-regression.onnx
        const char* inputName = session.GetInputName(0, allocator);
        vector<int64_t> inputDims{3};
        const char* outputName = session.GetOutputName(0, allocator);
        vector<int64_t> outputDims{1};

        size_t inputTensorSize = vectorProduct(inputDims);
        vector<const char*> inputNames{inputName};
        vector<Ort::Value> inputTensors;

        size_t outputTensorSize = vectorProduct(outputDims);
        vector<float> outputTensorValues(outputTensorSize);
        vector<const char*> outputNames{outputName};
        vector<Ort::Value> outputTensors;

        // Input/output tensors wrap the existing CPU buffers; onnxruntime handles
        // any host/device copies when the CUDA execution provider is active
        Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
        inputTensors.push_back(Ort::Value::CreateTensor<float>(memoryInfo, inpVec.data(), inputTensorSize, inputDims.data(), inputDims.size()));
        outputTensors.push_back(Ort::Value::CreateTensor<float>(memoryInfo, outputTensorValues.data(), outputTensorSize, outputDims.data(), outputDims.size()));

        // Run inference: one input, one output
        session.Run(Ort::RunOptions{nullptr}, inputNames.data(), inputTensors.data(), 1, outputNames.data(), outputTensors.data(), 1);

        return 0;
}

I don't get any compilation error.

But when I execute ./run1 after the successful compilation, it fails with the error from the title, CUDA failure 100: no CUDA-capable device is detected (attached screenshot: on_GPU_error).

The code runs fine on CPU (i.e. without the lines OrtCUDAProviderOptions cuda_options{0}; and sessionOptions.AppendExecutionProvider_CUDA(cuda_options);). The issue only appears when inferencing on the GPU. Any suggestion would be helpful!
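
One way to narrow this down (my suggestion, not something from the thread): CUDA error 100 is cudaErrorNoDevice, i.e. the CUDA runtime itself cannot see any GPU, independently of onnxruntime. A short sketch that queries the CUDA runtime directly, assuming it is linked against the toolkit used for the build (/usr/local/cuda-11.7); the file name cuda_check.cpp is chosen here for illustration:

// cuda_check.cpp -- hypothetical diagnostic, not part of the original report
// Assumed build: g++ -o cuda_check cuda_check.cpp -I/usr/local/cuda-11.7/include -L/usr/local/cuda-11.7/lib64 -lcudart
#include <cuda_runtime_api.h>

#include <cstdio>

int main()
{
        int deviceCount = 0;
        cudaError_t err = cudaGetDeviceCount(&deviceCount);
        if (err != cudaSuccess)
        {
                // Error 100 (cudaErrorNoDevice) here means the driver/runtime,
                // not onnxruntime, cannot find a GPU on this VM
                std::printf("cudaGetDeviceCount failed: %d (%s)\n", static_cast<int>(err), cudaGetErrorString(err));
                return 1;
        }

        std::printf("CUDA devices visible: %d\n", deviceCount);
        for (int i = 0; i < deviceCount; ++i)
        {
                cudaDeviceProp prop;
                cudaGetDeviceProperties(&prop, i);
                std::printf("  device %d: %s (compute %d.%d)\n", i, prop.name, prop.major, prop.minor);
        }
        return 0;
}

If this program also reports error 100, the issue is the VM or driver setup (GPU not attached/passed through, or the NVIDIA driver not installed) rather than the onnxruntime build.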

RyanUnderhill commented 2 years ago

What GPU is installed? (You can try running nvidia-smi -L on the command line to see.) Usually this error means there is no GPU available or the GPU is too old.
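
Not from the thread, but one small tweak that makes the failure easier to share than a screenshot: the C++ wrapper throws Ort::Exception on any error status, and its message carries the exact CUDA failure text. A sketch of wrapping just the GPU-specific part of inference.cpp, under the same assumptions as the repro above:

// Hypothetical variant of the session setup from inference.cpp
#include <onnxruntime_cxx_api.h>

#include <iostream>

int main()
{
        Ort::Env env;
        Ort::SessionOptions sessionOptions;
        try
        {
                OrtCUDAProviderOptions cuda_options{0};
                sessionOptions.AppendExecutionProvider_CUDA(cuda_options);
                // The CUDA EP is initialised during session creation, so a missing
                // or unusable GPU typically surfaces here as "CUDA failure 100: ..."
                Ort::Session session(env, "linear-regression.onnx", sessionOptions);
                std::cout << "CUDA session created successfully" << std::endl;
        }
        catch (const Ort::Exception& e)
        {
                std::cerr << "onnxruntime error " << e.GetOrtErrorCode() << ": " << e.what() << std::endl;
                return 1;
        }
        return 0;
}

That way the full error string (and the GPU reported by nvidia-smi -L) can be pasted into the issue directly.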