triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Is native C++ inferencing (without gRPC and HTTP calls) supported in Triton for Windows version 2.47 with the ONNX backend? #7446

Open saugatapaul1010 opened 1 month ago

saugatapaul1010 commented 1 month ago

Description: Hi, I have set up Triton version 2.47 for Windows, along with the ONNX Runtime backend, based on the assets for Triton 2.47 listed at this URL: https://github.com/triton-inference-server/server/releases/

I have downloaded the ZIP file (tritonserver2.47.0-win.zip) and extracted its contents on my local system.

Triton Information: I am using Triton version 2.47 for Windows 10, with Python 3.10.11, CUDA 12.4, TensorRT 10.0.1.6, 9.1.0.7, vcpkg, and the required C/C++ development SDKs.

Are you using the Triton container or did you build it yourself? I downloaded the ZIP file for Windows (Triton 2.47) and extracted its contents. I am following the steps mentioned at this URL: https://github.com/triton-inference-server/server/releases/tag/v2.47.0

To Reproduce Steps to reproduce the behavior.

First of all, I had to spend a huge amount of time setting this up natively on Windows 10 because of the lack of clear instructions in the README or anywhere else. After a lot of trial and error I stumbled on a workaround: manually copying all the files from `tritonserver2.47.0-win\tritonserver\backends\onnxruntime` into `models_repository\resnet50`, where the `resnet50` folder contains my ONNX model. This was not mentioned anywhere in the documentation. I then run the command `tritonserver --model-repository=F:/Triton_Latest/models_repository --backend-config=onnx,dir=F:/Triton_Latest/tools/tritonserver2.47.0-win/tritonserver/backends/onnxruntime,version=1.18.0` to check whether the ONNX Runtime backend has been set up successfully. On successful startup of the Triton server and the ONNX backend, I get the output below:

[screenshot: Triton startup log showing the gRPC and HTTP services running]

So the gRPC and HTTP services are up and running. However, we need to use the ResNet-50 model natively from C++ code and get a response back. Is there any C++ wrapper or API for this? It is not mentioned anywhere in the release notes for Triton 2.47 for Windows.

What I want to achieve is: load the Triton Inference Server, load the ResNet-50 model into it, and run inference against that model from C++ code natively on Windows 10. The documentation currently states that gRPC and HTTP calls are supported, but what about native in-process inference from C++?
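For reference, this is a minimal sketch of the flow I am after, using the in-process C API from triton/core/tritonserver.h and direct calls, assuming the executable could be linked directly against the Triton server library (paths are the ones from my setup):

#include "triton/core/tritonserver.h"
#include <cstdlib>
#include <iostream>

static void Check(TRITONSERVER_Error* err) {
    if (err != nullptr) {
        std::cerr << "Triton error: " << TRITONSERVER_ErrorMessage(err) << std::endl;
        TRITONSERVER_ErrorDelete(err);
        std::exit(1);
    }
}

int main() {
    // Configure the in-process server: model repository and backend directory.
    TRITONSERVER_ServerOptions* options = nullptr;
    Check(TRITONSERVER_ServerOptionsNew(&options));
    Check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(
        options, "F:/Triton_Latest/models_repository"));
    Check(TRITONSERVER_ServerOptionsSetBackendDirectory(
        options, "F:/Triton_Latest/tools/tritonserver2.47.0-win/tritonserver/backends"));

    // Start the server inside this process.
    TRITONSERVER_Server* server = nullptr;
    Check(TRITONSERVER_ServerNew(&server, options));
    Check(TRITONSERVER_ServerOptionsDelete(options));

    bool ready = false;
    Check(TRITONSERVER_ServerIsReady(server, &ready));
    std::cout << "Server ready: " << (ready ? "yes" : "no") << std::endl;

    // ... build a TRITONSERVER_InferenceRequest and call
    // TRITONSERVER_ServerInferAsync here, as attempted in prod.cpp below ...

    Check(TRITONSERVER_ServerDelete(server));
    return 0;
}

As far as I can tell, the Windows ZIP ships tritonserver.dll but no import library, which is why prod.cpp below resolves every entry point through LoadLibrary/GetProcAddress instead of calling the API directly like this sketch does.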

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

Below is the model configuration file for reference for ResNet 50.

name: "resnet50"
platform: "onnxruntime_onnx"

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 1, 3, 224, 224 ] 
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [1, 1000 ] 
  }
]

instance_group {
  kind: KIND_GPU
  count: 2
}

This is my CMakeLists.txt file:

cmake_minimum_required(VERSION 3.10)

# Set your project name and the C++ standard
project(TritonServerCheck VERSION 1.0 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)

# OpenCV 4 testing
#find_package(OpenCV REQUIRED)
#include_directories("/usr/include/opencv4/")
#include_directories("/home/Z004HSRM/Downloads/craneAI/library/onnx_old_version/onnxruntime_backend/build/install/include")

include_directories("F:\\Triton_Latest\\tools\\tritonserver2.47.0-win\\tritonserver\\include")
# Define the executable
add_executable(TritonServerCheck prod.cpp)

# Link to the necessary libraries
#find_library(TRITONSERVER_LIBRARY NAMES libtritonserver HINTS "F:\\Triton_Latest\\tools\\tritonserver2.47.0-win\\tritonserver\\include")
#find_library(TRITONONNXRUNTIME_LIBRARY NAMES triton_onnxruntime HINTS /home/Z004HSRM/Downloads/craneAI/library/onnx_old_version/onnxruntime_backend/build/install/backends/onnxruntime)

#target_link_libraries(TritonServerCheck ${TRITONSERVER_LIBRARY} ${TRITONONNXRUNTIME_LIBRARY} dl)
#add_library(TritonServerCheck SHARED test_onnx_extern1.cpp)
#target_link_libraries(TritonServerCheck ${TRITONSERVER_LIBRARY} dl)
#target_link_libraries(TritonServerCheck "C:\\Users\\z004ns2e\\Desktop\\Triton_Latest\\core\\build\\Debug\\tritonserver.dll")

This is my main cpp file (prod.cpp).

#include "triton/core/tritonserver.h"
#include <iostream>
//#include <dlfcn.h>
#include <windows.h>
#include <vector>

//#include <unistd.h>

//#include <opencv2/opencv.hpp>
#include <chrono>
#include <thread>

typedef TRITONSERVER_Error* (*TRITONSERVER_ServerNewFn)(TRITONSERVER_Server**, TRITONSERVER_ServerOptions*);
typedef TRITONSERVER_Error* (*TRITONSERVER_ServerOptionsNewFn)(TRITONSERVER_ServerOptions**);
typedef TRITONSERVER_Error* (*TRITONSERVER_ServerOptionsSetModelRepositoryPathFn)(TRITONSERVER_ServerOptions*, const char*);
typedef TRITONSERVER_Error* (*TRITONSERVER_ServerOptionsSetLogVerboseFn)(TRITONSERVER_ServerOptions*, uint32_t);
typedef TRITONSERVER_Error* (*TRITONSERVER_ServerOptionsSetBackendDirectoryFn)(TRITONSERVER_ServerOptions*, const char*);
typedef TRITONSERVER_Error* (*TRITONSERVER_ServerIsReadyFn)(TRITONSERVER_Server*, bool*);
//typedef TRITONSERVER_Error* (*TRITONSERVER_ServerOptionsGetBackendDirectoryFn)(TRITONSERVER_ServerOptions*, const char**, const char**);

typedef TRITONSERVER_Error* (*TRITONSERVER_ServerModelIndexFn)(TRITONSERVER_Server*, TRITONSERVER_Message**);

typedef TRITONSERVER_Error* (*TRITONSERVER_MessageSerializeToJsonFn)(TRITONSERVER_Message*, const char**, size_t*);

typedef TRITONSERVER_Error* (*TRITONSERVER_ServerOptionsSetModelControlModeFn)(TRITONSERVER_ServerOptions*, TRITONSERVER_ModelControlMode);
typedef TRITONSERVER_Error* (*TRITONSERVER_ServerOptionsSetStrictModelConfigFn)(TRITONSERVER_ServerOptions*, bool);
typedef void (*TRITONSERVER_ErrorDeleteFn)(TRITONSERVER_Error*);
typedef void (*TRITONSERVER_ServerDeleteFn)(TRITONSERVER_Server*);
typedef void (*TRITONSERVER_ServerOptionsDeleteFn)(TRITONSERVER_ServerOptions*);

//request
typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestNewFn)(TRITONSERVER_InferenceRequest**, TRITONSERVER_Server*, const char*, int64_t);

typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestAddInputFn)(TRITONSERVER_InferenceRequest* request, const char* name, TRITONSERVER_DataType datatype, const int64_t* shape, uint32_t dim_count);
typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestAppendInputDataFn)(TRITONSERVER_InferenceRequest* request, const char* name, const void* base, size_t byte_size, TRITONSERVER_MemoryType memory_type, int64_t memory_type_id);

// typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestAddInputFn)(TRITONSERVER_InferenceRequest*, const char*, TRITONSERVER_DataType, const int64_t*, uint64_t);
// typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestAppendInputDataFn)(TRITONSERVER_InferenceRequest*, const void*, size_t, TRITONSERVER_MemoryType, int64_t);
typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestAddRequestedOutputFn)(TRITONSERVER_InferenceRequest* request, const char* name);

typedef TRITONSERVER_Error* (*ModelConfigFn)(TRITONSERVER_Server* server, const char* model_name, uint64_t model_version, TRITONSERVER_Message** model_config);

//response
typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestSetReleaseCallbackFn)(TRITONSERVER_InferenceRequest*, TRITONSERVER_InferenceRequestReleaseFn_t, void*);

//typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestSetResponseCallbackFn)(TRITONSERVER_InferenceRequest*, TRITONSERVER_ResponseAllocator*, void*, TRITONSERVER_InferenceResponseCompleteFn_t, void*);
typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestSetResponseCallbackFn)(TRITONSERVER_InferenceRequest*, TRITONSERVER_ResponseAllocator*, void*, TRITONSERVER_InferenceResponseCompleteFn_t, void*);

//typedef TRITONSERVER_Error* (*TRITONSERVER_ServerInferAsyncFn)(TRITONSERVER_Server* server, TRITONSERVER_InferenceRequest* request, void* trace);
typedef TRITONSERVER_Error* (*TRITONSERVER_ServerInferAsyncFn)(TRITONSERVER_Server*, TRITONSERVER_InferenceRequest*, void*);

typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceRequestDeleteFn)(TRITONSERVER_InferenceRequest* request);
// typedef TRITONSERVER_Error* (*TRITONSERVER_ResponseAllocatorNewFn)(TRITONSERVER_ResponseAllocator** allocator, TRITONSERVER_ResponseAllocatorAllocFn_t alloc_fn, TRITONSERVER_ResponseAllocatorReleaseFn_t release_fn);
typedef TRITONSERVER_Error* (*TRITONSERVER_ResponseAllocatorDeleteFn)(TRITONSERVER_ResponseAllocator* allocator);

typedef TRITONSERVER_Error* (*TRITONSERVER_ResponseAllocatorNewFn)(TRITONSERVER_ResponseAllocator**, TRITONSERVER_ResponseAllocatorAllocFn_t, TRITONSERVER_ResponseAllocatorReleaseFn_t,void *);

//res
typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceResponseOutputFn_t)(
    TRITONSERVER_InferenceResponse* response, uint32_t index,
    const char** name, TRITONSERVER_DataType* datatype,
    const int64_t** shape, uint64_t* dim_count, const void** base,
    size_t* byte_size, TRITONSERVER_MemoryType* memory_type,
    int64_t* memory_type_id, void* userp);

typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceResponseOutputCountFn_t)(
    TRITONSERVER_InferenceResponse* response, uint32_t* count);

typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceResponseOutputCountFn)(TRITONSERVER_InferenceResponse*, uint32_t*);
typedef TRITONSERVER_Error* (*TRITONSERVER_InferenceResponseOutputFn)(TRITONSERVER_InferenceResponse*, const uint32_t, const char**, TRITONSERVER_DataType*, const int64_t**, uint64_t*, const void**, size_t*, TRITONSERVER_MemoryType*, int64_t*, void**);

// Callback functions for response allocation and release
// TRITONSERVER_Error* ResponseAlloc(
//     TRITONSERVER_ResponseAllocator* allocator,
//     const char* tensor_name, size_t byte_size, void* userp,
//     void** buffer, void** buffer_userp, TRITONSERVER_MemoryType* memory_type,
//     int64_t* memory_type_id)
// {
//     *buffer = malloc(byte_size);
//     *buffer_userp = nullptr;
//     *memory_type = TRITONSERVER_MEMORY_CPU;
//     *memory_type_id = 0;
//     return nullptr; // Success
// }

extern "C" {

// void CheckTritonError2(TRITONSERVER_Error* error) {
//     if (error != nullptr) {
//         std::cerr << "Triton Error: " << TRITONSERVER_ErrorMessage(error) << std::endl;
//         exit(1);
//     }
// }

void CheckTritonError(TRITONSERVER_Error* error) {
    if (error != nullptr) {
        // TRITONSERVER_ErrorMessage/ErrorDelete would print and free the real
        // error, but calling them directly requires linking against the Triton library.
        //const char* msg = TRITONSERVER_ErrorMessage(error);
        //std::cerr << "Error: " << msg << std::endl;
        std::cerr << "Triton call returned an error" << std::endl;
        //TRITONSERVER_ErrorDelete(error);
        exit(1);
    }
    else {
        std::cerr << "CheckStatus is working" << std::endl;
    }
}
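// NOTE (hedged sketch, not part of the original file): to print the real error
// text without linking, TRITONSERVER_ErrorMessage / TRITONSERVER_ErrorDelete
// could be resolved from tritonserver.dll with GetProcAddress like the other
// entry points, e.g. via hypothetical globals g_ErrorMessage / g_ErrorDelete:
//
//   typedef const char* (*TRITONSERVER_ErrorMessageFn)(TRITONSERVER_Error*);
//   if (g_ErrorMessage) std::cerr << "Triton Error: " << g_ErrorMessage(error) << std::endl;
//   if (g_ErrorDelete)  g_ErrorDelete(error);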

TRITONSERVER_Error* ResponseAlloc(
    TRITONSERVER_ResponseAllocator* allocator,
    const char* tensor_name, size_t byte_size, TRITONSERVER_MemoryType memory_type,
    int64_t memory_type_id, void* userp, void** buffer, void** buffer_userp,
    TRITONSERVER_MemoryType* actual_memory_type, int64_t* actual_memory_type_id)
{
    std::cout << "Allocating buffer for tensor: " << tensor_name << " with byte size: " << byte_size << std::endl;
    std::cout << "Requested memory type: " << memory_type << " and memory type id: " << memory_type_id << std::endl;

    *buffer = malloc(byte_size);
    if (*buffer == nullptr) {
        return TRITONSERVER_ErrorNew(TRITONSERVER_ERROR_INTERNAL, "Failed to allocate memory");
    }
    *buffer_userp = nullptr;
    *actual_memory_type = TRITONSERVER_MEMORY_CPU;
    *actual_memory_type_id = 0;

    std::cout << "Memory allocated on CPU" << std::endl;

    return nullptr; // Success
}

// TRITONSERVER_Error* ResponseRelease(
//     TRITONSERVER_ResponseAllocator* allocator, void* buffer, void* buffer_userp)
// {
//     free(buffer);
//     return nullptr; // Success
// }

TRITONSERVER_Error* ResponseRelease(
    TRITONSERVER_ResponseAllocator* allocator, void* buffer, void* buffer_userp,
    size_t byte_size, TRITONSERVER_MemoryType memory_type, int64_t memory_type_id)
{
    std::cout << "Releasing buffer of size: " << byte_size << " from memory type: " << memory_type << std::endl;
    free(buffer);
    return nullptr; // Success
}

void InferResponseComplete(TRITONSERVER_InferenceResponse* response, uint32_t flags, void* userp)
{
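    // NOTE: the calls below (TRITONSERVER_InferenceResponseOutputCount/Output/Delete)
    // are direct symbol references, so they only link if the executable is linked
    // against the Triton server import library; otherwise they would need to be
    // resolved with GetProcAddress like the functions loaded in InitializeServer().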
    uint32_t output_count;
    CheckTritonError(TRITONSERVER_InferenceResponseOutputCount(response, &output_count));

    for (uint32_t i = 0; i < output_count; ++i) {
        const char* output_name;
        TRITONSERVER_DataType datatype;
        const int64_t* shape;
        uint64_t dim_count;
        const void* base;
        size_t byte_size;
        TRITONSERVER_MemoryType memory_type;
        int64_t memory_type_id;
        void* user_data;

        CheckTritonError(TRITONSERVER_InferenceResponseOutput(
            response, i, &output_name, &datatype, &shape, &dim_count, &base, &byte_size, &memory_type, &memory_type_id, &user_data));

        std::cout << "Output tensor name: " << output_name << std::endl;
        std::cout << "Output data type: " << datatype << std::endl;
        std::cout << "Output dimensions: ";
        for (uint64_t j = 0; j < dim_count; ++j) {
            std::cout << shape[j] << "... ";
        }
        std::cout << std::endl;
        std::cout << "byte_size :"<<byte_size<<std::endl;

        // const float* float_data = reinterpret_cast<const float*>(base);
        // for (size_t j = 0; j < byte_size / sizeof(float); ++j) {
        //     std::cout << float_data[j] << " ";
        // }
        // std::cout << std::endl;
        std::cout << "completeedd..........:"<<std::endl;

    }

    CheckTritonError(TRITONSERVER_InferenceResponseDelete(response));
    bool& inference_completed = *static_cast<bool*>(userp);
    std::cout << "Inference response received." << std::endl;
    inference_completed = true;
}

// void InferResponseComplete12(
//     TRITONSERVER_InferenceResponse* response, uint32_t flags, void* userp)
// {
//     uint32_t output_count;
//     CheckTritonError(TRITONSERVER_InferenceResponseOutputCount(response, &output_count));

//     for (uint32_t i = 0; i < output_count; ++i) {
//         const char* output_name;
//         TRITONSERVER_DataType datatype;
//         const int64_t* shape;
//         uint64_t dim_count;
//         const void* base;
//         size_t byte_size;
//         TRITONSERVER_MemoryType memory_type;
//         int64_t memory_type_id;
//         std::cout << "Output ...................."<<std::endl; 

//         CheckTritonError(TRITONSERVER_InferenceResponseOutput(
//             response, i, &output_name, &datatype, &shape, &dim_count,
//             &base, &byte_size, &memory_type, &memory_type_id, nullptr));

//         std::cout << "Output tensor " << output_name << " has shape [";
//         for (uint64_t j = 0; j < dim_count; ++j) {
//             std::cout << shape[j];
//             if (j < dim_count - 1) {
//                 std::cout << ", ";
//             }
//         }
//         std::cout << "] and size " << byte_size << std::endl;

//         // Print the values if the datatype is float
//         if (datatype == TRITONSERVER_TYPE_FP32) {
//             const float* output_data = static_cast<const float*>(base);
//             size_t element_count = byte_size / sizeof(float);

//             std::cout << "Output values: ";
//             for (size_t j = 0; j < element_count; ++j) {
//                 std::cout << output_data[j];
//                 if (j < element_count - 1) {
//                     std::cout << ", ";
//                 }
//             }
//             std::cout << std::endl;
//         } else {
//             std::cout << "Datatype is not FP32. Printing not supported for this datatype." << std::endl;
//         }
//     }
//      std::cout << "Output ...................."<<std::endl; 
//     TRITONSERVER_InferenceResponseDelete(response);
// }

// void InferResponseComplete1(
//     TRITONSERVER_InferenceResponse* response, uint32_t flags, void* userp)
// {
//     uint32_t output_count;
//     CheckTritonError(TRITONSERVER_InferenceResponseOutputCount(response, &output_count));

//     for (uint32_t i = 0; i < output_count; ++i) {
//         const char* output_name;
//         TRITONSERVER_DataType datatype;
//         const int64_t* shape;
//         uint64_t dim_count;
//         const void* base;
//         size_t byte_size;
//         TRITONSERVER_MemoryType memory_type;
//         int64_t memory_type_id;

//         CheckTritonError(TRITONSERVER_InferenceResponseOutput(
//             response, i, &output_name, &datatype, &shape, &dim_count,
//             &base, &byte_size, &memory_type, &memory_type_id, nullptr));

//         std::cout << "Output tensor " << output_name << " has shape [";
//         for (uint64_t j = 0; j < dim_count; ++j) {
//             std::cout << shape[j];
//             if (j < dim_count - 1) {
//                 std::cout << ", ";
//             }
//         }
//         std::cout << "] and size " << byte_size << std::endl;
//     }

//     TRITONSERVER_InferenceResponseDelete(response);

//         bool& inference_completed = *static_cast<bool*>(userp);
//     std::cout << "Inference response received." << std::endl;
//     inference_completed = true;
// }

//Callback function for request release
void RequestRelease(
    TRITONSERVER_InferenceRequest* request, uint32_t flags, void* userp)
{
    TRITONSERVER_InferenceRequestDelete(request);
}

// void CheckTritonError1(TRITONSERVER_Error* error) {
//     if (error != nullptr) {
//         const char* msg = TRITONSERVER_ErrorMessage(error);
//         std::cerr << "Triton Error: " << msg << std::endl;

//         void* handle = dlopen(nullptr, RTLD_LAZY);
//         if (handle) {
//             TRITONSERVER_ErrorDeleteFn TRITONSERVER_ErrorDelete = (TRITONSERVER_ErrorDeleteFn)GetProcAddress(handle, "TRITONSERVER_ErrorDelete");
//             if (TRITONSERVER_ErrorDelete) {
//                 TRITONSERVER_ErrorDelete(error);
//             }
//             dlclose(handle);
//         }

//         exit(1);
//     }
// }

// Globals
TRITONSERVER_Server* server_ptr = nullptr;
HMODULE triton_handle = nullptr;   // tritonserver.dll, loaded in InitializeServer()
HMODULE onnx_handle = nullptr;     // onnxruntime.dll, loaded in InitializeServer()

TRITONSERVER_ResponseAllocator* response_allocator = nullptr;
//TRITONSERVER_InferenceRequest* inference_request = nullptr;

extern "C" void InitializeServer(const char* model_name_arg) {

  //  setenv("CUDA_VISIBLE_DEVICES", "", 1);
    const char* model_repository_path = "F:\\Triton_Latest\\models_repository\\";
    const char* onnx_runtime_backend_so = "libtriton_onnxruntime.so";
    // const char* triton_server_so = "libtritonserver.so";

    const char* model_name = "resnet50";
    const char* input_name = "input";
    const char* output_name = "output";

    // Assign to the global handle (the original declared a local HMODULE that shadowed the global).
    triton_handle = LoadLibrary(TEXT("F:\\Triton_Latest\\tools\\tritonserver2.47.0-win\\tritonserver\\bin\\tritonserver.dll"));
    if (!triton_handle) {
        std::cerr << "Could not load the DLL!" << std::endl;
        //return EXIT_FAILURE; ##TODO
    }
    else {
        std::cerr << "triton_handle loaded" << std::endl;
    }

    onnx_handle = LoadLibrary(TEXT("F:\\Triton_Latest\\tools\\tritonserver2.47.0-win\\tritonserver\\backends\\onnxruntime\\onnxruntime.dll"));
    if (!onnx_handle) {
        std::cerr << "Could not load the ONNX!" << std::endl;
        //return EXIT_FAILURE;
    }
    else {
        std::cerr << "ONNX DLL loaded" << std::endl;
    }

    // // Load Triton Inference Server shared object
    // triton_handle = dlopen(triton_server_so, RTLD_LAZY);
    // if (!triton_handle) {
    //     std::cerr << "Cannot load Triton shared object: " << dlerror() << std::endl;
    //     exit(1);
    // }

    // // Load ONNX Runtime backend shared object
    // onnx_handle = dlopen(onnx_runtime_backend_so, RTLD_LAZY);
    // if (!onnx_handle) {
    //     std::cerr << "Cannot load ONNX Runtime backend shared object: " << dlerror() << std::endl;
    //     exit(1);
    // }

    // Load the necessary functions
    TRITONSERVER_ServerOptionsNewFn TRITONSERVER_ServerOptionsNew = (TRITONSERVER_ServerOptionsNewFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsNew");
    TRITONSERVER_ServerOptionsSetModelRepositoryPathFn TRITONSERVER_ServerOptionsSetModelRepositoryPath = (TRITONSERVER_ServerOptionsSetModelRepositoryPathFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetModelRepositoryPath");
    TRITONSERVER_ServerOptionsSetBackendDirectoryFn TRITONSERVER_ServerOptionsSetBackendDirectory = (TRITONSERVER_ServerOptionsSetBackendDirectoryFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetBackendDirectory");
    //TRITONSERVER_ServerOptionsGetBackendDirectoryFn TRITONSERVER_ServerOptionsGetBackendDirectory = (TRITONSERVER_ServerOptionsGetBackendDirectoryFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsGetBackendDirectory");
    TRITONSERVER_ServerOptionsSetLogVerboseFn TRITONSERVER_ServerOptionsSetLogVerbose = (TRITONSERVER_ServerOptionsSetLogVerboseFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetLogVerbose");
    TRITONSERVER_ServerOptionsSetModelControlModeFn TRITONSERVER_ServerOptionsSetModelControlMode = (TRITONSERVER_ServerOptionsSetModelControlModeFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetModelControlMode");
    TRITONSERVER_ServerOptionsSetStrictModelConfigFn TRITONSERVER_ServerOptionsSetStrictModelConfig = (TRITONSERVER_ServerOptionsSetStrictModelConfigFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetStrictModelConfig");
    TRITONSERVER_ServerNewFn TRITONSERVER_ServerNew = (TRITONSERVER_ServerNewFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerNew");
    TRITONSERVER_ServerIsReadyFn TRITONSERVER_ServerIsReady = (TRITONSERVER_ServerIsReadyFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerIsReady");
    TRITONSERVER_ServerDeleteFn TRITONSERVER_ServerDelete = (TRITONSERVER_ServerDeleteFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerDelete");
    TRITONSERVER_ServerOptionsDeleteFn TRITONSERVER_ServerOptionsDelete = (TRITONSERVER_ServerOptionsDeleteFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsDelete");
    TRITONSERVER_MessageSerializeToJsonFn TRITONSERVER_MessageSerializeToJson = (TRITONSERVER_MessageSerializeToJsonFn)GetProcAddress(triton_handle, "TRITONSERVER_MessageSerializeToJson");
    TRITONSERVER_ServerModelIndexFn TRITONSERVER_ServerModelIndex = (TRITONSERVER_ServerModelIndexFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerModelIndex");

//inf req

    TRITONSERVER_InferenceRequestNewFn TRITONSERVER_InferenceRequestNew = (TRITONSERVER_InferenceRequestNewFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestNew");
    TRITONSERVER_InferenceRequestAddInputFn TRITONSERVER_InferenceRequestAddInput = (TRITONSERVER_InferenceRequestAddInputFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestAddInput");
    TRITONSERVER_InferenceRequestAppendInputDataFn TRITONSERVER_InferenceRequestAppendInputData = (TRITONSERVER_InferenceRequestAppendInputDataFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestAppendInputData");
    TRITONSERVER_InferenceRequestAddRequestedOutputFn TRITONSERVER_InferenceRequestAddRequestedOutput = (TRITONSERVER_InferenceRequestAddRequestedOutputFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestAddRequestedOutput");

  //  TRITONSERVER_ServerInferAsyncFn TRITONSERVER_ServerInferAsync = (TRITONSERVER_ServerInferAsyncFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerInferAsync");

TRITONSERVER_ServerInferAsyncFn TRITONSERVER_ServerInferAsync = (TRITONSERVER_ServerInferAsyncFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerInferAsync");

 TRITONSERVER_InferenceRequestSetReleaseCallbackFn TRITONSERVER_InferenceRequestSetReleaseCallback = (TRITONSERVER_InferenceRequestSetReleaseCallbackFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestSetReleaseCallback");
 //   TRITONSERVER_InferenceRequestSetResponseCallbackFn TRITONSERVER_InferenceRequestSetResponseCallback = (TRITONSERVER_InferenceRequestSetResponseCallbackFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestSetResponseCallback");

TRITONSERVER_InferenceRequestDeleteFn TRITONSERVER_InferenceRequestDelete = (TRITONSERVER_InferenceRequestDeleteFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestDelete");

TRITONSERVER_ResponseAllocatorNewFn TRITONSERVER_ResponseAllocatorNew = (TRITONSERVER_ResponseAllocatorNewFn)GetProcAddress(triton_handle, "TRITONSERVER_ResponseAllocatorNew");

TRITONSERVER_InferenceRequestSetResponseCallbackFn TRITONSERVER_InferenceRequestSetResponseCallback = (TRITONSERVER_InferenceRequestSetResponseCallbackFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestSetResponseCallback");

//TRITONSERVER_InferenceResponseOutputCount = reinterpret_cast<TRITONSERVER_InferenceResponseOutputCountFn>(LoadFunction(handle, "TRITONSERVER_InferenceResponseOutputCount"));
//TRITONSERVER_InferenceResponseOutput = reinterpret_cast<TRITONSERVER_InferenceResponseOutputFn>(LoadFunction(handle, "TRITONSERVER_InferenceResponseOutput"));

TRITONSERVER_InferenceResponseOutputFn_t TRITONSERVER_InferenceResponseOutput = (TRITONSERVER_InferenceResponseOutputFn_t) GetProcAddress(triton_handle, "TRITONSERVER_InferenceResponseOutput");
TRITONSERVER_InferenceResponseOutputCountFn_t TRITONSERVER_InferenceResponseOutputCount = (TRITONSERVER_InferenceResponseOutputCountFn_t) GetProcAddress(triton_handle, "TRITONSERVER_InferenceResponseOutputCount");

    // Check if any of the functions are not loaded
    if (!TRITONSERVER_ServerOptionsNew || !TRITONSERVER_ServerOptionsSetModelRepositoryPath || !TRITONSERVER_ServerOptionsSetBackendDirectory || !TRITONSERVER_ServerOptionsSetLogVerbose || !TRITONSERVER_ServerOptionsSetModelControlMode
        || !TRITONSERVER_ServerOptionsSetStrictModelConfig || !TRITONSERVER_ServerNew || !TRITONSERVER_ServerIsReady || !TRITONSERVER_ServerDelete 
        || !TRITONSERVER_ServerOptionsDelete || !TRITONSERVER_ServerModelIndex || !TRITONSERVER_MessageSerializeToJson 
        || !TRITONSERVER_InferenceRequestNew  || !TRITONSERVER_InferenceRequestAddInput || !TRITONSERVER_InferenceRequestAppendInputData || !TRITONSERVER_InferenceRequestSetReleaseCallback
        || !TRITONSERVER_InferenceRequestSetResponseCallback || !TRITONSERVER_InferenceRequestAddRequestedOutput 
        || !TRITONSERVER_InferenceRequestDelete || !TRITONSERVER_ResponseAllocatorNew || !TRITONSERVER_InferenceResponseOutput ) {
        std::cerr << "Failed to load one or more Triton functions." << std::endl;
        exit(1);
    }

    // Create server options
    TRITONSERVER_ServerOptions* server_options = nullptr;
    CheckTritonError(TRITONSERVER_ServerOptionsNew(&server_options));

    std::cout << "Setting model repository path to: " << model_repository_path << std::endl;
    CheckTritonError(TRITONSERVER_ServerOptionsSetModelRepositoryPath(server_options, model_repository_path));

    std::cout << "Setting backend directory for 'onnxruntime' to: " << onnx_runtime_backend_so << std::endl;
    // Setting backend directory for 'onnxruntime'
    //CheckTritonError(TRITONSERVER_ServerOptionsSetBackendDirectory(server_options,onnx_runtime_backend_so));
//CheckTritonError(TRITONSERVER_ServerOptionsSetBackendDirectory(server_options,"/home/Z004HSRM/Downloads/craneAI/library/onnx_old_version/onnxruntime_backend/build/install/backends/"));

//std::cout << "Disabling GPU metrics" << std::endl;
//    CheckTritonError(TRITONSERVER_ServerOptionsSetGpuMetrics(server_options, false));

    // Setting log verbosity
    std::cout << "Setting log verbosity to 1" << std::endl;
    CheckTritonError(TRITONSERVER_ServerOptionsSetLogVerbose(server_options, 1));

    // ModelControlMode is left at its default (the explicit-mode call below is commented out)
    std::cout << "ModelControlMode left at default (NONE)" << std::endl;
    //CheckTritonError(TRITONSERVER_ServerOptionsSetModelControlMode(server_options, TRITONSERVER_MODEL_CONTROL_EXPLICIT));

    // Disabling strict model configuration
    std::cout << "Disabling strict model configuration" << std::endl;
    CheckTritonError(TRITONSERVER_ServerOptionsSetStrictModelConfig(server_options, false));

    // Create the server
   // TRITONSERVER_Server* server_ptr = nullptr;
    CheckTritonError(TRITONSERVER_ServerNew(&server_ptr, server_options));

    // Wait for the server to be ready
    bool is_ready = false;
    CheckTritonError(TRITONSERVER_ServerIsReady(server_ptr, &is_ready));
    std::cout << "Server is ready: " << (is_ready ? "Yes" : "No") << std::endl;

    std::cout << "Loading model " << std::endl;
    // TRITONSERVER_Error* load_model_error = TRITONSERVER_ServerLoadModel(server_ptr, model_name);
    // if (load_model_error != nullptr) {
    //     std::cerr << "Failed to load  model: " << TRITONSERVER_ErrorMessage(load_model_error) << std::endl;
    //     TRITONSERVER_ErrorDelete(load_model_error);
    // } else {
    //     std::cout << "model loaded successfully" << std::endl;
    // }

    // Get the model repository index and print it
    TRITONSERVER_Message* repository_index_message = nullptr;
    CheckTritonError(TRITONSERVER_ServerModelIndex(server_ptr, &repository_index_message));

    const char* repository_index_json = nullptr;
    size_t repository_index_json_size = 0;
   CheckTritonError(TRITONSERVER_MessageSerializeToJson(repository_index_message, &repository_index_json, &repository_index_json_size));

    std::cout << "Model Repository Index: " << std::string(repository_index_json, repository_index_json_size) << std::endl;

    TRITONSERVER_InferenceRequest* inference_request = nullptr;
    CheckTritonError(TRITONSERVER_InferenceRequestNew(&inference_request, server_ptr, "resnet50", 1));
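    // NOTE: the inference request created above is not used further in this
    // function; the full AddInput / AppendInputData / SetResponseCallback /
    // ServerInferAsync flow is in the commented-out performInference() below.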

// bool is_ready1 = false;
// CheckTritonError(TRITONSERVER_ServerModelIsReady(server_ptr, "resnet50", 1, &is_ready1));
// if (!is_ready1) {
//     std::cerr << "Model is not ready" << std::endl;
//     //return;
// }

}

// extern "C" void InitializeServer_yolo(const char* model_name_arg) {

//   //  setenv("CUDA_VISIBLE_DEVICES", "", 1);
//     const char* model_repository_path = "/home/Z004HSRM/Downloads/craneAI/codebase/triton_custom_cpp/yolofd/";
//     const char* onnx_runtime_backend_so = "libtriton_onnxruntime.so";
//     const char* triton_server_so = "libtritonserver.so";

//     const char* model_name = "yolov5";
//     const char* input_name = "images";
//     const char* output_name = "output0";

//     // Load Triton Inference Server shared object
//     triton_handle = dlopen(triton_server_so, RTLD_LAZY);
//     if (!triton_handle) {
//         std::cerr << "Cannot load Triton shared object: " << dlerror() << std::endl;
//         exit(1);
//     }

//     // Load ONNX Runtime backend shared object
//     onnx_handle = dlopen(onnx_runtime_backend_so, RTLD_LAZY);
//     if (!onnx_handle) {
//         std::cerr << "Cannot load ONNX Runtime backend shared object: " << dlerror() << std::endl;
//         exit(1);
//     }

//     // Load the necessary functions
//     TRITONSERVER_ServerOptionsNewFn TRITONSERVER_ServerOptionsNew = (TRITONSERVER_ServerOptionsNewFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsNew");
//     TRITONSERVER_ServerOptionsSetModelRepositoryPathFn TRITONSERVER_ServerOptionsSetModelRepositoryPath = (TRITONSERVER_ServerOptionsSetModelRepositoryPathFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetModelRepositoryPath");
//     TRITONSERVER_ServerOptionsSetBackendDirectoryFn TRITONSERVER_ServerOptionsSetBackendDirectory = (TRITONSERVER_ServerOptionsSetBackendDirectoryFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetBackendDirectory");
//     //TRITONSERVER_ServerOptionsGetBackendDirectoryFn TRITONSERVER_ServerOptionsGetBackendDirectory = (TRITONSERVER_ServerOptionsGetBackendDirectoryFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsGetBackendDirectory");
//     TRITONSERVER_ServerOptionsSetLogVerboseFn TRITONSERVER_ServerOptionsSetLogVerbose = (TRITONSERVER_ServerOptionsSetLogVerboseFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetLogVerbose");
//     TRITONSERVER_ServerOptionsSetModelControlModeFn TRITONSERVER_ServerOptionsSetModelControlMode = (TRITONSERVER_ServerOptionsSetModelControlModeFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetModelControlMode");
//     TRITONSERVER_ServerOptionsSetStrictModelConfigFn TRITONSERVER_ServerOptionsSetStrictModelConfig = (TRITONSERVER_ServerOptionsSetStrictModelConfigFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsSetStrictModelConfig");
//     TRITONSERVER_ServerNewFn TRITONSERVER_ServerNew = (TRITONSERVER_ServerNewFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerNew");
//     TRITONSERVER_ServerIsReadyFn TRITONSERVER_ServerIsReady = (TRITONSERVER_ServerIsReadyFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerIsReady");
//     TRITONSERVER_ServerDeleteFn TRITONSERVER_ServerDelete = (TRITONSERVER_ServerDeleteFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerDelete");
//     TRITONSERVER_ServerOptionsDeleteFn TRITONSERVER_ServerOptionsDelete = (TRITONSERVER_ServerOptionsDeleteFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerOptionsDelete");
//     TRITONSERVER_MessageSerializeToJsonFn TRITONSERVER_MessageSerializeToJson = (TRITONSERVER_MessageSerializeToJsonFn)GetProcAddress(triton_handle, "TRITONSERVER_MessageSerializeToJson");
//     TRITONSERVER_ServerModelIndexFn TRITONSERVER_ServerModelIndex = (TRITONSERVER_ServerModelIndexFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerModelIndex");

// //inf req

//     TRITONSERVER_InferenceRequestNewFn TRITONSERVER_InferenceRequestNew = (TRITONSERVER_InferenceRequestNewFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestNew");

//     TRITONSERVER_InferenceRequestAddInputFn TRITONSERVER_InferenceRequestAddInput = (TRITONSERVER_InferenceRequestAddInputFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestAddInput");
//     TRITONSERVER_InferenceRequestAppendInputDataFn TRITONSERVER_InferenceRequestAppendInputData = (TRITONSERVER_InferenceRequestAppendInputDataFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestAppendInputData");

//     TRITONSERVER_InferenceRequestAddRequestedOutputFn TRITONSERVER_InferenceRequestAddRequestedOutput = (TRITONSERVER_InferenceRequestAddRequestedOutputFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestAddRequestedOutput");

//   //  TRITONSERVER_ServerInferAsyncFn TRITONSERVER_ServerInferAsync = (TRITONSERVER_ServerInferAsyncFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerInferAsync");

// TRITONSERVER_ServerInferAsyncFn TRITONSERVER_ServerInferAsync = (TRITONSERVER_ServerInferAsyncFn)GetProcAddress(triton_handle, "TRITONSERVER_ServerInferAsync");

//  TRITONSERVER_InferenceRequestSetReleaseCallbackFn TRITONSERVER_InferenceRequestSetReleaseCallback = (TRITONSERVER_InferenceRequestSetReleaseCallbackFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestSetReleaseCallback");
//  //   TRITONSERVER_InferenceRequestSetResponseCallbackFn TRITONSERVER_InferenceRequestSetResponseCallback = (TRITONSERVER_InferenceRequestSetResponseCallbackFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestSetResponseCallback");

// TRITONSERVER_InferenceRequestDeleteFn TRITONSERVER_InferenceRequestDelete = (TRITONSERVER_InferenceRequestDeleteFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestDelete");

// TRITONSERVER_ResponseAllocatorNewFn TRITONSERVER_ResponseAllocatorNew = (TRITONSERVER_ResponseAllocatorNewFn)GetProcAddress(triton_handle, "TRITONSERVER_ResponseAllocatorNew");

// TRITONSERVER_InferenceRequestSetResponseCallbackFn TRITONSERVER_InferenceRequestSetResponseCallback = (TRITONSERVER_InferenceRequestSetResponseCallbackFn)GetProcAddress(triton_handle, "TRITONSERVER_InferenceRequestSetResponseCallback");

// //TRITONSERVER_InferenceResponseOutputCount = reinterpret_cast<TRITONSERVER_InferenceResponseOutputCountFn>(LoadFunction(handle, "TRITONSERVER_InferenceResponseOutputCount"));
// //TRITONSERVER_InferenceResponseOutput = reinterpret_cast<TRITONSERVER_InferenceResponseOutputFn>(LoadFunction(handle, "TRITONSERVER_InferenceResponseOutput"));

// TRITONSERVER_InferenceResponseOutputFn_t TRITONSERVER_InferenceResponseOutput = (TRITONSERVER_InferenceResponseOutputFn_t) GetProcAddress(triton_handle, "TRITONSERVER_InferenceResponseOutput");
// TRITONSERVER_InferenceResponseOutputCountFn_t TRITONSERVER_InferenceResponseOutpuCount = (TRITONSERVER_InferenceResponseOutputCountFn_t) GetProcAddress(triton_handle, "TRITONSERVER_InferenceResponseOutpuCountt");

//     // Check if any of the functions are not loaded
//     if (!TRITONSERVER_ServerOptionsNew || !TRITONSERVER_ServerOptionsSetModelRepositoryPath || !TRITONSERVER_ServerOptionsSetBackendDirectory || !TRITONSERVER_ServerOptionsSetLogVerbose || !TRITONSERVER_ServerOptionsSetModelControlMode
//         || !TRITONSERVER_ServerOptionsSetStrictModelConfig || !TRITONSERVER_ServerNew || !TRITONSERVER_ServerIsReady || !TRITONSERVER_ServerDelete 
//         || !TRITONSERVER_ServerOptionsDelete || !TRITONSERVER_ServerModelIndex || !TRITONSERVER_MessageSerializeToJson 
//         || !TRITONSERVER_InferenceRequestNew  || !TRITONSERVER_InferenceRequestAddInput || !TRITONSERVER_InferenceRequestAppendInputData || !TRITONSERVER_InferenceRequestSetReleaseCallback
//         || !TRITONSERVER_InferenceRequestSetResponseCallback || !TRITONSERVER_InferenceRequestAddRequestedOutput 
//         || !TRITONSERVER_InferenceRequestDelete || !TRITONSERVER_ResponseAllocatorNew || !TRITONSERVER_InferenceResponseOutput ) {
//         std::cerr << "Failed to load one or more Triton functions." << std::endl;
//         exit(1);
//     }

//     // Create server options
//     TRITONSERVER_ServerOptions* server_options = nullptr;
//     CheckTritonError(TRITONSERVER_ServerOptionsNew(&server_options));

//     std::cout << "Setting model repository path to: " << model_repository_path << std::endl;
//     CheckTritonError(TRITONSERVER_ServerOptionsSetModelRepositoryPath(server_options, model_repository_path));

//     std::cout << "Setting backend directory for 'onnxruntime' to: " << onnx_runtime_backend_so << std::endl;
//     // Setting backend directory for 'onnxruntime'
//     //CheckTritonError(TRITONSERVER_ServerOptionsSetBackendDirectory(server_options,onnx_runtime_backend_so));
// CheckTritonError(TRITONSERVER_ServerOptionsSetBackendDirectory(server_options,"/home/Z004HSRM/Downloads/craneAI/library/onnx_old_version/onnxruntime_backend/build/install/backends/"));

// //std::cout << "Disabling GPU metrics" << std::endl;
// //    CheckTritonError(TRITONSERVER_ServerOptionsSetGpuMetrics(server_options, false));

//     // Setting log verbosity
//     std::cout << "Setting log verbosity to 1" << std::endl;
//     CheckTritonError(TRITONSERVER_ServerOptionsSetLogVerbose(server_options, 1));

//     // Setting ModelControlMode to explicit
//     std::cout << "Setting ModelControlMode to explicit" << std::endl;
//  //  CheckTritonError(TRITONSERVER_ServerOptionsSetModelControlMode(server_options, TRITONSERVER_MODEL_CONTROL_EXPLICIT));

//     // Disabling strict model configuration
//     std::cout << "Disabling strict model configuration" << std::endl;
//     CheckTritonError(TRITONSERVER_ServerOptionsSetStrictModelConfig(server_options, false));

//     // Create the server
//    // TRITONSERVER_Server* server_ptr = nullptr;
//     CheckTritonError(TRITONSERVER_ServerNew(&server_ptr, server_options));

//     // Wait for the server to be ready
//     bool is_ready = false;
//     CheckTritonError(TRITONSERVER_ServerIsReady(server_ptr, &is_ready));
//     std::cout << "Server is ready: " << (is_ready ? "Yes" : "No") << std::endl;

//     std::cout << "Loading model " << std::endl;
//     // TRITONSERVER_Error* load_model_error = TRITONSERVER_ServerLoadModel(server_ptr, model_name);
//     // if (load_model_error != nullptr) {
//     //     std::cerr << "Failed to load  model: " << TRITONSERVER_ErrorMessage(load_model_error) << std::endl;
//     //     TRITONSERVER_ErrorDelete(load_model_error);
//     // } else {
//     //     std::cout << "model loaded successfully" << std::endl;
//     // }

//     // Get the model repository index and print it
//     TRITONSERVER_Message* repository_index_message = nullptr;
//     CheckTritonError(TRITONSERVER_ServerModelIndex(server_ptr, &repository_index_message));

//     const char* repository_index_json = nullptr;
//     size_t repository_index_json_size = 0;
//    CheckTritonError(TRITONSERVER_MessageSerializeToJson(repository_index_message, &repository_index_json, &repository_index_json_size));

//     std::cout << "Model Repository Index: " << std::string(repository_index_json, repository_index_json_size) << std::endl;

// bool is_ready1 = false;
// CheckTritonError(TRITONSERVER_ServerModelIsReady(server_ptr, model_name, 1, &is_ready1));
// if (!is_ready1) {
//     std::cerr << "Model is not ready" << std::endl;
//     return;
// }
// std::cout << "Serve initialized .... " << std::endl;
// }

// extern "C" void performInference(const float* input_data, size_t input_size,size_t batch_size)
// {

//     TRITONSERVER_InferenceRequest* inference_request = nullptr;
//     CheckTritonError(TRITONSERVER_InferenceRequestNew(&inference_request, server_ptr, "resnet50", 1));

//     // Set input
//     //int64_t input_dims[] = {1, 3, 224, 224};
//     int64_t input_dims[] = { (int64_t)batch_size, 3, 224, 224 }; //64bit signed bit . can we reduce in optimization 
//     CheckTritonError(TRITONSERVER_InferenceRequestAddInput(inference_request, "input", TRITONSERVER_TYPE_FP32, input_dims, 4));

//     // Set input data (dummy data)
//     //std::vector<float> input_data(1 * 3 * 224 * 224, 0.5);
//    // CheckTritonError(TRITONSERVER_InferenceRequestAppendInputData(inference_request, "input", input_data.data(), input_data.size() * sizeof(float), TRITONSERVER_MEMORY_CPU, 0));

//     CheckTritonError(TRITONSERVER_InferenceRequestAppendInputData(inference_request, "input", input_data, input_size * sizeof(float) * sizeof(float), TRITONSERVER_MEMORY_CPU, 0));

//     std::cout<<"inference request updated with data ............"<<std::endl;

//     CheckTritonError(TRITONSERVER_InferenceRequestAddRequestedOutput(inference_request, "output")); 

//     std::cout<<"request for output............."<<std::endl;

//     CheckTritonError(TRITONSERVER_InferenceRequestSetReleaseCallback(inference_request, RequestRelease, nullptr));

//  // Create a response allocator
//       bool inference_completed = false;
//     CheckTritonError(TRITONSERVER_ResponseAllocatorNew(&response_allocator, ResponseAlloc, ResponseRelease,nullptr));
//    // CheckTritonError(TRITONSERVER_InferenceRequestSetResponseCallback(inference_request, response_allocator, nullptr, InferResponseComplete, nullptr));

// CheckTritonError(TRITONSERVER_InferenceRequestSetResponseCallback(
//     inference_request,
//     response_allocator,
//     nullptr,
// InferResponseComplete,
//     &inference_completed));

//     auto start_time = std::chrono::high_resolution_clock::now();
//     // Perform inference
//     CheckTritonError(TRITONSERVER_ServerInferAsync(server_ptr, inference_request, nullptr));

//     auto timeout = std::chrono::system_clock::now() + std::chrono::seconds(20);
//     while (!inference_completed && std::chrono::system_clock::now() < timeout) {
//         std::this_thread::sleep_for(std::chrono::milliseconds(10));
//     }

//     if (!inference_completed) {
//         std::cerr << "Timeout occurred. Inference request did not complete within 60 seconds." << std::endl;
//     }
//     else
//     {
//        auto end_time = std::chrono::high_resolution_clock::now();
//         std::cout << "Total time for inference request: "
//                   << std::chrono::duration<double, std::milli>(end_time - start_time).count()
//                   << " milliseconds." << std::endl;

//     }

//     return;
// }

// extern "C" void performInference_yolo(const float* input_data, size_t input_size,size_t batch_size)
// {
//     std::cout<<"PERFORMING INFERENCE YOLO........"<<std::endl;
//     TRITONSERVER_InferenceRequest* inference_request = nullptr;
//     CheckTritonError(TRITONSERVER_InferenceRequestNew(&inference_request, server_ptr, "yolov5", 1));

//     // Set input
//     //int64_t input_dims[] = {1, 3, 224, 224};
//     int64_t input_dims[] = { (int64_t)batch_size, 3, 640, 640 }; //64bit signed bit . can we reduce in optimization 
//     CheckTritonError(TRITONSERVER_InferenceRequestAddInput(inference_request, "images", TRITONSERVER_TYPE_FP32, input_dims, 4));

//     // Set input data (dummy data)
//     //std::vector<float> input_data(1 * 3 * 224 * 224, 0.5);
//    // CheckTritonError(TRITONSERVER_InferenceRequestAppendInputData(inference_request, "input", input_data.data(), input_data.size() * sizeof(float), TRITONSERVER_MEMORY_CPU, 0));

//     CheckTritonError(TRITONSERVER_InferenceRequestAppendInputData(inference_request, "images", input_data, input_size * sizeof(float) * sizeof(float), TRITONSERVER_MEMORY_CPU, 0));

//     std::cout<<"inference request updated with data ............"<<std::endl;

//     CheckTritonError(TRITONSERVER_InferenceRequestAddRequestedOutput(inference_request, "output0")); 

//     std::cout<<"request for output............."<<std::endl;

//     CheckTritonError(TRITONSERVER_InferenceRequestSetReleaseCallback(inference_request, RequestRelease, nullptr));

//  // Create a response allocator
//       bool inference_completed = false;
//     CheckTritonError(TRITONSERVER_ResponseAllocatorNew(&response_allocator, ResponseAlloc, ResponseRelease,nullptr));
//    // CheckTritonError(TRITONSERVER_InferenceRequestSetResponseCallback(inference_request, response_allocator, nullptr, InferResponseComplete, nullptr));

// CheckTritonError(TRITONSERVER_InferenceRequestSetResponseCallback(
//     inference_request,
//     response_allocator,
//     nullptr,
// InferResponseComplete,
//     &inference_completed));

//     auto start_time = std::chrono::high_resolution_clock::now();
//     // Perform inference
//     CheckTritonError(TRITONSERVER_ServerInferAsync(server_ptr, inference_request, nullptr));

//     auto timeout = std::chrono::system_clock::now() + std::chrono::seconds(20);
//     while (!inference_completed && std::chrono::system_clock::now() < timeout) {
//         std::this_thread::sleep_for(std::chrono::milliseconds(10));
//     }

//     if (!inference_completed) {
//         std::cerr << "Timeout occurred. Inference request did not complete within 60 seconds." << std::endl;
//     }
//     else
//     {
//        auto end_time = std::chrono::high_resolution_clock::now();
//         std::cout << "Total time for inference request: "
//                   << std::chrono::duration<double, std::milli>(end_time - start_time).count()
//                   << " milliseconds." << std::endl;

//     }

//     return;
// }

extern "C" void cleanupServer()
{

    //TRITONSERVER_MessageDelete(repository_index_message);

    // Delete server options
   // TRITONSERVER_ServerOptionsDelete(server_options);

    // Delete the server
    // TRITONSERVER_ServerDelete(server_ptr);

    // Close the loaded libraries
    // dlclose(triton_handle);
    // dlclose(onnx_handle);
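    // NOTE: on Windows the equivalent of dlclose() is FreeLibrary(), e.g.
    // (assuming the module handles stored in the globals above):
    // if (onnx_handle)   FreeLibrary(onnx_handle);
    // if (triton_handle) FreeLibrary(triton_handle);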
}

}

int main() {
   // TRITONSERVER_Server* server = nullptr;
   // TRITONSERVER_ResponseAllocator* response_allocator = nullptr;

    //InitializeServer_yolo("temp_string");
    InitializeServer("temp_string");

    // Prepare dummy data for inference
    std::vector<float> batch_data(4 *3* 224 * 224, 1.0f); // Example data
    //performInference(batch_data.data(),batch_data.size(),4);
    // std::vector<int64_t> input_shape = {4, 3, 224, 224};
    //std::vector<float> batch_data(8 *3* 640 * 640, 1.0f); // Example data

//     performInference_yolo(batch_data.data(),batch_data.size(),8);
//   std::chrono::seconds durationToWait(10); 
//   std::this_thread::sleep_for(durationToWait);
//   std::cout << "30 seconds have passed. Proceeding with next line of code." << std::endl;
// cleanupServer();
    return 0;
}
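For completeness, once the commented-out performInference() above is re-enabled, main() would call it roughly like this (a sketch based on that code, keeping the input shape consistent with the model configuration's dims [1, 3, 224, 224]):

    std::vector<float> batch_data(1 * 3 * 224 * 224, 1.0f);  // one image, matching the config
    performInference(batch_data.data(), batch_data.size(), 1);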

Expected behavior: What changes do I need to make, either in the settings or in the prod.cpp code, so that I can use the ONNX model at runtime natively on Windows through C++ calls, without gRPC or HTTP calls?

rmccorm4 commented 1 month ago

CC @fpetrini15 @krishung5 if you're familiar with support for the in-process API on Windows

nv-kmcgill53 commented 1 month ago

Hi @saugatapaul1010, we are planning to add the tritonserver.lib file to the Windows assets for 24.08. This will allow you to link against the Triton C API in your build.