triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Encountering a segmentation fault issue when attempting to send multiple images via gRPC #6891

Open lawliet0823 opened 7 months ago

lawliet0823 commented 7 months ago

I used ensemble_image_client.cc as a reference and attempted to run inference on multiple images with a single request. However, I encountered a segmentation fault.

Here is my code:

void clientSendImages(const std::vector<cv::Mat> images, std::string model_name_, std::string model_version_)
{
    const std::string url("0.0.0.0:8001");
    const std::string model_name(model_name_);

    std::unique_ptr<tc::InferenceServerGrpcClient> grpcClient;
    FAIL_IF_ERR(tc::InferenceServerGrpcClient::Create(&grpcClient, url, false), "Error creating grpc client");

    tc::InferInput *input;
    const std::string input_name = "images";
    const std::vector<int64_t> input_shape = {10, 3, 640, 640};
    FAIL_IF_ERR(tc::InferInput::Create(&input, input_name, input_shape, "FP32"), "Error: Creating input failed!");
    std::shared_ptr<tc::InferInput> input_ptr(input);

    tc::InferRequestedOutput *output;
    FAIL_IF_ERR(tc::InferRequestedOutput::Create(&output, "output0", 0), "Error: Creating output failed!");
    std::shared_ptr<tc::InferRequestedOutput> output_ptr(output);

    std::vector<tc::InferInput *> inputs = {input_ptr.get()};
    std::vector<const tc::InferRequestedOutput *> outputs = {output_ptr.get()};

    for (const auto &image : images)
    {
        std::vector<float> image_data = preprocess(image);
        FAIL_IF_ERR(input_ptr->AppendRaw(reinterpret_cast<const uint8_t *>(image_data.data()), image_data.size() * sizeof(float)), "Error: Setting input data failed!");
    }

    tc::InferOptions common_options(model_name_);
    common_options.model_version_ = model_version_;
    common_options.client_timeout_ = 0;

    tc::InferResult *results;
    FAIL_IF_ERR(grpcClient->Infer(&results, common_options, inputs, outputs),
                "unable to run model");

    std::unique_ptr<tc::InferResult> results_ptr(results);
}

I intend to process 10 images using the preprocess function (which has been tested successfully for a single image). I'm wondering where the issue might stem from.
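[Editor's note] One detail worth knowing about the loop above: per the Triton C++ client documentation, `InferInput::AppendRaw` does not copy the bytes it is given; it records the caller's pointer, so each buffer must stay alive until `Infer()` returns. A loop-local `std::vector<float>` is destroyed at the end of every iteration, leaving the recorded pointers dangling. A minimal sketch of the safe ownership pattern, using a hypothetical `FakeInput` stand-in rather than the real client:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for tc::InferInput. Like the real client's
// AppendRaw(), it records the caller's pointer instead of copying the
// bytes, so the pointed-to buffer must outlive the inference call.
struct FakeInput {
    std::vector<std::pair<const std::uint8_t*, std::size_t>> bufs;
    void AppendRaw(const std::uint8_t* data, std::size_t byte_size) {
        bufs.emplace_back(data, byte_size);
    }
};

// Safe pattern: the owning buffers live in `storage`, which the caller
// keeps alive until after the inference call. A vector re-created inside
// the append loop would be destroyed each iteration and leave `bufs`
// pointing at freed memory.
std::size_t appendAll(FakeInput& input,
                      const std::vector<std::vector<std::uint8_t>>& storage) {
    for (const auto& buf : storage) {
        input.AppendRaw(buf.data(), buf.size());
    }
    return input.bufs.size();
}
```

This mirrors why a `std::vector<std::vector<uint8_t>>` declared outside the loop (as in the updated code later in this thread) avoids the crash.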

kthui commented 7 months ago

Hi @lawliet0823, does the segmentation fault happen every time you run the code? Can you use a tool (e.g. gdb) to print the stack trace when the segmentation fault happens?

lawliet0823 commented 7 months ago

The problem occurs every time.

This is the stack trace printed out by GDB:

Thread 1 "grpc_client" received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:317
317     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) backtrace
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:317
#1  0x00007ffff68b7b23 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff7846490 in triton::client::InferenceServerGrpcClient::PreRunProcessing(triton::client::InferOptions const&, std::vector<triton::client::InferInput*> const&, std::vector<triton::client::InferRequestedOutput const*> const&) () from /home/levi/Program/home-C/lib/libgrpcclient.so
#3  0x00007ffff7847928 in triton::client::InferenceServerGrpcClient::Infer(triton::client::InferResult**, triton::client::InferOptions const&, std::vector<triton::client::InferInput*> const&, std::vector<triton::client::InferRequestedOutput const*> const&, std::map<std::string, std::string> const&, grpc_compression_algorithm) () from /home/levi/Program/home-C/lib/libgrpcclient.so
#4  0x000055555555be02 in clientSendImages(std::vector<cv::Mat>, std::string, std::string) ()
#5  0x0000555555557b91 in processVideo(std::string) ()
#6  0x000055555555c266 in main ()

kthui commented 7 months ago

Do you still get the segmentation fault if you run ensemble_image_client.cc directly from the command line? If not, there may be a difference between the inputs that client passes into the triton::client::InferenceServerGrpcClient::Infer() function and the ones your code passes.

lawliet0823 commented 7 months ago

My updated code is as follows.

void clientSendImages(const std::vector<cv::Mat> images, std::string model_name_, std::string model_version_) 
{
    const std::string url("0.0.0.0:8001");
    std::unique_ptr<tc::InferenceServerGrpcClient> grpcClient;
    tc::Error err;

    FAIL_IF_ERR(tc::InferenceServerGrpcClient::Create(&grpcClient, url, false), "Error creating grpc client");

    tc::InferOptions options(model_name_);
    options.model_version_ = "1";

    tc::InferInput *input;
    const std::string input_name = "images";
    const std::vector<int64_t> input_shape = {10, 3, 640, 640};
    FAIL_IF_ERR(tc::InferInput::Create(&input, input_name, input_shape, "FP32"), "Error: Creating input failed");

    tc::InferRequestedOutput *output;
    // Request the raw output tensor (class_count defaults to 0, so no
    // classification results are requested)
    err = tc::InferRequestedOutput::Create(&output, "output0");
    if (!err.IsOk())
    {
        std::cerr << "unable to get output: " << err << std::endl;
        exit(1);
    }
    std::shared_ptr<tc::InferRequestedOutput> output_ptr(output);

    std::shared_ptr<tc::InferInput> input_ptr(input);
    std::vector<tc::InferInput *> inputs = {input_ptr.get()};
    std::vector<const tc::InferRequestedOutput *> outputs = {output_ptr.get()};

    std::vector<std::vector<uint8_t>> image_data;
    // FAIL_IF_ERR(input_ptr->Reset(), "Reset Failed!!!");
    for (int index = 0; index < 10; index++)
    {
        image_data.emplace_back();
        Preprocess(images[index], cv::Size(640, 640), &(image_data.back()));
    }

    for (int index = 0; index < 10; index++)
    {
        FAIL_IF_ERR(input_ptr->AppendRaw(image_data[index]), "AppendRaw Failed!!!");
    }

    tc::InferResult *results;
    FAIL_IF_ERR(grpcClient->Infer(&results, options, inputs, outputs), "Fail to get results");
    std::unique_ptr<tc::InferResult> results_ptr;
    results_ptr.reset(results);

    Postprocess(std::move(results_ptr), 10, images);
}

No segmentation fault occurred. My post-processing code is as follows.

void Postprocess(
    const std::unique_ptr<tc::InferResult> result,
    const size_t batch_size, const std::vector<cv::Mat> images)
{
    std::string output_name("output0");
    const int rows = 8400;
    const int dimensions = 84;

    if (!result->RequestStatus().IsOk())
    {
        std::cerr << "inference failed with error: " << result->RequestStatus()
                  << std::endl;
        exit(1);
    }

    // Get and validate the shape and datatype
    std::vector<int64_t> shape;
    FAIL_IF_ERR(result->Shape(output_name, &shape), "unable to get shape ");
    printf("shape: %ld %ld %ld\n", shape[0], shape[1], shape[2]);

    std::string datatype;
    FAIL_IF_ERR(result->Datatype(output_name, &datatype), "unable to get datatype");

    const uint8_t *rawData = nullptr;
    size_t byteSize = 0;

    FAIL_IF_ERR(result->RawData(output_name, &rawData, &byteSize), "Error: Unable to get output data for tensor");

    if (byteSize != 10 * rows * dimensions * sizeof(float))
    {
        std::cerr << "Unexpected byteSize: " << byteSize << std::endl;
        exit(1);
    }

    int numGroups = 10;
    size_t groupSize = rows * dimensions * sizeof(float); 
    for (int i = 0; i < numGroups; ++i)
    {
        const uint8_t *groupData = rawData + i * groupSize;
        size_t groupByteSize = groupSize;

        std::vector<int> classIds;
        std::vector<float> confidences;
        std::vector<cv::Rect> boxesAfterNMS;

        processDetectionResults(groupData, groupByteSize, classIds, confidences, boxesAfterNMS);

    printf("Detected objects: %zu\n", boxesAfterNMS.size());
    }
}

I am not sure whether my post-processing code is correct. The extracted detection results are all zero. Is there an error in the way the data is extracted from rawData? The shape and datatype values are correct.