openvinotoolkit / openvino


[Bug]: Set/Get String Tensor Data via C-API Does Not Work #26906

Open rahulchaphalkar opened 3 days ago

rahulchaphalkar commented 3 days ago

OpenVINO Version

2024.3.0 https://github.com/rahulchaphalkar/openvino/tree/add-extension

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

CPU

Framework

None

Model used

openvino_detokenizer.xml from TinyLlama-1.1B-Chat-v1.0

Issue description

The STRING element_type has been added to the C-API, but in my testing with models that consume and produce string tensors, I get incorrect results. The test case below compares a working C++ case against a failing C case. As you can see in the C code, I do some processing on the received string data, but I am still unable to get a valid string output. Reference: https://docs.openvino.ai/2024/openvino-workflow/running-inference/string-tensors.html
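
For context, here is the minimal round trip I would expect to work for a string tensor through the C-API (a sketch only: STRING is the element_type value mentioned above, and how the buffer returned by ov_tensor_data should be interpreted for string elements is exactly what appears to be undefined):

#include <stdio.h>

#include "openvino/c/openvino.h"

int main() {
    // Create a one-element STRING tensor and inspect its raw buffer.
    ov_shape_t shape;
    ov_tensor_t* tensor = NULL;
    int64_t dims[1] = {1};
    if (ov_shape_create(1, dims, &shape) != OK)
        return 1;
    if (ov_tensor_create(STRING, shape, &tensor) == OK) {
        void* data = NULL;
        ov_tensor_data(tensor, &data);
        // For i64/f32 tensors this pointer is directly usable; for STRING
        // there is no documented C-side layout.
        printf("raw buffer at %p\n", data);
        ov_tensor_free(tensor);
    }
    ov_shape_free(&shape);
    return 0;
}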

Step-by-step reproduction

Reproduction of reading string data from a model's output: I was working with TinyLlama-1.1B-Chat-v1.0, obtained by following the recommended steps in the optimum-cli and GenAI repos. An extension is loaded in both cases; support for loading extensions through the C-API is added in my open PR, so you will need that branch for the C case below. I feed the detokenizer model tokens that were previously extracted from the TinyLlama model, as shown in the sketch right after this paragraph.
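
For reference, the extension-loading call from that PR is used as below (a minimal sketch; ov_core_add_extension exists only on my branch so far):

#include "openvino/c/openvino.h"

int main() {
    // ov_core_add_extension(core, path) is added in my open PR and mirrors
    // ov::Core::add_extension on the C++ side.
    ov_core_t* core = NULL;
    if (ov_core_create(&core) == OK) {
        ov_core_add_extension(core, "/home/rahul/tools/tokenizers/libopenvino_tokenizers.so");
        ov_core_free(core);
    }
    return 0;
}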

The C++ case prints the correct output:

./main /home/rahul/tools/TinyLlama-1.1B-Chat-v1.0
=2
- 2+2=4
- 3+3=6

The C case prints invalid UTF-8.

C++/Working Case

#include <iostream>
#include <string>
#include <vector>

#include <openvino/openvino.hpp>

// Run the detokenizer on one batch of token ids and return the decoded string.
std::string detokenize(ov::InferRequest& detokenizer, std::vector<int64_t>& tokens) {
    constexpr size_t BATCH_SIZE = 1;
    // Wrap the token ids in an i64 tensor of shape {1, tokens.size()} without copying.
    detokenizer.set_input_tensor(ov::Tensor{ov::element::i64, {BATCH_SIZE, tokens.size()}, tokens.data()});
    detokenizer.infer();
    // The output is a string tensor: its elements are read as std::string.
    return detokenizer.get_output_tensor().data<std::string>()[0];
}

int main(int argc, char* argv[]) {
    // Token ids previously produced by the TinyLlama model, fed to the detokenizer.
    std::vector<int64_t> accumulator = {29922, 29906, 13, 29899, 29871, 29906, 29974, 29906, 29922, 29946, 13, 29899, 29871, 29941, 29974, 29941, 29922, 29953, 13};
    ov::Core core;
    // The detokenizer uses custom ops from openvino_tokenizers, loaded as an extension.
    core.add_extension("/home/rahul/tools/tokenizers/libopenvino_tokenizers.so");

    ov::InferRequest detokenizer = core.compile_model(
        std::string{argv[1]} + "/openvino_detokenizer.xml", "CPU").create_infer_request();

    std::string text = detokenize(detokenizer, accumulator);
    std::cout << text << std::endl;
}

C/C-API Failing Case

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "openvino/c/openvino.h"

#define CHECK_STATUS(return_status)                                                      \
    if (return_status != OK) {                                                           \
        fprintf(stderr, "[ERROR] return status %d, line %d\n", return_status, __LINE__); \
    }

char* detokenize(ov_infer_request_t* detokenizer, int64_t* tokens, int num_tokens) {
    const size_t BATCH_SIZE = 1;
    ov_status_e status;
    ov_tensor_t* input_tensor = NULL;
    ov_shape_t input_shape;
    int64_t input_shape_dims[2] = {BATCH_SIZE, num_tokens};

    status = ov_shape_create(2, input_shape_dims, &input_shape);
    if (status != OK) {
        fprintf(stderr, "Failed to create shape\n");
        return NULL;
    }

    // Wrap the caller's token ids in an i64 tensor of shape {1, num_tokens} without copying.
    status = ov_tensor_create_from_host_ptr(I64, input_shape, tokens, &input_tensor);
    ov_shape_free(&input_shape); // the tensor keeps its own copy of the shape
    if (status != OK) {
        fprintf(stderr, "Failed to create input tensor\n");
        return NULL;
    }

    status = ov_infer_request_set_input_tensor(detokenizer, input_tensor);
    if (status != OK) {
        fprintf(stderr, "Failed to set input tensor\n");
        return NULL;
    }

    status = ov_infer_request_infer(detokenizer);
    if (status != OK) {
        fprintf(stderr, "Failed to run inference\n");
        return NULL;
    }

    ov_tensor_t* output_tensor = NULL;
    status = ov_infer_request_get_output_tensor_by_index(detokenizer, 0, &output_tensor);
    if (status != OK) {
        fprintf(stderr, "Failed to get output tensor\n");
        return NULL;
    }

    void* output_data = NULL;
    status = ov_tensor_data(output_tensor, &output_data);
    if (status != OK) {
        fprintf(stderr, "Failed to get data from output tensor\n");
        return NULL;
    }

    // NOTE: this is the "processing" mentioned above: it treats the raw tensor
    // buffer as a NUL-terminated C string. For a STRING output tensor this is
    // where the invalid UTF-8 shows up.
    size_t output_string_length = strlen((const char*)output_data);
    char* detokenized_string = (char*)malloc(output_string_length + 1);
    if (!detokenized_string) {
        fprintf(stderr, "Failed to allocate memory for detokenized string\n");
        return NULL;
    }
    strncpy(detokenized_string, (const char*)output_data, output_string_length);
    detokenized_string[output_string_length] = '\0';

    ov_tensor_free(input_tensor);
    ov_tensor_free(output_tensor);

    return detokenized_string;
}

int main(int argc, char** argv) {
    ov_core_t* core = NULL;
    ov_model_t* model = NULL;
    ov_compiled_model_t* compiled_model = NULL;
    ov_infer_request_t* detokenizer_request = NULL;
    char* text = NULL;
    // Token ids previously produced by the TinyLlama model, fed to the detokenizer.
    int64_t accumulator[] = {29922, 29906, 13, 29899, 29871, 29906, 29974, 29906, 29922, 29946, 13, 29899, 29871, 29941, 29974, 29941, 29922, 29953, 13};

    if (argc < 3) {
        fprintf(stderr, "Usage: %s <detokenizer.xml> <detokenizer.bin>\n", argv[0]);
        return 1;
    }
    const char* input_model = argv[1];
    const char* input_model_bin = argv[2];

    const char* tokenizers_path = "/home/rahul/tools/tokenizers/libopenvino_tokenizers.so";

    CHECK_STATUS(ov_core_create(&core));
    // Extension loading via the C-API comes from my open PR; the detokenizer
    // needs the custom ops from openvino_tokenizers.
    CHECK_STATUS(ov_core_add_extension(core, tokenizers_path));
    CHECK_STATUS(ov_core_read_model(core, input_model, input_model_bin, &model));
    CHECK_STATUS(ov_core_compile_model(core, model, "CPU", 0, &compiled_model));
    CHECK_STATUS(ov_compiled_model_create_infer_request(compiled_model, &detokenizer_request));

    text = detokenize(detokenizer_request, accumulator, sizeof(accumulator) / sizeof(accumulator[0]));
    printf("text is %s\n", text ? text : "(null)");

    free(text);
    ov_infer_request_free(detokenizer_request);
    ov_compiled_model_free(compiled_model);
    ov_model_free(model);
    ov_core_free(core);
    return 0;
}
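
One extra data point that may help triage: on the C++ side the output elements are read as std::string (data<std::string>()), so my unconfirmed guess is that ov_tensor_data in the C case returns a pointer to std::string objects rather than to UTF-8 bytes, which would explain the garbage. Hex-dumping the buffer instead of calling strlen/strncpy on it should make that visible:

#include <stddef.h>
#include <stdio.h>

// Diagnostic sketch: print the first n bytes that ov_tensor_data hands back,
// instead of assuming they form a NUL-terminated UTF-8 string.
static void dump_bytes(const void* data, size_t n) {
    const unsigned char* p = (const unsigned char*)data;
    for (size_t i = 0; i < n; i++) {
        printf("%02x%c", p[i], ((i + 1) % 16 == 0) ? '\n' : ' ');
    }
    printf("\n");
}

// In detokenize(), right after ov_tensor_data() succeeds:
//     dump_bytes(output_data, 64);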

Relevant log output

No response

Issue submission checklist

mlukasze commented 2 days ago

@peterchen-intel could you take a look, please?