openvinotoolkit / openvino_tokenizers

OpenVINO Tokenizers extension
Apache License 2.0

Is it necessary to include openvino_tokenizers in the production environment? #224

Closed LNTH closed 3 weeks ago

LNTH commented 1 month ago

Context

I successfully converted the tokenizer into an OpenVINO (OV) model and connected it with my main model. However, when I use the combined model, I still need to import openvino_tokenizers.

Code to reproduce

Environment: Google Colab

python = 3.10.12
transformers = 4.42.4
# install openvino and openvino-tokenizers with !pip install openvino openvino-tokenizers
openvino = '2024.3.0-16041-1e3b88e4e3f-releases/2024/3'
openvino-tokenizers = '2024.3.0.0'

Code to convert model

from transformers import AutoTokenizer, AutoModelForTokenClassification
from openvino_tokenizers import convert_tokenizer, connect_models
import openvino as ov
import os

# load hf model
model_name = "NlpHUST/ner-vietnamese-electra-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# example input for converting
example = "Đại tá Nguyễn Văn Dương vừa có cuộc họp cùng một số đơn vị."
inputs = tokenizer(example, return_tensors="pt")

# convert to ov
os.makedirs("./ov_models", exist_ok=True)

ov_tokenizer = convert_tokenizer(tokenizer, tokenizer_output_type=ov.Type.i32)
ov_model = ov.convert_model(
    model,
    input={
        "input_ids": ([-1, -1], ov.Type.i32),
        "token_type_ids": ([-1, -1], ov.Type.i32),
        "attention_mask": ([-1, -1], ov.Type.i32),
    },
    example_input=dict(inputs),
)
combined_model = connect_models(ov_tokenizer, ov_model)
ov.save_model(combined_model, "ov_models/combined_model.xml")

Use the combined_model (after restarting the session)

import openvino as ov

core = ov.Core()
model = core.compile_model("ov_models/combined_model.xml", "CPU")

Error

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-fb0bb569964a> in <cell line: 4>()
      2 
      3 core = ov.Core()
----> 4 model = core.compile_model("ov_models/combined_model.xml", "CPU")

/usr/local/lib/python3.10/dist-packages/openvino/runtime/ie_api.py in compile_model(self, model, device_name, config, weights)
    541                 )
    542             return CompiledModel(
--> 543                 super().compile_model(model, device_name, {} if config is None else config),
    544             )
    545         else:

RuntimeError: Exception from src/inference/src/cpp/core.cpp:121:
Exception from src/inference/src/dev/plugin.cpp:58:
Exception from src/frontends/ir/src/ir_deserializer.cpp:904:
Cannot create StringTensorUnpack layer StringTensorUnpack_2 id:7 from unsupported opset: extension

Question:

Is this behavior expected? Or is there a way to avoid including the OpenVINO tokenizer in my production environment?

apaniukov commented 1 month ago

Hi @LNTH,

Yes, this is expected behaviour. OpenVINO Tokenizers consists of two parts:

1. A binary OpenVINO extension that implements the tokenization operations.
2. Python code that performs the tokenizer conversion.

To use a model with tokenizer operations, you must extend the Core object with the binary extension. There are two ways to do that:

1. The easiest (and recommended) way is to import openvino_tokenizers before you create a Core object. The import patches the Core constructor so that the binary extension is added automatically; see the example in the readme, and the minimal sketch right after this list. In production, it is recommended to use the minimal installation option to reduce the environment size.
2. Add the extension manually: core.add_extension("path/to/libopenvino_tokenizers.so"). This option is primarily intended for C++ usage, but it also applies if you use an OpenVINO Python distribution from an archive.
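A minimal sketch of the first option (assuming the combined model saved above):

import openvino as ov
import openvino_tokenizers  # noqa: F401 -- importing patches ov.Core to register the binary extension

core = ov.Core()  # the extension is added automatically by the patched constructor
compiled_model = core.compile_model("ov_models/combined_model.xml", "CPU")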

To get a path to the binary extension in case of Python installation, you can use this command:

python -c "from openvino_tokenizers import _ext_path; print(_ext_path)"
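And a sketch of the second option from Python, using that path (note that importing openvino_tokenizers already patches Core, so the explicit call mainly matters for archive-based or C++ setups):

from openvino_tokenizers import _ext_path  # filesystem path to the binary extension
import openvino as ov

core = ov.Core()
core.add_extension(str(_ext_path))
compiled_model = core.compile_model("ov_models/combined_model.xml", "CPU")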

I hope this helps. If you have any further questions, please let me know.

LNTH commented 1 month ago

Thanks @apaniukov, that’s all I need to know about ov-tokenizers.

I have one more question: what is the roadmap for implementing a detokenizer for WordPiece? For example, in an NER use case with small BERT variants (TinyBERT, MobileBERT, DistilBERT), which all use WordPiece, I still have to rely on the transformers tokenizer to map token labels back to word labels because there is no detokenizer.
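For reference, the workaround I mean is roughly this sketch (assuming a fast tokenizer so that word_ids() is available; token_labels here are placeholder per-token predictions):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NlpHUST/ner-vietnamese-electra-base")
text = "Đại tá Nguyễn Văn Dương vừa có cuộc họp cùng một số đơn vị."
encoding = tokenizer(text, return_tensors="pt")

# per-token label predictions from the model (placeholder values here)
token_labels = ["O"] * encoding["input_ids"].shape[1]

# map sub-word token labels back to word-level labels with word_ids()
word_labels = {}
for idx, word_id in enumerate(encoding.word_ids()):
    if word_id is not None and word_id not in word_labels:
        word_labels[word_id] = token_labels[idx]  # keep the first sub-word's label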

apaniukov commented 1 month ago

Yes, we plan to cover all subword tokenizer/detokenizer types, including the WordPiece detokenizer. For the NER use case, adding an output with token offsets into the original string would also be useful. It is on our roadmap, but other tasks currently have a higher priority.