openvinotoolkit / openvino_tokenizers

OpenVINO Tokenizers extension
Apache License 2.0

[WordPiece tokenizer] StringTensorUnpack not supported on gpu #131

Closed. Amadeus-AI closed this issue 3 months ago.

Amadeus-AI commented 3 months ago

Context

When using the WordPiece tokenizer, the IR model works on CPU but fails on an Intel GPU on Windows 10 with the following error:

operation: [32332] Operation: StringTensorUnpack_2 of type StringTensorUnpack(extension) is not supported

What needs to be done?

Support the StringTensorUnpack op on iGPU.

Resources

tokenizer.zip

slyalin commented 3 months ago

@Amadeus-AI, this is by design currently: operations from the openvino_tokenizers library work on CPU only. This is true not only for StringTensorUnpack. Are you using the tokenizer as a model that is separate from the main model? You can run the tokenizer on CPU and the main model on GPU. @vladimir-paramuzov, FYI.
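
For example, a minimal sketch of the separate-models setup (the file names are placeholders, and input_ids/attention_mask are the usual tensor names for a BERT-style encoder, not something checked against your model):

```python
import numpy as np
from openvino import Core

core = Core()
core.add_extension("openvino_tokenizers.dll")  # .so on Linux

# Tokenizer ops such as StringTensorUnpack only have CPU implementations,
# so compile the tokenizer for CPU and the main model for GPU.
tokenizer = core.compile_model("tokenizer.xml", "CPU")
encoder = core.compile_model("encoder.xml", "GPU")

# Run the tokenizer first, then feed its outputs into the encoder.
tokens = tokenizer(np.array(["example input text"]))
result = encoder({
    "input_ids": tokens["input_ids"],
    "attention_mask": tokens["attention_mask"],
})
```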

Amadeus-AI commented 3 months ago

> @Amadeus-AI, this is by design currently: operations from the openvino_tokenizers library work on CPU only. This is true not only for StringTensorUnpack. Are you using the tokenizer as a model that is separate from the main model? You can run the tokenizer on CPU and the main model on GPU. @vladimir-paramuzov, FYI.

Thanks for the answer. I encountered this because I am using a combined model.

slyalin commented 3 months ago

@Amadeus-AI, if you are using a combined model, please try HETERO:GPU,CPU as the inference device. Please look at the example here: https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html#the-automatic-mode. It will try to run each node on GPU by default, and if an op doesn't have an implementation on GPU it will fall back to CPU. Not sure it works for all possible combinations -- we don't have exhaustive testing for that.
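
Roughly something like this (the combined-model file name below is just a placeholder):

```python
from openvino import Core

core = Core()
core.add_extension("openvino_tokenizers.dll")

# HETERO assigns each node to GPU where possible and falls back to CPU
# for ops without a GPU implementation, e.g. StringTensorUnpack.
compiled = core.compile_model("combined_model.xml", "HETERO:GPU,CPU")
```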

BTW, how are you building a combined model? Could you give a link to the conversion script?

Amadeus-AI commented 3 months ago

> @Amadeus-AI, if you are using a combined model, please try HETERO:GPU,CPU as the inference device. Please look at the example here: https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html#the-automatic-mode. It will try to run each node on GPU by default, and if an op doesn't have an implementation on GPU it will fall back to CPU. Not sure it works for all possible combinations -- we don't have exhaustive testing for that.
>
> BTW, how are you building a combined model? Could you give a link to the conversion script?

Thanks, I'll give it a try.

The conversion script:

```python
from openvino import Core, save_model
from openvino_tokenizers import connect_models

core = Core()
core.add_extension("openvino_tokenizers.dll")

ov_tokenizer = core.read_model("tokenizer.xml")
ov_model = core.read_model("encoder.xml")

# chain the tokenizer outputs into the encoder inputs and save the result
combined_model = connect_models(ov_tokenizer, ov_model)
save_model(combined_model, "encoder_combined.xml")
```
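
For reference, the combined model would then be used roughly like this (a sketch; the HETERO device string follows the suggestion above, and whether the string input runs end to end depends on that fallback):

```python
import numpy as np
from openvino import Core

core = Core()
core.add_extension("openvino_tokenizers.dll")

# the combined model takes raw text and runs tokenizer + encoder in one graph;
# HETERO lets the string ops fall back to CPU while the rest targets GPU
compiled = core.compile_model("encoder_combined.xml", "HETERO:GPU,CPU")
outputs = compiled(np.array(["some input text"]))
```
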
slyalin commented 3 months ago

Is a combined model more convenient for your case? I mean, is it a big deal for you to keep two models separate and avoid HETERO mode?

Amadeus-AI commented 3 months ago

> Is a combined model more convenient for your case? I mean, is it a big deal for you to keep two models separate and avoid HETERO mode?

Well, I'm implementing it in C++, so the less code and the fewer intermediate variables, the better, haha. BTW, I have already implemented it with the two models kept separate, so that's fine.