tsmith023 opened this issue 1 year ago
Hi @tsmith023,
Apologies for the late reply, yes MarianMT models are supported. Concerning the slow inference you're reporting, are you comparing the exported OpenVINO model with the original PyTorch model and finding that the latency of the OpenVINO model is higher?
I'm not able to reproduce this, could you confirm that you're still observing it with the following?
```python
import time

import torch
from optimum.intel import OVModelForSeq2SeqLM
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Helsinki-NLP/opus-mt-es-en"
ov_model = OVModelForSeq2SeqLM.from_pretrained(model_id, export=True, use_cache=True)
torch_model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokens = tokenizer("This is a sample input", return_tensors="pt")
decoder_inputs = {"decoder_input_ids": torch.ones((1, 1), dtype=torch.long) * torch_model.config.decoder_start_token_id}


def elapsed_time(model, nb_pass=20):
    start = time.time()
    for _ in range(nb_pass):
        model(**tokens, **decoder_inputs)
    end = time.time()
    return (end - start) / nb_pass


# warmup
elapsed_time(ov_model, nb_pass=5)

time_ov = elapsed_time(ov_model)
time_torch = elapsed_time(torch_model)
```
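You could also compare end-to-end generation latency, since translation goes through `generate()` rather than a single forward pass; something along these lines (reusing the objects defined above) should work:

```python
def elapsed_generate(model, nb_pass=20):
    # Time full autoregressive generation rather than a single forward pass.
    start = time.time()
    for _ in range(nb_pass):
        model.generate(**tokens, max_new_tokens=32)
    return (time.time() - start) / nb_pass


print(f"forward pass - OpenVINO: {time_ov:.4f}s | PyTorch: {time_torch:.4f}s")
print(f"generate()   - OpenVINO: {elapsed_generate(ov_model):.4f}s | PyTorch: {elapsed_generate(torch_model):.4f}s")
```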
Hi @echarlaix, the problem didn't surface when executing within the Python runtime but when running the exported OpenVINO IR binaries within OpenVINO itself, i.e. the C++ runtime. I was comparing the performance of the exported model within the Python runtime to its performance within the C++ runtime.
Do you feel that this issue is better suited to the OpenVINO repository? I raised it here originally since I judged it to be a problem with the model export logic. Let me know whether I should relocate it there or whether you feel there is an implementation issue here 😁
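To clarify what I mean by running the exported binaries directly, a benchmark along these lines (assuming the IR was saved to an `ov_opus_mt` directory and that the encoder file is named `openvino_encoder_model.xml`, both placeholders) times the exported encoder with the OpenVINO Python API, outside of `optimum`:

```python
import time

from openvino.runtime import Core
from transformers import AutoTokenizer

# Placeholder paths: adjust to wherever the exported IR actually lives.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
inputs = tokenizer("Hola, como estas?", return_tensors="np")
feed = {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]}

core = Core()
compiled = core.compile_model("ov_opus_mt/openvino_encoder_model.xml", "CPU")
request = compiled.create_infer_request()

request.infer(feed)  # warmup

start = time.time()
for _ in range(20):
    request.infer(feed)
print(f"encoder latency: {(time.time() - start) / 20:.4f}s")
```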
@pbebbo
I'm having trouble exporting the `Helsinki-NLP/opus-mt-es-en` model for language translation into the optimised OpenVINO IR format. Reading through the other issues within this repository highlighted https://github.com/huggingface/optimum-intel/issues/188, which seems to suffer from similar effects. In that case, it seemed to be an issue with the BigBird architecture and its lack of support in Hugging Face Optimum. However, the `Helsinki-NLP/opus-mt-es-en` model is of the `MarianMT` class, which is documented as being supported.

Am I missing something fundamental here? Is the conversion of a `MarianMT` model into OpenVINO IR format currently unsupported by this library, in a similar way to the BigBird models in the issue above? Or are there aspects of the conversion that I am not specifying correctly, such that the export is sub-optimal? It would seem that this should be possible given the documentation.

I see the following in the build logs, if it helps at all:
```
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for 'decoder_input_ids'.
```
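For context, a minimal sketch of how the export is typically done with `optimum-intel` (the output directory name is a placeholder); as far as I understand, the warning above comes from the dummy-input generation step when exporting with the KV-cache (`use_past`) enabled and is informational rather than an error:

```python
from optimum.intel import OVModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "Helsinki-NLP/opus-mt-es-en"

# Export the PyTorch checkpoint to OpenVINO IR with the decoder KV-cache enabled.
ov_model = OVModelForSeq2SeqLM.from_pretrained(model_id, export=True, use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Writes the encoder / decoder / decoder-with-past .xml/.bin IR files plus the
# config and tokenizer files needed to reload or serve the model.
ov_model.save_pretrained("ov_opus_mt")
tokenizer.save_pretrained("ov_opus_mt")
```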
As an MRE: operating `run("Hola, como estas?")` in the Python runtime yields an inference time of 0.6323761940002441 s, while using the exported OpenVINO IR binaries in an OVMS model pipeline yields an inference time of 45 s. Any help on this one would be greatly appreciated, cheers!
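A sketch of what such a `run()` helper looks like, assuming the exported IR is loaded from the placeholder `ov_opus_mt` directory used above:

```python
import time

from optimum.intel import OVModelForSeq2SeqLM
from transformers import AutoTokenizer

# Reload the previously exported OpenVINO IR (directory name is a placeholder).
ov_model = OVModelForSeq2SeqLM.from_pretrained("ov_opus_mt")
tokenizer = AutoTokenizer.from_pretrained("ov_opus_mt")


def run(text: str) -> str:
    start = time.time()
    tokens = tokenizer(text, return_tensors="pt")
    output_ids = ov_model.generate(**tokens)
    translation = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(f"inference time: {time.time() - start}s")
    return translation


print(run("Hola, como estas?"))
```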
P.S. I can post the `config.json` file being passed to the OVMS instance, but it's very long so I'll leave it until it's required!