openvinotoolkit / open_model_zoo

Pre-trained Deep Learning models and demos (high quality and extremely fast)
https://docs.openvino.ai/latest/model_zoo.html
Apache License 2.0

text_to_speech demo not working with -s_id -1 (for tts multi model) #3429

Closed roshan-ku closed 2 years ago

roshan-ku commented 2 years ago

Hi, I was trying to run the text_to_speech demo with the multi model (OpenVINO 2022.1) and speaker id -1 (-s_id -1), but I get the error below.

Traceback (most recent call last):
  File "C:\workspace\tts\applications.ai.conversational-ai.tts\services\text_to_speech_demo.py", line 388, in <module>
    sys.exit(main() or 0)
  File "C:\workspace\tts\applications.ai.conversational-ai.tts\services\text_to_speech_demo.py", line 265, in main
    mel = forward_tacotron.forward(
  File "C:\workspace\tts\applications.ai.conversational-ai.tts\services\models\forward_tacotron_ie.py", line 283, in forward
    aligned_emb = self.forward_duration_prediction_by_delimiters(
  File "C:\workspace\tts\applications.ai.conversational-ai.tts\services\models\forward_tacotron_ie.py", line 219, in forward_duration_prediction_by_delimiters
    self.infer_duration(
  File "C:\workspace\tts\applications.ai.conversational-ai.tts\services\models\forward_tacotron_ie.py", line 156, in infer_duration
    self.duration_predictor_request.infer(inputs)
  File "C:\workspace\tts\applications.ai.conversational-ai.tts\openvino_env\lib\site-packages\openvino\runtime\ie_api.py", line 109, in infer
    return super().infer(
RuntimeError: Can't SetBlob with name: speaker_embedding, because model input (shape={1,2}) and blob (shape=(1.1.2)) are incompatible
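
Reading the last line of the traceback: the duration prediction model expects speaker_embedding with shape (1, 2), while the array passed to infer() appears to have shape (1, 1, 2), i.e. one extra singleton axis. The issue does not show where in forward_tacotron_ie.py that extra axis comes from, so the following is only a minimal NumPy sketch of the mismatch and of one possible workaround. The helper name fit_to_input_shape and the zero-filled array are hypothetical stand-ins, not code or data from the demo.

```python
import numpy as np


def fit_to_input_shape(blob, expected_shape):
    """Hypothetical helper: reshape `blob` to `expected_shape` when the two
    shapes differ only by size-1 axes; otherwise raise ValueError."""
    blob = np.asarray(blob)
    expected_shape = tuple(expected_shape)
    if blob.shape == expected_shape:
        return blob

    def without_ones(shape):
        # Drop every singleton axis so only the meaningful dimensions remain.
        return tuple(dim for dim in shape if dim != 1)

    if without_ones(blob.shape) != without_ones(expected_shape):
        raise ValueError(
            f"cannot fit blob of shape {blob.shape} to input shape {expected_shape}"
        )
    # Same data, same element order; only singleton axes are added or removed.
    return blob.reshape(expected_shape)


# Placeholder standing in for the embedding built for -s_id -1 (assumed 3-D).
speaker_embedding = np.zeros((1, 1, 2), dtype=np.float32)
fixed = fit_to_input_shape(speaker_embedding, (1, 2))
print(fixed.shape)
```

If the demo's embedding really carries a stray leading axis, reshaping it like this before it is placed in the inputs dictionary should satisfy the shape check, but whether that is the right fix for the demo itself is for the maintainers to confirm.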

Command used to run the demo

python text_to_speech_demo.py -i "How are you doing, what is up with you" -o ./audio.wav -m_duration intel/text-to-speech-en-multi-0001/text-to-speech-en-multi-0001-duration-prediction/FP32/text-to-speech-en-multi-0001-duration-prediction.xml -m_forward intel/text-to-speech-en-multi-0001/text-to-speech-en-multi-0001-regression/FP32/text-to-speech-en-multi-0001-regression.xml -m_melgan intel/text-to-speech-en-multi-0001/text-to-speech-en-multi-0001-generation/FP32/text-to-speech-en-multi-0001-generation.xml -s_id -1

Wovchena commented 2 years ago

@akorobeinikov, please take a look.