Running this command gets me a consistent output in approx. 7 seconds:
```
$ melo 我的名字叫小杨 dog.wav --language ZH
/Users/zihaolam/Projects/tts-editor/MeloTTS/melo/main.py:71: UserWarning: You specified a speaker but the language is English.
  warnings.warn("You specified a speaker but the language is English.")
loading pickled model from cache
loaded pickled model from cache, took 8.529947996139526
 > Text split to sentences.
我的名字叫小杨
 > ===========================
  0%|          | 0/1 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/j4/zkddp3ms6493qzbf3qf7rfwr0000gn/T/jieba.cache
Loading model cost 0.406 seconds.
Prefix dict has been built successfully.
Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/Users/zihaolam/Projects/tts-editor/MeloTTS/.venv/lib/python3.9/site-packages/torch/nn/functional.py:4522: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:472.)
  return torch._C._nn.pad(input, pad, mode, value)
/Users/zihaolam/Projects/tts-editor/MeloTTS/melo/commons.py:123: UserWarning: MPS: no support for int64 for min_max, downcasting to a smaller data type (int32/float32). Native support for int64 has been added in macOS 13.3. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/ReduceOps.mm:612.)
  max_length = length.max()
100%|██████████████████████████████████████████████████████████| 1/1 [00:07<00:00, 7.51s/it]
```
To cut the cold start, I pickle the constructed TTS model to disk and reload it on later runs:

```python
import os
import pickle
import time


def get_model_pkl_path(language: str):
    return os.path.join(os.path.dirname(__file__), f"model_{language}.pkl")


def get_model(language: str, device: str):
    model_pkl_path = get_model_pkl_path(language)
    if not os.path.exists(model_pkl_path):
        # First run: build the model and pickle it for next time.
        from melo.api import TTS
        model = TTS(language=language, device=device)
        with open(model_pkl_path, "wb") as f:
            pickle.dump(model, f)
    else:
        # Subsequent runs: load the pickled model from disk.
        with open(model_pkl_path, "rb") as f:
            start = time.time()
            print("loading pickled model from cache")
            model = pickle.load(f)
            print("loaded pickled model from cache, took", time.time() - start)
    return model
```
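For reference, this is roughly how the cached model then gets used (a sketch of my wrapper, not the exact script; `tts_to_file` and `hps.data.spk2id` come from `melo.api`'s `TTS` class as documented in the MeloTTS README):

```python
# Hypothetical driver showing the call pattern around get_model.
import sys

text, out_path = sys.argv[1], sys.argv[2]

model = get_model(language="ZH", device="auto")  # this load dominates the ~7s cold start
speaker_id = model.hps.data.spk2id["ZH"]         # speaker table shipped with the ZH model
model.tts_to_file(text, speaker_id, out_path)
```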
Pickling the TTS model still does not help: synthesis of a short sentence still takes approx. 7 seconds. Is there a way to improve the speed, or to cache anything further to reduce this cold start?

The Gradio web UI takes approx. 1 second to generate the same text. However, I would like to use the CLI instead of running a Python server. Is there a way to optimise things so that the CLI takes the same time as the web UI/server?
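The web UI is presumably fast because the server pays the model-loading cost once at startup and keeps the model resident, so each request only pays inference. A minimal sketch of that pattern, reusing the `get_model` helper above (hypothetical, illustration only):

```python
# Long-lived worker that keeps the model warm and reads one line of
# text per request from stdin. Hypothetical sketch; it only illustrates
# why a resident server responds in ~1s while a fresh CLI run takes ~7s.
import sys

model = get_model(language="ZH", device="auto")  # cold start paid once
speaker_id = model.hps.data.spk2id["ZH"]

for i, line in enumerate(sys.stdin):
    text = line.strip()
    if not text:
        continue
    # Only the inference cost (~1s for a short sentence) per request.
    model.tts_to_file(text, speaker_id, f"out_{i}.wav")
```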