vitaly-zdanevich closed this issue 5 years ago
...as I understand it, for the highest quality I need to use WaveRNN, but I see that the TTS model is coming soon.
OK, I downloaded the model locally and set the path to it in server/conf.json. Point number 4 from server/README.md says:
Run the server
python server/server.py -c server/conf.json
(Requires Flask)
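For reference, the fields that server.py reads from server/conf.json look roughly like this. model_config and use_cuda show up in the tracebacks below, and port comes up later in this thread; the other key names and values are assumptions, so treat this as a sketch:
{
    "model_path": "/srv/app",
    "model_name": "best_model.pth.tar",
    "model_config": "/srv/app/config.json",
    "use_cuda": false,
    "port": 5002
}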
Inside the running Docker image I tried:
python server/server.py -c server/conf.json
and was met with:
python: command not found
OK, so I tried:
python3 server/server.py -c server/conf.json
and got:
Traceback (most recent call last):
File "server/server.py", line 3, in <module>
from synthesizer import Synthesizer
File "/srv/app/server/synthesizer.py", line 4, in <module>
import numpy as np
ImportError: No module named 'numpy'
OK, my next step was:
$ pip3 install -r requirements.txt
Collecting git+git://github.com/bootphon/phonemizer@master (from -r requirements.txt (line 14))
Cloning git://github.com/bootphon/phonemizer (to revision master) to /tmp/pip-req-build-qw833s40
Requirement already satisfied (use --upgrade to upgrade): phonemizer==1.0.1 from git+git://github.com/bootphon/phonemizer@master in /usr/local/lib/python3.6/dist-packages/phonemizer-1.0.1-py3.6.egg (from -r requirements.txt (line 14))
Collecting numpy==1.14.3 (from -r requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/71/90/ca61e203e0080a8cef7ac21eca199829fa8d997f7c4da3e985b49d0a107d/numpy-1.14.3-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
100% |████████████████████████████████| 12.2MB 1.3MB/s
Collecting lws (from -r requirements.txt (line 2))
Downloading https://files.pythonhosted.org/packages/3a/c7/856af2e1202e7a4c5102406196aa661edb402256e7ce2334be0c0d8afa2e/lws-1.2.tar.gz (133kB)
100% |████████████████████████████████| 143kB 4.9MB/s
Requirement already satisfied: torch>=0.4.1 in /usr/local/lib/python3.6/dist-packages/torch-1.0.1.post2-py3.6-linux-x86_64.egg (from -r requirements.txt (line 3)) (1.0.1.post2)
Collecting librosa==0.5.1 (from -r requirements.txt (line 4))
Downloading https://files.pythonhosted.org/packages/51/c2/d8d8498252a2430ec9b90481754aca287c0ecc237a8feb331fa3b8933575/librosa-0.5.1.tar.gz (1.5MB)
100% |████████████████████████████████| 1.5MB 4.0MB/s
Requirement already satisfied: Unidecode==0.4.20 in /usr/local/lib/python3.6/dist-packages/Unidecode-0.4.20-py3.6.egg (from -r requirements.txt (line 5)) (0.4.20)
Collecting tensorboard (from -r requirements.txt (line 6))
Downloading https://files.pythonhosted.org/packages/0f/39/bdd75b08a6fba41f098b6cb091b9e8c7a80e1b4d679a581a0ccd17b10373/tensorboard-1.13.1-py3-none-any.whl (3.2MB)
100% |████████████████████████████████| 3.2MB 4.2MB/s
Requirement already satisfied: tensorboardX in /usr/local/lib/python3.6/dist-packages/tensorboardX-1.6-py3.6.egg (from -r requirements.txt (line 7)) (1.6)
Requirement already satisfied: matplotlib==2.0.2 in /usr/local/lib/python3.6/dist-packages/matplotlib-2.0.2-py3.6-linux-x86_64.egg (from -r requirements.txt (line 8)) (2.0.2)
Requirement already satisfied: Pillow in /usr/local/lib/python3.6/dist-packages/Pillow-6.0.0-py3.6-linux-x86_64.egg (from -r requirements.txt (line 9)) (6.0.0)
Requirement already satisfied: flask in /usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg (from -r requirements.txt (line 10)) (1.0.2)
Collecting scipy==0.19.0 (from -r requirements.txt (line 11))
Downloading https://files.pythonhosted.org/packages/d0/7b/415fd5bb215f28b423d32dc98126f700ebe7f1efa53e65377ed6ed55df99/scipy-0.19.0-cp36-cp36m-manylinux1_x86_64.whl (48.2MB)
100% |████████████████████████████████| 48.2MB 308kB/s
Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages/tqdm-4.31.1-py3.6.egg (from -r requirements.txt (line 13)) (4.31.1)
Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages/joblib-0.13.2-py3.6.egg (from phonemizer==1.0.1->-r requirements.txt (line 14)) (0.13.2)
Requirement already satisfied: segments in /usr/local/lib/python3.6/dist-packages/segments-2.0.1-py3.6.egg (from phonemizer==1.0.1->-r requirements.txt (line 14)) (2.0.1)
Requirement already satisfied: attrs>=18.1 in /usr/local/lib/python3.6/dist-packages/attrs-19.1.0-py3.6.egg (from phonemizer==1.0.1->-r requirements.txt (line 14)) (19.1.0)
Requirement already satisfied: audioread>=2.0.0 in /usr/local/lib/python3.6/dist-packages/audioread-2.1.6-py3.6.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (2.1.6)
Requirement already satisfied: scikit-learn>=0.14.0 in /usr/local/lib/python3.6/dist-packages/scikit_learn-0.20.3-py3.6-linux-x86_64.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (0.20.3)
Requirement already satisfied: decorator>=3.0.0 in /usr/local/lib/python3.6/dist-packages/decorator-4.4.0-py3.6.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (4.4.0)
Requirement already satisfied: six>=1.3 in /usr/local/lib/python3.6/dist-packages/six-1.12.0-py3.6.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (1.12.0)
Requirement already satisfied: resampy>=0.1.2 in /usr/local/lib/python3.6/dist-packages/resampy-0.2.1-py3.6.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (0.2.1)
Collecting absl-py>=0.4 (from tensorboard->-r requirements.txt (line 6))
Downloading https://files.pythonhosted.org/packages/da/3f/9b0355080b81b15ba6a9ffcf1f5ea39e307a2778b2f2dc8694724e8abd5b/absl-py-0.7.1.tar.gz (99kB)
100% |████████████████████████████████| 102kB 1.3MB/s
Collecting grpcio>=1.6.3 (from tensorboard->-r requirements.txt (line 6))
Downloading https://files.pythonhosted.org/packages/f4/dc/5503d89e530988eb7a1aed337dcb456ef8150f7c06132233bd9e41ec0215/grpcio-1.19.0-cp36-cp36m-manylinux1_x86_64.whl (10.8MB)
100% |████████████████████████████████| 10.8MB 3.1MB/s
Collecting markdown>=2.6.8 (from tensorboard->-r requirements.txt (line 6))
Downloading https://files.pythonhosted.org/packages/f5/e4/d8c18f2555add57ff21bf25af36d827145896a07607486cc79a2aea641af/Markdown-3.1-py2.py3-none-any.whl (87kB)
100% |████████████████████████████████| 92kB 2.3MB/s
Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages/Werkzeug-0.15.2-py3.6.egg (from tensorboard->-r requirements.txt (line 6)) (0.15.2)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /usr/local/lib/python3.6/dist-packages (from tensorboard->-r requirements.txt (line 6)) (0.33.1)
Requirement already satisfied: protobuf>=3.6.0 in /usr/local/lib/python3.6/dist-packages/protobuf-3.7.1-py3.6.egg (from tensorboard->-r requirements.txt (line 6)) (3.7.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages/cycler-0.10.0-py3.6.egg (from matplotlib==2.0.2->-r requirements.txt (line 8)) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.0,!=2.0.4,!=2.1.2,!=2.1.6,>=1.5.6 in /usr/local/lib/python3.6/dist-packages/pyparsing-2.4.0-py3.6.egg (from matplotlib==2.0.2->-r requirements.txt (line 8)) (2.4.0)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.6/dist-packages/python_dateutil-2.8.0-py3.6.egg (from matplotlib==2.0.2->-r requirements.txt (line 8)) (2.8.0)
Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages/pytz-2019.1-py3.6.egg (from matplotlib==2.0.2->-r requirements.txt (line 8)) (2019.1)
Requirement already satisfied: Jinja2>=2.10 in /usr/local/lib/python3.6/dist-packages/Jinja2-2.10.1-py3.6.egg (from flask->-r requirements.txt (line 10)) (2.10.1)
Requirement already satisfied: click>=5.1 in /usr/local/lib/python3.6/dist-packages/Click-7.0-py3.6.egg (from flask->-r requirements.txt (line 10)) (7.0)
Requirement already satisfied: itsdangerous>=0.24 in /usr/local/lib/python3.6/dist-packages/itsdangerous-1.1.0-py3.6.egg (from flask->-r requirements.txt (line 10)) (1.1.0)
Requirement already satisfied: clldutils>=1.7.3 in /usr/local/lib/python3.6/dist-packages/clldutils-2.7.0-py3.6.egg (from segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (2.7.0)
Requirement already satisfied: csvw in /usr/local/lib/python3.6/dist-packages/csvw-1.4.5-py3.6.egg (from segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (1.4.5)
Requirement already satisfied: regex in /usr/local/lib/python3.6/dist-packages/regex-2019.4.10-py3.6-linux-x86_64.egg (from segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (2019.4.10)
Requirement already satisfied: numba>=0.32 in /usr/local/lib/python3.6/dist-packages/numba-0.43.1-py3.6-linux-x86_64.egg (from resampy>=0.1.2->librosa==0.5.1->-r requirements.txt (line 4)) (0.43.1)
Requirement already satisfied: setuptools>=36 in /usr/local/lib/python3.6/dist-packages (from markdown>=2.6.8->tensorboard->-r requirements.txt (line 6)) (41.0.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/dist-packages/MarkupSafe-1.1.1-py3.6-linux-x86_64.egg (from Jinja2>=2.10->flask->-r requirements.txt (line 10)) (1.1.1)
Requirement already satisfied: colorlog in /usr/local/lib/python3.6/dist-packages/colorlog-4.0.2-py3.6.egg (from clldutils>=1.7.3->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (4.0.2)
Requirement already satisfied: configparser>=3.5.0 in /usr/local/lib/python3.6/dist-packages/configparser-3.7.4-py3.6.egg (from clldutils>=1.7.3->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (3.7.4)
Requirement already satisfied: tabulate>=0.7.7 in /usr/local/lib/python3.6/dist-packages/tabulate-0.8.3-py3.6.egg (from clldutils>=1.7.3->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (0.8.3)
Requirement already satisfied: isodate in /usr/local/lib/python3.6/dist-packages/isodate-0.6.0-py3.6.egg (from csvw->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (0.6.0)
Requirement already satisfied: rfc3986 in /usr/local/lib/python3.6/dist-packages/rfc3986-1.2.0-py3.6.egg (from csvw->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (1.2.0)
Requirement already satisfied: uritemplate>=3.0.0 in /usr/local/lib/python3.6/dist-packages/uritemplate-3.0.0-py3.6.egg (from csvw->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (3.0.0)
Requirement already satisfied: llvmlite>=0.28.0dev0 in /usr/local/lib/python3.6/dist-packages/llvmlite-0.28.0-py3.6-linux-x86_64.egg (from numba>=0.32->resampy>=0.1.2->librosa==0.5.1->-r requirements.txt (line 4)) (0.28.0)
Building wheels for collected packages: lws, librosa, phonemizer, absl-py
Building wheel for lws (setup.py) ... done
Stored in directory: /root/.cache/pip/wheels/07/b1/1a/8dd583ce1048da5130e7cfef1b243c9a44be448f7a2fcf32d2
Building wheel for librosa (setup.py) ... done
Stored in directory: /root/.cache/pip/wheels/f6/21/55/9c17b30d30ef57e74b50c8824c2bb368d58ddabf9bf8e1fee0
Building wheel for phonemizer (setup.py) ... done
Stored in directory: /tmp/pip-ephem-wheel-cache-gmt3moo_/wheels/af/6a/52/4a29b8347b407c694e8b2f3d87e30e46cce0bd9ddf511818f5
Building wheel for absl-py (setup.py) ... done
Stored in directory: /root/.cache/pip/wheels/ee/98/38/46cbcc5a93cfea5492d19c38562691ddb23b940176c14f7b48
Successfully built lws librosa phonemizer absl-py
tts 0.0.1+4a5056b has requirement librosa==0.6.2, but you'll have librosa 0.5.1 which is incompatible.
Installing collected packages: numpy, lws, scipy, librosa, absl-py, grpcio, markdown, tensorboard
Found existing installation: numpy 1.15.4
Cannot remove entries from nonexistent file /srv/app/.eggs/easy-install.pth
root@65840041e280:/srv/app# python3 server/server.py -c server/conf.json
Traceback (most recent call last):
File "server/server.py", line 3, in <module>
from synthesizer import Synthesizer
File "/srv/app/server/synthesizer.py", line 4, in <module>
import numpy as np
ImportError: No module named 'numpy'
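Side note: the pip3 run above aborted at the "Cannot remove entries from nonexistent file /srv/app/.eggs/easy-install.pth" step, which would explain why numpy still isn't importable afterwards. A common workaround for that particular pip failure (untested here) is to create the missing file and retry:
$ mkdir -p /srv/app/.eggs && touch /srv/app/.eggs/easy-install.pth
$ pip3 install -r requirements.txt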
OK, I am not experienced with Docker, so I rebuilt the image with the model inside and server/conf.json pointing to the correct location. After
docker run -it --rm -p 5002:5002 mozilla-tts
I got:
> Loading model ...
| > model config: /srv/app/config.json
| > model file: /srv/app/checkpoint_272976.pth.tar
> Setting up Audio Processor...
| > bits:None
| > sample_rate:22050
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:12.5
| > frame_length_ms:50
| > ref_level_db:20
| > num_freq:1025
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:False
| > mel_fmin:0
| > mel_fmax:None
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:True
| > n_fft:2048
| > hop_length:275
| > win_length:1102
Traceback (most recent call last):
File "server/server.py", line 16, in <module>
config.model_config, config.use_cuda)
File "/srv/app/server/synthesizer.py", line 39, in load_model
self.model.load_state_dict(cp['model'])
File "/usr/local/lib/python3.6/dist-packages/torch-1.0.1.post2-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
Missing key(s) in state_dict: "encoder.cbhg.cbhg.conv1d_banks.0.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.0.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.0.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.0.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.0.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.1.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.1.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.1.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.1.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.1.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.2.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.2.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.2.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.2.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.2.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.3.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.3.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.3.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.3.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.3.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.4.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.4.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.4.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.4.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.4.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.5.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.5.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.5.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.5.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.5.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.6.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.6.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.6.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.6.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.6.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.7.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.7.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.7.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.7.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.7.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.8.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.8.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.8.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.8.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.8.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.9.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.9.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.9.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.9.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.9.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.10.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.10.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.10.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.10.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.10.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.11.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.11.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.11.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.11.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.11.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.12.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.12.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.12.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.12.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.12.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.13.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.13.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.13.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.13.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.13.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.14.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.14.bn.weight", 
"encoder.cbhg.cbhg.conv1d_banks.14.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.14.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.14.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.15.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.15.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.15.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.15.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.15.bn.running_var", "encoder.cbhg.cbhg.conv1d_projections.0.conv1d.weight", "encoder.cbhg.cbhg.conv1d_projections.0.bn.weight", "encoder.cbhg.cbhg.conv1d_projections.0.bn.bias", "encoder.cbhg.cbhg.conv1d_projections.0.bn.running_mean", "encoder.cbhg.cbhg.conv1d_projections.0.bn.running_var", "encoder.cbhg.cbhg.conv1d_projections.1.conv1d.weight", "encoder.cbhg.cbhg.conv1d_projections.1.bn.weight", "encoder.cbhg.cbhg.conv1d_projections.1.bn.bias", "encoder.cbhg.cbhg.conv1d_projections.1.bn.running_mean", "encoder.cbhg.cbhg.conv1d_projections.1.bn.running_var", "encoder.cbhg.cbhg.highways.0.H.weight", "encoder.cbhg.cbhg.highways.0.H.bias", "encoder.cbhg.cbhg.highways.0.T.weight", "encoder.cbhg.cbhg.highways.0.T.bias", "encoder.cbhg.cbhg.highways.1.H.weight", "encoder.cbhg.cbhg.highways.1.H.bias", "encoder.cbhg.cbhg.highways.1.T.weight", "encoder.cbhg.cbhg.highways.1.T.bias", "encoder.cbhg.cbhg.highways.2.H.weight", "encoder.cbhg.cbhg.highways.2.H.bias", "encoder.cbhg.cbhg.highways.2.T.weight", "encoder.cbhg.cbhg.highways.2.T.bias", "encoder.cbhg.cbhg.highways.3.H.weight", "encoder.cbhg.cbhg.highways.3.H.bias", "encoder.cbhg.cbhg.highways.3.T.weight", "encoder.cbhg.cbhg.highways.3.T.bias", "encoder.cbhg.cbhg.gru.weight_ih_l0", "encoder.cbhg.cbhg.gru.weight_hh_l0", "encoder.cbhg.cbhg.gru.bias_ih_l0", "encoder.cbhg.cbhg.gru.bias_hh_l0", "encoder.cbhg.cbhg.gru.weight_ih_l0_reverse", "encoder.cbhg.cbhg.gru.weight_hh_l0_reverse", "encoder.cbhg.cbhg.gru.bias_ih_l0_reverse", "encoder.cbhg.cbhg.gru.bias_hh_l0_reverse", "decoder.attention_rnn.alignment_model.loc_conv.1.weight", "decoder.attention_rnn.alignment_model.loc_linear.weight", "decoder.attention_rnn.alignment_model.loc_linear.bias", "decoder.attention_rnn_init.weight", "decoder.memory_init.weight", "decoder.decoder_rnn_inits.weight", "postnet.cbhg.conv1d_banks.0.conv1d.weight", "postnet.cbhg.conv1d_banks.0.bn.weight", "postnet.cbhg.conv1d_banks.0.bn.bias", "postnet.cbhg.conv1d_banks.0.bn.running_mean", "postnet.cbhg.conv1d_banks.0.bn.running_var", "postnet.cbhg.conv1d_banks.1.conv1d.weight", "postnet.cbhg.conv1d_banks.1.bn.weight", "postnet.cbhg.conv1d_banks.1.bn.bias", "postnet.cbhg.conv1d_banks.1.bn.running_mean", "postnet.cbhg.conv1d_banks.1.bn.running_var", "postnet.cbhg.conv1d_banks.2.conv1d.weight", "postnet.cbhg.conv1d_banks.2.bn.weight", "postnet.cbhg.conv1d_banks.2.bn.bias", "postnet.cbhg.conv1d_banks.2.bn.running_mean", "postnet.cbhg.conv1d_banks.2.bn.running_var", "postnet.cbhg.conv1d_banks.3.conv1d.weight", "postnet.cbhg.conv1d_banks.3.bn.weight", "postnet.cbhg.conv1d_banks.3.bn.bias", "postnet.cbhg.conv1d_banks.3.bn.running_mean", "postnet.cbhg.conv1d_banks.3.bn.running_var", "postnet.cbhg.conv1d_banks.4.conv1d.weight", "postnet.cbhg.conv1d_banks.4.bn.weight", "postnet.cbhg.conv1d_banks.4.bn.bias", "postnet.cbhg.conv1d_banks.4.bn.running_mean", "postnet.cbhg.conv1d_banks.4.bn.running_var", "postnet.cbhg.conv1d_banks.5.conv1d.weight", "postnet.cbhg.conv1d_banks.5.bn.weight", "postnet.cbhg.conv1d_banks.5.bn.bias", "postnet.cbhg.conv1d_banks.5.bn.running_mean", "postnet.cbhg.conv1d_banks.5.bn.running_var", "postnet.cbhg.conv1d_banks.6.conv1d.weight", 
"postnet.cbhg.conv1d_banks.6.bn.weight", "postnet.cbhg.conv1d_banks.6.bn.bias", "postnet.cbhg.conv1d_banks.6.bn.running_mean", "postnet.cbhg.conv1d_banks.6.bn.running_var", "postnet.cbhg.conv1d_banks.7.conv1d.weight", "postnet.cbhg.conv1d_banks.7.bn.weight", "postnet.cbhg.conv1d_banks.7.bn.bias", "postnet.cbhg.conv1d_banks.7.bn.running_mean", "postnet.cbhg.conv1d_banks.7.bn.running_var", "postnet.cbhg.conv1d_projections.0.conv1d.weight", "postnet.cbhg.conv1d_projections.0.bn.weight", "postnet.cbhg.conv1d_projections.0.bn.bias", "postnet.cbhg.conv1d_projections.0.bn.running_mean", "postnet.cbhg.conv1d_projections.0.bn.running_var", "postnet.cbhg.conv1d_projections.1.conv1d.weight", "postnet.cbhg.conv1d_projections.1.bn.weight", "postnet.cbhg.conv1d_projections.1.bn.bias", "postnet.cbhg.conv1d_projections.1.bn.running_mean", "postnet.cbhg.conv1d_projections.1.bn.running_var", "postnet.cbhg.pre_highway.weight", "postnet.cbhg.highways.0.H.weight", "postnet.cbhg.highways.0.H.bias", "postnet.cbhg.highways.0.T.weight", "postnet.cbhg.highways.0.T.bias", "postnet.cbhg.highways.1.H.weight", "postnet.cbhg.highways.1.H.bias", "postnet.cbhg.highways.1.T.weight", "postnet.cbhg.highways.1.T.bias", "postnet.cbhg.highways.2.H.weight", "postnet.cbhg.highways.2.H.bias", "postnet.cbhg.highways.2.T.weight", "postnet.cbhg.highways.2.T.bias", "postnet.cbhg.highways.3.H.weight", "postnet.cbhg.highways.3.H.bias", "postnet.cbhg.highways.3.T.weight", "postnet.cbhg.highways.3.T.bias", "postnet.cbhg.gru.weight_ih_l0", "postnet.cbhg.gru.weight_hh_l0", "postnet.cbhg.gru.bias_ih_l0", "postnet.cbhg.gru.bias_hh_l0", "postnet.cbhg.gru.weight_ih_l0_reverse", "postnet.cbhg.gru.weight_hh_l0_reverse", "postnet.cbhg.gru.bias_ih_l0_reverse", "postnet.cbhg.gru.bias_hh_l0_reverse", "last_linear.0.weight", "last_linear.0.bias".
Unexpected key(s) in state_dict: "encoder.cbhg.conv1d_banks.0.conv1d.weight", "encoder.cbhg.conv1d_banks.0.bn.weight", "encoder.cbhg.conv1d_banks.0.bn.bias", "encoder.cbhg.conv1d_banks.0.bn.running_mean", "encoder.cbhg.conv1d_banks.0.bn.running_var", "encoder.cbhg.conv1d_banks.1.conv1d.weight", "encoder.cbhg.conv1d_banks.1.bn.weight", "encoder.cbhg.conv1d_banks.1.bn.bias", "encoder.cbhg.conv1d_banks.1.bn.running_mean", "encoder.cbhg.conv1d_banks.1.bn.running_var", "encoder.cbhg.conv1d_banks.2.conv1d.weight", "encoder.cbhg.conv1d_banks.2.bn.weight", "encoder.cbhg.conv1d_banks.2.bn.bias", "encoder.cbhg.conv1d_banks.2.bn.running_mean", "encoder.cbhg.conv1d_banks.2.bn.running_var", "encoder.cbhg.conv1d_banks.3.conv1d.weight", "encoder.cbhg.conv1d_banks.3.bn.weight", "encoder.cbhg.conv1d_banks.3.bn.bias", "encoder.cbhg.conv1d_banks.3.bn.running_mean", "encoder.cbhg.conv1d_banks.3.bn.running_var", "encoder.cbhg.conv1d_banks.4.conv1d.weight", "encoder.cbhg.conv1d_banks.4.bn.weight", "encoder.cbhg.conv1d_banks.4.bn.bias", "encoder.cbhg.conv1d_banks.4.bn.running_mean", "encoder.cbhg.conv1d_banks.4.bn.running_var", "encoder.cbhg.conv1d_banks.5.conv1d.weight", "encoder.cbhg.conv1d_banks.5.bn.weight", "encoder.cbhg.conv1d_banks.5.bn.bias", "encoder.cbhg.conv1d_banks.5.bn.running_mean", "encoder.cbhg.conv1d_banks.5.bn.running_var", "encoder.cbhg.conv1d_banks.6.conv1d.weight", "encoder.cbhg.conv1d_banks.6.bn.weight", "encoder.cbhg.conv1d_banks.6.bn.bias", "encoder.cbhg.conv1d_banks.6.bn.running_mean", "encoder.cbhg.conv1d_banks.6.bn.running_var", "encoder.cbhg.conv1d_banks.7.conv1d.weight", "encoder.cbhg.conv1d_banks.7.bn.weight", "encoder.cbhg.conv1d_banks.7.bn.bias", "encoder.cbhg.conv1d_banks.7.bn.running_mean", "encoder.cbhg.conv1d_banks.7.bn.running_var", "encoder.cbhg.conv1d_banks.8.conv1d.weight", "encoder.cbhg.conv1d_banks.8.bn.weight", "encoder.cbhg.conv1d_banks.8.bn.bias", "encoder.cbhg.conv1d_banks.8.bn.running_mean", "encoder.cbhg.conv1d_banks.8.bn.running_var", "encoder.cbhg.conv1d_banks.9.conv1d.weight", "encoder.cbhg.conv1d_banks.9.bn.weight", "encoder.cbhg.conv1d_banks.9.bn.bias", "encoder.cbhg.conv1d_banks.9.bn.running_mean", "encoder.cbhg.conv1d_banks.9.bn.running_var", "encoder.cbhg.conv1d_banks.10.conv1d.weight", "encoder.cbhg.conv1d_banks.10.bn.weight", "encoder.cbhg.conv1d_banks.10.bn.bias", "encoder.cbhg.conv1d_banks.10.bn.running_mean", "encoder.cbhg.conv1d_banks.10.bn.running_var", "encoder.cbhg.conv1d_banks.11.conv1d.weight", "encoder.cbhg.conv1d_banks.11.bn.weight", "encoder.cbhg.conv1d_banks.11.bn.bias", "encoder.cbhg.conv1d_banks.11.bn.running_mean", "encoder.cbhg.conv1d_banks.11.bn.running_var", "encoder.cbhg.conv1d_banks.12.conv1d.weight", "encoder.cbhg.conv1d_banks.12.bn.weight", "encoder.cbhg.conv1d_banks.12.bn.bias", "encoder.cbhg.conv1d_banks.12.bn.running_mean", "encoder.cbhg.conv1d_banks.12.bn.running_var", "encoder.cbhg.conv1d_banks.13.conv1d.weight", "encoder.cbhg.conv1d_banks.13.bn.weight", "encoder.cbhg.conv1d_banks.13.bn.bias", "encoder.cbhg.conv1d_banks.13.bn.running_mean", "encoder.cbhg.conv1d_banks.13.bn.running_var", "encoder.cbhg.conv1d_banks.14.conv1d.weight", "encoder.cbhg.conv1d_banks.14.bn.weight", "encoder.cbhg.conv1d_banks.14.bn.bias", "encoder.cbhg.conv1d_banks.14.bn.running_mean", "encoder.cbhg.conv1d_banks.14.bn.running_var", "encoder.cbhg.conv1d_banks.15.conv1d.weight", "encoder.cbhg.conv1d_banks.15.bn.weight", "encoder.cbhg.conv1d_banks.15.bn.bias", "encoder.cbhg.conv1d_banks.15.bn.running_mean", "encoder.cbhg.conv1d_banks.15.bn.running_var", 
"encoder.cbhg.conv1d_projections.0.conv1d.weight", "encoder.cbhg.conv1d_projections.0.bn.weight", "encoder.cbhg.conv1d_projections.0.bn.bias", "encoder.cbhg.conv1d_projections.0.bn.running_mean", "encoder.cbhg.conv1d_projections.0.bn.running_var", "encoder.cbhg.conv1d_projections.1.conv1d.weight", "encoder.cbhg.conv1d_projections.1.bn.weight", "encoder.cbhg.conv1d_projections.1.bn.bias", "encoder.cbhg.conv1d_projections.1.bn.running_mean", "encoder.cbhg.conv1d_projections.1.bn.running_var", "encoder.cbhg.pre_highway.weight", "encoder.cbhg.highways.0.H.weight", "encoder.cbhg.highways.0.H.bias", "encoder.cbhg.highways.0.T.weight", "encoder.cbhg.highways.0.T.bias", "encoder.cbhg.highways.1.H.weight", "encoder.cbhg.highways.1.H.bias", "encoder.cbhg.highways.1.T.weight", "encoder.cbhg.highways.1.T.bias", "encoder.cbhg.highways.2.H.weight", "encoder.cbhg.highways.2.H.bias", "encoder.cbhg.highways.2.T.weight", "encoder.cbhg.highways.2.T.bias", "encoder.cbhg.highways.3.H.weight", "encoder.cbhg.highways.3.H.bias", "encoder.cbhg.highways.3.T.weight", "encoder.cbhg.highways.3.T.bias", "encoder.cbhg.gru.weight_ih_l0", "encoder.cbhg.gru.weight_hh_l0", "encoder.cbhg.gru.bias_ih_l0", "encoder.cbhg.gru.bias_hh_l0", "encoder.cbhg.gru.weight_ih_l0_reverse", "encoder.cbhg.gru.weight_hh_l0_reverse", "encoder.cbhg.gru.bias_ih_l0_reverse", "encoder.cbhg.gru.bias_hh_l0_reverse", "decoder.stopnet.rnn.weight_ih", "decoder.stopnet.rnn.weight_hh", "decoder.stopnet.rnn.bias_ih", "decoder.stopnet.rnn.bias_hh", "postnet.conv1d_banks.0.conv1d.weight", "postnet.conv1d_banks.0.bn.weight", "postnet.conv1d_banks.0.bn.bias", "postnet.conv1d_banks.0.bn.running_mean", "postnet.conv1d_banks.0.bn.running_var", "postnet.conv1d_banks.1.conv1d.weight", "postnet.conv1d_banks.1.bn.weight", "postnet.conv1d_banks.1.bn.bias", "postnet.conv1d_banks.1.bn.running_mean", "postnet.conv1d_banks.1.bn.running_var", "postnet.conv1d_banks.2.conv1d.weight", "postnet.conv1d_banks.2.bn.weight", "postnet.conv1d_banks.2.bn.bias", "postnet.conv1d_banks.2.bn.running_mean", "postnet.conv1d_banks.2.bn.running_var", "postnet.conv1d_banks.3.conv1d.weight", "postnet.conv1d_banks.3.bn.weight", "postnet.conv1d_banks.3.bn.bias", "postnet.conv1d_banks.3.bn.running_mean", "postnet.conv1d_banks.3.bn.running_var", "postnet.conv1d_banks.4.conv1d.weight", "postnet.conv1d_banks.4.bn.weight", "postnet.conv1d_banks.4.bn.bias", "postnet.conv1d_banks.4.bn.running_mean", "postnet.conv1d_banks.4.bn.running_var", "postnet.conv1d_banks.5.conv1d.weight", "postnet.conv1d_banks.5.bn.weight", "postnet.conv1d_banks.5.bn.bias", "postnet.conv1d_banks.5.bn.running_mean", "postnet.conv1d_banks.5.bn.running_var", "postnet.conv1d_banks.6.conv1d.weight", "postnet.conv1d_banks.6.bn.weight", "postnet.conv1d_banks.6.bn.bias", "postnet.conv1d_banks.6.bn.running_mean", "postnet.conv1d_banks.6.bn.running_var", "postnet.conv1d_banks.7.conv1d.weight", "postnet.conv1d_banks.7.bn.weight", "postnet.conv1d_banks.7.bn.bias", "postnet.conv1d_banks.7.bn.running_mean", "postnet.conv1d_banks.7.bn.running_var", "postnet.conv1d_projections.0.conv1d.weight", "postnet.conv1d_projections.0.bn.weight", "postnet.conv1d_projections.0.bn.bias", "postnet.conv1d_projections.0.bn.running_mean", "postnet.conv1d_projections.0.bn.running_var", "postnet.conv1d_projections.1.conv1d.weight", "postnet.conv1d_projections.1.bn.weight", "postnet.conv1d_projections.1.bn.bias", "postnet.conv1d_projections.1.bn.running_mean", "postnet.conv1d_projections.1.bn.running_var", "postnet.pre_highway.weight", 
"postnet.highways.0.H.weight", "postnet.highways.0.H.bias", "postnet.highways.0.T.weight", "postnet.highways.0.T.bias", "postnet.highways.1.H.weight", "postnet.highways.1.H.bias", "postnet.highways.1.T.weight", "postnet.highways.1.T.bias", "postnet.highways.2.H.weight", "postnet.highways.2.H.bias", "postnet.highways.2.T.weight", "postnet.highways.2.T.bias", "postnet.highways.3.H.weight", "postnet.highways.3.H.bias", "postnet.highways.3.T.weight", "postnet.highways.3.T.bias", "postnet.gru.weight_ih_l0", "postnet.gru.weight_hh_l0", "postnet.gru.bias_ih_l0", "postnet.gru.bias_hh_l0", "postnet.gru.weight_ih_l0_reverse", "postnet.gru.weight_hh_l0_reverse", "postnet.gru.bias_ih_l0_reverse", "postnet.gru.bias_hh_l0_reverse", "last_linear.weight", "last_linear.bias".
size mismatch for embedding.weight: copying a param with shape torch.Size([149, 256]) from checkpoint, the shape in current model is torch.Size([129, 256]).
size mismatch for decoder.attention_rnn.alignment_model.query_layer.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
size mismatch for decoder.attention_rnn.alignment_model.query_layer.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for decoder.attention_rnn.alignment_model.annot_layer.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
size mismatch for decoder.attention_rnn.alignment_model.annot_layer.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for decoder.attention_rnn.alignment_model.v.weight: copying a param with shape torch.Size([1, 256]) from checkpoint, the shape in current model is torch.Size([1, 128]).
size mismatch for decoder.stopnet.linear.weight: copying a param with shape torch.Size([1, 400]) from checkpoint, the shape in current model is torch.Size([1, 656]).
I also tried it on branch dev-tacotron2:
$ docker run -it --rm -p 5002:5002 mozilla-tts
> Loading model ...
| > model config: /srv/app/config.json
| > model file: /srv/app/checkpoint_272976.pth.tar
> Setting up Audio Processor...
| > bits:None
| > sample_rate:22050
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:12.5
| > frame_length_ms:50
| > ref_level_db:20
| > num_freq:1025
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:False
| > mel_fmin:0.0
| > mel_fmax:8000.0
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:True
| > n_fft:2048
| > hop_length:275
| > win_length:1102
Traceback (most recent call last):
File "server/server.py", line 16, in <module>
config.model_config, config.use_cuda)
File "/srv/app/server/synthesizer.py", line 26, in load_model
self.model = Tacotron(config.embedding_size, self.ap.num_freq, self.ap.num_mels, config.r)
AttributeError: 'AttrDict' object has no attribute 'embedding_size'
The first part of your question is not really a TTS problem, but @tomzx, who wrote the Dockerfile, might be able to help you.
The second part is that the model checkpoint does not match the code you are using to load it. You need to use the TTS commit that corresponds to the model, as given in the checkpoint table.
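Concretely, that means something like the following before building the image (a sketch; the hash is the one from the checkpoint table that is tried next in this thread):
$ git checkout db7f3d3
$ docker build -t mozilla-tts .
$ docker run -it --rm -p 5002:5002 mozilla-tts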
OK, thank you. I tried commit db7f3d3 with the corresponding model and config from the table (downloaded from Google Drive and placed in the root of the cloned repository), ran docker build -t mozilla-tts . and docker run -it --rm -p 5002:5002 mozilla-tts, and got:
> Loading model ...
| > model config: /srv/app/config.json
| > model file: /srv/app/best_model.pth.tar
> Setting up Audio Processor...
| > fft size: 2048, hop length: 275, win length: 1102
| > Audio Processor attributes.
| > bits:None
| > sample_rate:22050
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:12.5
| > frame_length_ms:50
| > ref_level_db:20
| > num_freq:1025
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:False
| > mel_fmin:0
| > mel_fmax:None
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:True
| > n_fft:2048
| > hop_length:275
| > win_length:1102
| > Number of characters : 256
Traceback (most recent call last):
File "server/server.py", line 16, in <module>
config.model_config, config.use_cuda)
File "/srv/app/server/synthesizer.py", line 34, in load_model
self.model.load_state_dict(cp['model'])
File "/usr/local/lib/python3.6/dist-packages/torch-1.0.1.post2-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
size mismatch for embedding.weight: copying a param with shape torch.Size([61, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
size mismatch for encoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
size mismatch for decoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 400]) from checkpoint, the shape in current model is torch.Size([256, 25]).
size mismatch for decoder.proj_to_mel.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([25, 256]).
size mismatch for decoder.proj_to_mel.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([25]).
size mismatch for decoder.memory_init.weight: copying a param with shape torch.Size([1, 400]) from checkpoint, the shape in current model is torch.Size([1, 25]).
size mismatch for decoder.stopnet.linear.weight: copying a param with shape torch.Size([1, 416]) from checkpoint, the shape in current model is torch.Size([1, 281]).
size mismatch for postnet.cbhg.conv1d_banks.0.conv1d.weight: copying a param with shape torch.Size([128, 80, 1]) from checkpoint, the shape in current model is torch.Size([128, 5, 1]).
size mismatch for postnet.cbhg.conv1d_banks.1.conv1d.weight: copying a param with shape torch.Size([128, 80, 2]) from checkpoint, the shape in current model is torch.Size([128, 5, 2]).
size mismatch for postnet.cbhg.conv1d_banks.2.conv1d.weight: copying a param with shape torch.Size([128, 80, 3]) from checkpoint, the shape in current model is torch.Size([128, 5, 3]).
size mismatch for postnet.cbhg.conv1d_banks.3.conv1d.weight: copying a param with shape torch.Size([128, 80, 4]) from checkpoint, the shape in current model is torch.Size([128, 5, 4]).
size mismatch for postnet.cbhg.conv1d_banks.4.conv1d.weight: copying a param with shape torch.Size([128, 80, 5]) from checkpoint, the shape in current model is torch.Size([128, 5, 5]).
size mismatch for postnet.cbhg.conv1d_banks.5.conv1d.weight: copying a param with shape torch.Size([128, 80, 6]) from checkpoint, the shape in current model is torch.Size([128, 5, 6]).
size mismatch for postnet.cbhg.conv1d_banks.6.conv1d.weight: copying a param with shape torch.Size([128, 80, 7]) from checkpoint, the shape in current model is torch.Size([128, 5, 7]).
size mismatch for postnet.cbhg.conv1d_banks.7.conv1d.weight: copying a param with shape torch.Size([128, 80, 8]) from checkpoint, the shape in current model is torch.Size([128, 5, 8]).
size mismatch for postnet.cbhg.conv1d_projections.1.conv1d.weight: copying a param with shape torch.Size([80, 256, 3]) from checkpoint, the shape in current model is torch.Size([5, 256, 3]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.weight: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_mean: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_var: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
size mismatch for postnet.cbhg.pre_highway.weight: copying a param with shape torch.Size([128, 80]) from checkpoint, the shape in current model is torch.Size([128, 5]).
size mismatch for last_linear.0.weight: copying a param with shape torch.Size([1025, 256]) from checkpoint, the shape in current model is torch.Size([80, 256]).
size mismatch for last_linear.0.bias: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([80]).
I also tried to check out commit 2810d57 from the same table, but got an error: pathspec '2810d57' did not match any file(s) known to git.
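That pathspec error usually means the commit object isn't present in the local clone, for example because the clone was shallow. If that is the case here (an assumption), fetching the full history should make the commit checkable:
$ git fetch --unshallow
$ git checkout 2810d57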
Hm, I also tried db7f3d3 without Docker, on a fresh Ubuntu 18.10, with the same model and config from the table (the same row):
$ python3 server/server.py -c server/conf.json
/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/numba/errors.py:105: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
warnings.warn(msg)
> Loading model ...
| > model config: /home/vitaly_zdanevich/TTS/config.json
| > model file: /home/vitaly_zdanevich/TTS/best_model.pth.tar
> Setting up Audio Processor...
| > fft size: 2048, hop length: 275, win length: 1102
| > Audio Processor attributes.
| > bits:None
| > sample_rate:22050
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:12.5
| > frame_length_ms:50
| > ref_level_db:20
| > num_freq:1025
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:False
| > mel_fmin:0
| > mel_fmax:None
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:True
| > n_fft:2048
| > hop_length:275
| > win_length:1102
| > Number of characters : 256
Traceback (most recent call last):
File "server/server.py", line 16, in <module>
config.model_config, config.use_cuda)
File "/home/vitaly_zdanevich/TTS/server/synthesizer.py", line 34, in load_model
self.model.load_state_dict(cp['model'])
File "/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
size mismatch for embedding.weight: copying a param with shape torch.Size([61, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
size mismatch for encoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
size mismatch for decoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 400]) from checkpoint, the shape in current model is torch.Size([256, 10]).
size mismatch for decoder.proj_to_mel.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([10, 256]).
size mismatch for decoder.proj_to_mel.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([10]).
size mismatch for decoder.memory_init.weight: copying a param with shape torch.Size([1, 400]) from checkpoint, the shape in current model is torch.Size([1, 10]).
size mismatch for decoder.stopnet.linear.weight: copying a param with shape torch.Size([1, 416]) from checkpoint, the shape in current model is torch.Size([1, 266]).
size mismatch for postnet.cbhg.conv1d_banks.0.conv1d.weight: copying a param with shape torch.Size([128, 80, 1]) from checkpoint, the shape in current model is torch.Size([128, 2, 1]).
size mismatch for postnet.cbhg.conv1d_banks.1.conv1d.weight: copying a param with shape torch.Size([128, 80, 2]) from checkpoint, the shape in current model is torch.Size([128, 2, 2]).
size mismatch for postnet.cbhg.conv1d_banks.2.conv1d.weight: copying a param with shape torch.Size([128, 80, 3]) from checkpoint, the shape in current model is torch.Size([128, 2, 3]).
size mismatch for postnet.cbhg.conv1d_banks.3.conv1d.weight: copying a param with shape torch.Size([128, 80, 4]) from checkpoint, the shape in current model is torch.Size([128, 2, 4]).
size mismatch for postnet.cbhg.conv1d_banks.4.conv1d.weight: copying a param with shape torch.Size([128, 80, 5]) from checkpoint, the shape in current model is torch.Size([128, 2, 5]).
size mismatch for postnet.cbhg.conv1d_banks.5.conv1d.weight: copying a param with shape torch.Size([128, 80, 6]) from checkpoint, the shape in current model is torch.Size([128, 2, 6]).
size mismatch for postnet.cbhg.conv1d_banks.6.conv1d.weight: copying a param with shape torch.Size([128, 80, 7]) from checkpoint, the shape in current model is torch.Size([128, 2, 7]).
size mismatch for postnet.cbhg.conv1d_banks.7.conv1d.weight: copying a param with shape torch.Size([128, 80, 8]) from checkpoint, the shape in current model is torch.Size([128, 2, 8]).
size mismatch for postnet.cbhg.conv1d_projections.1.conv1d.weight: copying a param with shape torch.Size([80, 256, 3]) from checkpoint, the shape in current model is torch.Size([2, 256, 3]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.weight: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_mean: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_var: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for postnet.cbhg.pre_highway.weight: copying a param with shape torch.Size([128, 80]) from checkpoint, the shape in current model is torch.Size([128, 2]).
size mismatch for last_linear.0.weight: copying a param with shape torch.Size([1025, 256]) from checkpoint, the shape in current model is torch.Size([80, 256]).
size mismatch for last_linear.0.bias: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([80]).
I'm also trying to use the server directly (i.e. not via Docker), with the model downloaded per the table and the repo at db7f3d3, and I get the same issue (see output below).
However, I did manage to get the Benchmark.ipynb notebook working successfully, and yet with the same (working) config.json used with server.py there is still a problem, so I suspect the commit left the server folder in a state where it wouldn't work with that model, and perhaps this just wasn't noticed?
Will see if I can figure out which settings may be causing it.
python server/server.py -c server/conf.json
/home/neil/.conda/envs/tts/lib/python3.6/site-packages/scikit_learn-0.20.0-py3.6-linux-x86_64.egg/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
> Loading model ...
| > model config: /home/neil/main/Projects/TTS-models/queue-February-16-2019_03+16AM-90f0cd6/config.json
| > model file: /home/neil/main/Projects/TTS-models/queue-February-16-2019_03+16AM-90f0cd6/best_model.pth.tar
> Setting up Audio Processor...
| > fft size: 2048, hop length: 275, win length: 1102
| > Audio Processor attributes.
| > bits:None
| > sample_rate:22050
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:12.5
| > frame_length_ms:50
| > ref_level_db:20
| > num_freq:1025
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:False
| > mel_fmin:0
| > mel_fmax:None
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:True
| > n_fft:2048
| > hop_length:275
| > win_length:1102
| > Number of characters : 256
Traceback (most recent call last):
File "server/server.py", line 16, in <module>
config.model_config, config.use_cuda)
File "/home/neil/main/Projects/TTS/server/synthesizer.py", line 34, in load_model
self.model.load_state_dict(cp['model'])
File "/home/neil/.conda/envs/tts/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
size mismatch for embedding.weight: copying a param with shape torch.Size([61, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
size mismatch for encoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
size mismatch for decoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 400]) from checkpoint, the shape in current model is torch.Size([256, 10]).
size mismatch for decoder.proj_to_mel.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([10, 256]).
size mismatch for decoder.proj_to_mel.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([10]).
size mismatch for decoder.memory_init.weight: copying a param with shape torch.Size([1, 400]) from checkpoint, the shape in current model is torch.Size([1, 10]).
size mismatch for decoder.stopnet.linear.weight: copying a param with shape torch.Size([1, 416]) from checkpoint, the shape in current model is torch.Size([1, 266]).
size mismatch for postnet.cbhg.conv1d_banks.0.conv1d.weight: copying a param with shape torch.Size([128, 80, 1]) from checkpoint, the shape in current model is torch.Size([128, 2, 1]).
size mismatch for postnet.cbhg.conv1d_banks.1.conv1d.weight: copying a param with shape torch.Size([128, 80, 2]) from checkpoint, the shape in current model is torch.Size([128, 2, 2]).
size mismatch for postnet.cbhg.conv1d_banks.2.conv1d.weight: copying a param with shape torch.Size([128, 80, 3]) from checkpoint, the shape in current model is torch.Size([128, 2, 3]).
size mismatch for postnet.cbhg.conv1d_banks.3.conv1d.weight: copying a param with shape torch.Size([128, 80, 4]) from checkpoint, the shape in current model is torch.Size([128, 2, 4]).
size mismatch for postnet.cbhg.conv1d_banks.4.conv1d.weight: copying a param with shape torch.Size([128, 80, 5]) from checkpoint, the shape in current model is torch.Size([128, 2, 5]).
size mismatch for postnet.cbhg.conv1d_banks.5.conv1d.weight: copying a param with shape torch.Size([128, 80, 6]) from checkpoint, the shape in current model is torch.Size([128, 2, 6]).
size mismatch for postnet.cbhg.conv1d_banks.6.conv1d.weight: copying a param with shape torch.Size([128, 80, 7]) from checkpoint, the shape in current model is torch.Size([128, 2, 7]).
size mismatch for postnet.cbhg.conv1d_banks.7.conv1d.weight: copying a param with shape torch.Size([128, 80, 8]) from checkpoint, the shape in current model is torch.Size([128, 2, 8]).
size mismatch for postnet.cbhg.conv1d_projections.1.conv1d.weight: copying a param with shape torch.Size([80, 256, 3]) from checkpoint, the shape in current model is torch.Size([2, 256, 3]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.weight: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_mean: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_var: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for postnet.cbhg.pre_highway.weight: copying a param with shape torch.Size([128, 80]) from checkpoint, the shape in current model is torch.Size([128, 2]).
size mismatch for last_linear.0.weight: copying a param with shape torch.Size([1025, 256]) from checkpoint, the shape in current model is torch.Size([80, 256]).
size mismatch for last_linear.0.bias: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([80]).
EDIT: actually there may be more to do... this doesn't cause it to crash, but the output is gibberish 🙂
It's this line in synthesizer.py which wasn't updated: https://github.com/mozilla/TTS/blob/db7f3d36e7768f9179d42a8f19b88c2c736d87eb/server/synthesizer.py#L26
If you replace L26 with:
num_chars = 61
self.model = Tacotron(num_chars, config.embedding_size, config.audio['num_freq'], config.audio['num_mels'], config.r, attn_windowing=False)
it should then work. (NB: the 61 is hard-coded for simplicity; refer to the corresponding line in Benchmark.ipynb for how it should really work, but then you need to bring in other variables too - see the sketch below.)
I tried the fix above from @nmstoker - it looks like the server finally starts, but it is not accessible:
$ python3 server/server.py -c server/conf.json
/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/numba/errors.py:105: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
warnings.warn(msg)
> Loading model ...
| > model config: /home/vitaly_zdanevich/TTS/config.json
| > model file: /home/vitaly_zdanevich/TTS/best_model.pth.tar
> Setting up Audio Processor...
| > fft size: 2048, hop length: 275, win length: 1102
| > Audio Processor attributes.
| > bits:None
| > sample_rate:22050
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:12.5
| > frame_length_ms:50
| > ref_level_db:20
| > num_freq:1025
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:False
| > mel_fmin:0
| > mel_fmax:None
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:True
| > n_fft:2048
| > hop_length:275
| > win_length:1102
| > Number of characters : 61
* Serving Flask app "server" (lazy loading)
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: on
INFO:werkzeug: * Running on http://0.0.0.0:8000/ (Press CTRL+C to quit)
INFO:werkzeug: * Restarting with stat
/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/numba/errors.py:105: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
warnings.warn(msg)
> Loading model ...
| > model config: /home/vitaly_zdanevich/TTS/config.json
| > model file: /home/vitaly_zdanevich/TTS/best_model.pth.tar
> Setting up Audio Processor...
| > fft size: 2048, hop length: 275, win length: 1102
| > Audio Processor attributes.
| > bits:None
| > sample_rate:22050
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:12.5
| > frame_length_ms:50
| > ref_level_db:20
| > num_freq:1025
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:False
| > mel_fmin:0
| > mel_fmax:None
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:True
| > n_fft:2048
| > hop_length:275
| > win_length:1102
| > Number of characters : 61
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 220-380-468
To fetch the page I tried both a browser and curl:
$ curl http://35.230.162.189:8000
curl: (7) Failed to connect to 35.230.162.189 port 8000: Connection timed out
Check the default port in the config file.
Yes, before running it I changed the port in my config from 5002 to 8000, like in python3 -m http.server.
Sorry, my mistake: curl localhost:8000 returns some HTML...
In my browser I see the page from your HTTP server. I tried the string hello and got some strange sound as output; then I tried another string and got INTERNAL SERVER ERROR:
INFO:werkzeug: * Debugger PIN: 220-380-468
INFO:werkzeug:127.0.0.1 - - [16/Apr/2019 17:00:13] "GET / HTTP/1.1" 200 -
INFO:werkzeug:46.216.60.228 - - [16/Apr/2019 17:01:31] "GET / HTTP/1.1" 200 -
INFO:werkzeug:46.216.60.228 - - [16/Apr/2019 17:01:33] "GET /favicon.ico HTTP/1.1" 404 -
> Model input: hello
hello.
INFO:werkzeug:46.216.60.228 - - [16/Apr/2019 17:01:42] "GET /api/tts?text=hello HTTP/1.1" 200 -
> Model input: life is just a simple game
life is just a simple game.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=238 error=59 : device-side assert triggered
INFO:werkzeug:46.216.60.228 - - [16/Apr/2019 17:02:13] "GET /api/tts?text=life%20is%20just%20a%20simple%20game HTTP/1.1" 500 -
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 2309, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 2295, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1741, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/_compat.py", line 35, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/_compat.py", line 35, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/vitaly_zdanevich/TTS/server/server.py", line 28, in tts
data = synthesizer.tts(text)
File "/home/vitaly_zdanevich/TTS/server/synthesizer.py", line 63, in tts
chars_var)
File "/home/vitaly_zdanevich/TTS/models/tacotron.py", line 37, in forward
encoder_outputs = self.encoder(inputs)
File "/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/vitaly_zdanevich/TTS/layers/tacotron.py", line 275, in forward
return self.cbhg(inputs)
File "/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/vitaly_zdanevich/TTS/layers/tacotron.py", line 254, in forward
return self.cbhg(x)
File "/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/vitaly_zdanevich/TTS/layers/tacotron.py", line 218, in forward
x = torch.cat(outs, dim=1)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:238
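For reference: a device-side assert during an embedding lookup is typically an out-of-range index, which would fit an input character falling outside the hard-coded 61-symbol set. Two crude ways to test that hypothesis (both sketches; symbols is the same assumed list as above): run with use_cuda disabled, which usually surfaces the exact IndexError instead of an opaque CUDA assert, or strip unknown characters before synthesis:
# hypothetical guard before the model call in synthesizer.tts():
# drop any character the model's symbol set doesn't cover
text = ''.join(c for c in text if c in symbols)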
That sounds like it might be the gibberish I mentioned in my edit. I'm not in front of a computer at the moment, but I think the points @erogol made in #154 may help.
Quote from README.md:
In the server README: ...
But it is not mentioned there where to put the model file. I already ran docker build -t mozilla-tts . right after git clone - so what should my further actions be? Do I need to put the downloaded model somewhere inside the Docker virtual machine?
P.S. Is it really already possible to get this quality of TTS?
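One way to get a model into the container without rebuilding the image each time (a sketch: the host path is hypothetical, the /srv/app container paths are taken from the logs above) is to bind-mount the model directory at run time and point server/conf.json at it:
$ docker run -it --rm -p 5002:5002 -v /path/to/models:/srv/app/models mozilla-tts
Alternatively, a COPY line in the Dockerfile before docker build bakes the files into the image, which is effectively what was done earlier in this thread:
COPY best_model.pth.tar config.json /srv/app/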