mozilla / TTS

:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

README is not clear about how to use docker image for TTS (not training) #150

Closed · vitaly-zdanevich closed 5 years ago

vitaly-zdanevich commented 5 years ago

Quote from README.md:

Make sure you follow the instructions in the server README before you build your image so that the server can find the model within the image.

In the server README:

Download one of the models given on the main page. Click here for the latest model.

But it is not mentioned where to put the model file.

I already ran docker build -t mozilla-tts . right after git clone, so what should my next steps be? Do I need to put the downloaded model somewhere inside the Docker container?

P.S.: Is it really already possible to get this quality of TTS?

vitaly-zdanevich commented 5 years ago

...as I understand it, for the highest quality I need to use WaveRNN, but I see that its TTS model is marked as coming soon.

vitaly-zdanevich commented 5 years ago

OK, I downloaded the model locally and set the path to it in server/conf.json. Step 4 from server/README.md says:

Run the server python server/server.py -c server/conf.json. (Requires Flask)

Inside the running Docker container I tried python server/server.py -c server/conf.json and got python: command not found. OK, so I tried python3 server/server.py -c server/conf.json and got:

Traceback (most recent call last):
  File "server/server.py", line 3, in <module>
    from synthesizer import Synthesizer
  File "/srv/app/server/synthesizer.py", line 4, in <module>
    import numpy as np
ImportError: No module named 'numpy'
vitaly-zdanevich commented 5 years ago

OK, my next steps were:

$ pip3 install -r requirements.txt 
Collecting git+git://github.com/bootphon/phonemizer@master (from -r requirements.txt (line 14))
  Cloning git://github.com/bootphon/phonemizer (to revision master) to /tmp/pip-req-build-qw833s40
Requirement already satisfied (use --upgrade to upgrade): phonemizer==1.0.1 from git+git://github.com/bootphon/phonemizer@master in /usr/local/lib/python3.6/dist-packages/phonemizer-1.0.1-py3.6.egg (from -r requirements.txt (line 14))
Collecting numpy==1.14.3 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/71/90/ca61e203e0080a8cef7ac21eca199829fa8d997f7c4da3e985b49d0a107d/numpy-1.14.3-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
    100% |████████████████████████████████| 12.2MB 1.3MB/s 
Collecting lws (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/3a/c7/856af2e1202e7a4c5102406196aa661edb402256e7ce2334be0c0d8afa2e/lws-1.2.tar.gz (133kB)
    100% |████████████████████████████████| 143kB 4.9MB/s 
Requirement already satisfied: torch>=0.4.1 in /usr/local/lib/python3.6/dist-packages/torch-1.0.1.post2-py3.6-linux-x86_64.egg (from -r requirements.txt (line 3)) (1.0.1.post2)
Collecting librosa==0.5.1 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/51/c2/d8d8498252a2430ec9b90481754aca287c0ecc237a8feb331fa3b8933575/librosa-0.5.1.tar.gz (1.5MB)
    100% |████████████████████████████████| 1.5MB 4.0MB/s 
Requirement already satisfied: Unidecode==0.4.20 in /usr/local/lib/python3.6/dist-packages/Unidecode-0.4.20-py3.6.egg (from -r requirements.txt (line 5)) (0.4.20)
Collecting tensorboard (from -r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/0f/39/bdd75b08a6fba41f098b6cb091b9e8c7a80e1b4d679a581a0ccd17b10373/tensorboard-1.13.1-py3-none-any.whl (3.2MB)
    100% |████████████████████████████████| 3.2MB 4.2MB/s 
Requirement already satisfied: tensorboardX in /usr/local/lib/python3.6/dist-packages/tensorboardX-1.6-py3.6.egg (from -r requirements.txt (line 7)) (1.6)
Requirement already satisfied: matplotlib==2.0.2 in /usr/local/lib/python3.6/dist-packages/matplotlib-2.0.2-py3.6-linux-x86_64.egg (from -r requirements.txt (line 8)) (2.0.2)
Requirement already satisfied: Pillow in /usr/local/lib/python3.6/dist-packages/Pillow-6.0.0-py3.6-linux-x86_64.egg (from -r requirements.txt (line 9)) (6.0.0)
Requirement already satisfied: flask in /usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg (from -r requirements.txt (line 10)) (1.0.2)
Collecting scipy==0.19.0 (from -r requirements.txt (line 11))
  Downloading https://files.pythonhosted.org/packages/d0/7b/415fd5bb215f28b423d32dc98126f700ebe7f1efa53e65377ed6ed55df99/scipy-0.19.0-cp36-cp36m-manylinux1_x86_64.whl (48.2MB)
    100% |████████████████████████████████| 48.2MB 308kB/s 
Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages/tqdm-4.31.1-py3.6.egg (from -r requirements.txt (line 13)) (4.31.1)
Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages/joblib-0.13.2-py3.6.egg (from phonemizer==1.0.1->-r requirements.txt (line 14)) (0.13.2)
Requirement already satisfied: segments in /usr/local/lib/python3.6/dist-packages/segments-2.0.1-py3.6.egg (from phonemizer==1.0.1->-r requirements.txt (line 14)) (2.0.1)
Requirement already satisfied: attrs>=18.1 in /usr/local/lib/python3.6/dist-packages/attrs-19.1.0-py3.6.egg (from phonemizer==1.0.1->-r requirements.txt (line 14)) (19.1.0)
Requirement already satisfied: audioread>=2.0.0 in /usr/local/lib/python3.6/dist-packages/audioread-2.1.6-py3.6.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (2.1.6)
Requirement already satisfied: scikit-learn>=0.14.0 in /usr/local/lib/python3.6/dist-packages/scikit_learn-0.20.3-py3.6-linux-x86_64.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (0.20.3)
Requirement already satisfied: decorator>=3.0.0 in /usr/local/lib/python3.6/dist-packages/decorator-4.4.0-py3.6.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (4.4.0)
Requirement already satisfied: six>=1.3 in /usr/local/lib/python3.6/dist-packages/six-1.12.0-py3.6.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (1.12.0)
Requirement already satisfied: resampy>=0.1.2 in /usr/local/lib/python3.6/dist-packages/resampy-0.2.1-py3.6.egg (from librosa==0.5.1->-r requirements.txt (line 4)) (0.2.1)
Collecting absl-py>=0.4 (from tensorboard->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/da/3f/9b0355080b81b15ba6a9ffcf1f5ea39e307a2778b2f2dc8694724e8abd5b/absl-py-0.7.1.tar.gz (99kB)
    100% |████████████████████████████████| 102kB 1.3MB/s 
Collecting grpcio>=1.6.3 (from tensorboard->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/f4/dc/5503d89e530988eb7a1aed337dcb456ef8150f7c06132233bd9e41ec0215/grpcio-1.19.0-cp36-cp36m-manylinux1_x86_64.whl (10.8MB)
    100% |████████████████████████████████| 10.8MB 3.1MB/s 
Collecting markdown>=2.6.8 (from tensorboard->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/f5/e4/d8c18f2555add57ff21bf25af36d827145896a07607486cc79a2aea641af/Markdown-3.1-py2.py3-none-any.whl (87kB)
    100% |████████████████████████████████| 92kB 2.3MB/s 
Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages/Werkzeug-0.15.2-py3.6.egg (from tensorboard->-r requirements.txt (line 6)) (0.15.2)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /usr/local/lib/python3.6/dist-packages (from tensorboard->-r requirements.txt (line 6)) (0.33.1)
Requirement already satisfied: protobuf>=3.6.0 in /usr/local/lib/python3.6/dist-packages/protobuf-3.7.1-py3.6.egg (from tensorboard->-r requirements.txt (line 6)) (3.7.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages/cycler-0.10.0-py3.6.egg (from matplotlib==2.0.2->-r requirements.txt (line 8)) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.0,!=2.0.4,!=2.1.2,!=2.1.6,>=1.5.6 in /usr/local/lib/python3.6/dist-packages/pyparsing-2.4.0-py3.6.egg (from matplotlib==2.0.2->-r requirements.txt (line 8)) (2.4.0)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.6/dist-packages/python_dateutil-2.8.0-py3.6.egg (from matplotlib==2.0.2->-r requirements.txt (line 8)) (2.8.0)
Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages/pytz-2019.1-py3.6.egg (from matplotlib==2.0.2->-r requirements.txt (line 8)) (2019.1)
Requirement already satisfied: Jinja2>=2.10 in /usr/local/lib/python3.6/dist-packages/Jinja2-2.10.1-py3.6.egg (from flask->-r requirements.txt (line 10)) (2.10.1)
Requirement already satisfied: click>=5.1 in /usr/local/lib/python3.6/dist-packages/Click-7.0-py3.6.egg (from flask->-r requirements.txt (line 10)) (7.0)
Requirement already satisfied: itsdangerous>=0.24 in /usr/local/lib/python3.6/dist-packages/itsdangerous-1.1.0-py3.6.egg (from flask->-r requirements.txt (line 10)) (1.1.0)
Requirement already satisfied: clldutils>=1.7.3 in /usr/local/lib/python3.6/dist-packages/clldutils-2.7.0-py3.6.egg (from segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (2.7.0)
Requirement already satisfied: csvw in /usr/local/lib/python3.6/dist-packages/csvw-1.4.5-py3.6.egg (from segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (1.4.5)
Requirement already satisfied: regex in /usr/local/lib/python3.6/dist-packages/regex-2019.4.10-py3.6-linux-x86_64.egg (from segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (2019.4.10)
Requirement already satisfied: numba>=0.32 in /usr/local/lib/python3.6/dist-packages/numba-0.43.1-py3.6-linux-x86_64.egg (from resampy>=0.1.2->librosa==0.5.1->-r requirements.txt (line 4)) (0.43.1)
Requirement already satisfied: setuptools>=36 in /usr/local/lib/python3.6/dist-packages (from markdown>=2.6.8->tensorboard->-r requirements.txt (line 6)) (41.0.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/dist-packages/MarkupSafe-1.1.1-py3.6-linux-x86_64.egg (from Jinja2>=2.10->flask->-r requirements.txt (line 10)) (1.1.1)
Requirement already satisfied: colorlog in /usr/local/lib/python3.6/dist-packages/colorlog-4.0.2-py3.6.egg (from clldutils>=1.7.3->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (4.0.2)
Requirement already satisfied: configparser>=3.5.0 in /usr/local/lib/python3.6/dist-packages/configparser-3.7.4-py3.6.egg (from clldutils>=1.7.3->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (3.7.4)
Requirement already satisfied: tabulate>=0.7.7 in /usr/local/lib/python3.6/dist-packages/tabulate-0.8.3-py3.6.egg (from clldutils>=1.7.3->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (0.8.3)
Requirement already satisfied: isodate in /usr/local/lib/python3.6/dist-packages/isodate-0.6.0-py3.6.egg (from csvw->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (0.6.0)
Requirement already satisfied: rfc3986 in /usr/local/lib/python3.6/dist-packages/rfc3986-1.2.0-py3.6.egg (from csvw->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (1.2.0)
Requirement already satisfied: uritemplate>=3.0.0 in /usr/local/lib/python3.6/dist-packages/uritemplate-3.0.0-py3.6.egg (from csvw->segments->phonemizer==1.0.1->-r requirements.txt (line 14)) (3.0.0)
Requirement already satisfied: llvmlite>=0.28.0dev0 in /usr/local/lib/python3.6/dist-packages/llvmlite-0.28.0-py3.6-linux-x86_64.egg (from numba>=0.32->resampy>=0.1.2->librosa==0.5.1->-r requirements.txt (line 4)) (0.28.0)
Building wheels for collected packages: lws, librosa, phonemizer, absl-py
  Building wheel for lws (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/07/b1/1a/8dd583ce1048da5130e7cfef1b243c9a44be448f7a2fcf32d2
  Building wheel for librosa (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/f6/21/55/9c17b30d30ef57e74b50c8824c2bb368d58ddabf9bf8e1fee0
  Building wheel for phonemizer (setup.py) ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-gmt3moo_/wheels/af/6a/52/4a29b8347b407c694e8b2f3d87e30e46cce0bd9ddf511818f5
  Building wheel for absl-py (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/ee/98/38/46cbcc5a93cfea5492d19c38562691ddb23b940176c14f7b48
Successfully built lws librosa phonemizer absl-py
tts 0.0.1+4a5056b has requirement librosa==0.6.2, but you'll have librosa 0.5.1 which is incompatible.
Installing collected packages: numpy, lws, scipy, librosa, absl-py, grpcio, markdown, tensorboard
  Found existing installation: numpy 1.15.4
Cannot remove entries from nonexistent file /srv/app/.eggs/easy-install.pth

root@65840041e280:/srv/app# python3 server/server.py -c server/conf.json
Traceback (most recent call last):
  File "server/server.py", line 3, in <module>
    from synthesizer import Synthesizer
  File "/srv/app/server/synthesizer.py", line 4, in <module>
    import numpy as np
ImportError: No module named 'numpy'
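
The "Cannot remove entries from nonexistent file /srv/app/.eggs/easy-install.pth" line above suggests pip never finished installing numpy for this interpreter. A quick diagnostic sketch (not from the thread; it assumes only the standard library) to check which interpreter is running and whether it can locate numpy at all:

import importlib.util
import sys

print(sys.executable)                      # which python3 is actually running
print(importlib.util.find_spec("numpy"))   # None means numpy is not importable here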
vitaly-zdanevich commented 5 years ago

OK, I am not experienced with Docker; I rebuilt the image with the model inside and server/conf.json pointing to the correct location. After docker run -it --rm -p 5002:5002 mozilla-tts I got:

 > Loading model ...
 | > model config:  /srv/app/config.json
 | > model file:  /srv/app/checkpoint_272976.pth.tar
 > Setting up Audio Processor...
 | > bits:None
 | > sample_rate:22050
 | > num_mels:80
 | > min_level_db:-100
 | > frame_shift_ms:12.5
 | > frame_length_ms:50
 | > ref_level_db:20
 | > num_freq:1025
 | > power:1.5
 | > preemphasis:0.98
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:False
 | > mel_fmin:0
 | > mel_fmax:None
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > n_fft:2048
 | > hop_length:275
 | > win_length:1102
Traceback (most recent call last):
  File "server/server.py", line 16, in <module>
    config.model_config, config.use_cuda)
  File "/srv/app/server/synthesizer.py", line 39, in load_model
    self.model.load_state_dict(cp['model'])
  File "/usr/local/lib/python3.6/dist-packages/torch-1.0.1.post2-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
    Missing key(s) in state_dict: "encoder.cbhg.cbhg.conv1d_banks.0.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.0.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.0.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.0.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.0.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.1.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.1.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.1.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.1.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.1.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.2.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.2.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.2.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.2.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.2.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.3.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.3.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.3.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.3.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.3.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.4.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.4.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.4.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.4.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.4.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.5.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.5.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.5.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.5.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.5.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.6.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.6.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.6.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.6.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.6.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.7.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.7.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.7.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.7.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.7.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.8.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.8.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.8.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.8.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.8.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.9.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.9.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.9.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.9.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.9.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.10.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.10.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.10.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.10.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.10.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.11.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.11.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.11.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.11.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.11.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.12.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.12.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.12.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.12.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.12.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.13.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.13.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.13.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.13.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.13.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.14.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.14.bn.weight", 
"encoder.cbhg.cbhg.conv1d_banks.14.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.14.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.14.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.15.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.15.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.15.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.15.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.15.bn.running_var", "encoder.cbhg.cbhg.conv1d_projections.0.conv1d.weight", "encoder.cbhg.cbhg.conv1d_projections.0.bn.weight", "encoder.cbhg.cbhg.conv1d_projections.0.bn.bias", "encoder.cbhg.cbhg.conv1d_projections.0.bn.running_mean", "encoder.cbhg.cbhg.conv1d_projections.0.bn.running_var", "encoder.cbhg.cbhg.conv1d_projections.1.conv1d.weight", "encoder.cbhg.cbhg.conv1d_projections.1.bn.weight", "encoder.cbhg.cbhg.conv1d_projections.1.bn.bias", "encoder.cbhg.cbhg.conv1d_projections.1.bn.running_mean", "encoder.cbhg.cbhg.conv1d_projections.1.bn.running_var", "encoder.cbhg.cbhg.highways.0.H.weight", "encoder.cbhg.cbhg.highways.0.H.bias", "encoder.cbhg.cbhg.highways.0.T.weight", "encoder.cbhg.cbhg.highways.0.T.bias", "encoder.cbhg.cbhg.highways.1.H.weight", "encoder.cbhg.cbhg.highways.1.H.bias", "encoder.cbhg.cbhg.highways.1.T.weight", "encoder.cbhg.cbhg.highways.1.T.bias", "encoder.cbhg.cbhg.highways.2.H.weight", "encoder.cbhg.cbhg.highways.2.H.bias", "encoder.cbhg.cbhg.highways.2.T.weight", "encoder.cbhg.cbhg.highways.2.T.bias", "encoder.cbhg.cbhg.highways.3.H.weight", "encoder.cbhg.cbhg.highways.3.H.bias", "encoder.cbhg.cbhg.highways.3.T.weight", "encoder.cbhg.cbhg.highways.3.T.bias", "encoder.cbhg.cbhg.gru.weight_ih_l0", "encoder.cbhg.cbhg.gru.weight_hh_l0", "encoder.cbhg.cbhg.gru.bias_ih_l0", "encoder.cbhg.cbhg.gru.bias_hh_l0", "encoder.cbhg.cbhg.gru.weight_ih_l0_reverse", "encoder.cbhg.cbhg.gru.weight_hh_l0_reverse", "encoder.cbhg.cbhg.gru.bias_ih_l0_reverse", "encoder.cbhg.cbhg.gru.bias_hh_l0_reverse", "decoder.attention_rnn.alignment_model.loc_conv.1.weight", "decoder.attention_rnn.alignment_model.loc_linear.weight", "decoder.attention_rnn.alignment_model.loc_linear.bias", "decoder.attention_rnn_init.weight", "decoder.memory_init.weight", "decoder.decoder_rnn_inits.weight", "postnet.cbhg.conv1d_banks.0.conv1d.weight", "postnet.cbhg.conv1d_banks.0.bn.weight", "postnet.cbhg.conv1d_banks.0.bn.bias", "postnet.cbhg.conv1d_banks.0.bn.running_mean", "postnet.cbhg.conv1d_banks.0.bn.running_var", "postnet.cbhg.conv1d_banks.1.conv1d.weight", "postnet.cbhg.conv1d_banks.1.bn.weight", "postnet.cbhg.conv1d_banks.1.bn.bias", "postnet.cbhg.conv1d_banks.1.bn.running_mean", "postnet.cbhg.conv1d_banks.1.bn.running_var", "postnet.cbhg.conv1d_banks.2.conv1d.weight", "postnet.cbhg.conv1d_banks.2.bn.weight", "postnet.cbhg.conv1d_banks.2.bn.bias", "postnet.cbhg.conv1d_banks.2.bn.running_mean", "postnet.cbhg.conv1d_banks.2.bn.running_var", "postnet.cbhg.conv1d_banks.3.conv1d.weight", "postnet.cbhg.conv1d_banks.3.bn.weight", "postnet.cbhg.conv1d_banks.3.bn.bias", "postnet.cbhg.conv1d_banks.3.bn.running_mean", "postnet.cbhg.conv1d_banks.3.bn.running_var", "postnet.cbhg.conv1d_banks.4.conv1d.weight", "postnet.cbhg.conv1d_banks.4.bn.weight", "postnet.cbhg.conv1d_banks.4.bn.bias", "postnet.cbhg.conv1d_banks.4.bn.running_mean", "postnet.cbhg.conv1d_banks.4.bn.running_var", "postnet.cbhg.conv1d_banks.5.conv1d.weight", "postnet.cbhg.conv1d_banks.5.bn.weight", "postnet.cbhg.conv1d_banks.5.bn.bias", "postnet.cbhg.conv1d_banks.5.bn.running_mean", "postnet.cbhg.conv1d_banks.5.bn.running_var", "postnet.cbhg.conv1d_banks.6.conv1d.weight", 
"postnet.cbhg.conv1d_banks.6.bn.weight", "postnet.cbhg.conv1d_banks.6.bn.bias", "postnet.cbhg.conv1d_banks.6.bn.running_mean", "postnet.cbhg.conv1d_banks.6.bn.running_var", "postnet.cbhg.conv1d_banks.7.conv1d.weight", "postnet.cbhg.conv1d_banks.7.bn.weight", "postnet.cbhg.conv1d_banks.7.bn.bias", "postnet.cbhg.conv1d_banks.7.bn.running_mean", "postnet.cbhg.conv1d_banks.7.bn.running_var", "postnet.cbhg.conv1d_projections.0.conv1d.weight", "postnet.cbhg.conv1d_projections.0.bn.weight", "postnet.cbhg.conv1d_projections.0.bn.bias", "postnet.cbhg.conv1d_projections.0.bn.running_mean", "postnet.cbhg.conv1d_projections.0.bn.running_var", "postnet.cbhg.conv1d_projections.1.conv1d.weight", "postnet.cbhg.conv1d_projections.1.bn.weight", "postnet.cbhg.conv1d_projections.1.bn.bias", "postnet.cbhg.conv1d_projections.1.bn.running_mean", "postnet.cbhg.conv1d_projections.1.bn.running_var", "postnet.cbhg.pre_highway.weight", "postnet.cbhg.highways.0.H.weight", "postnet.cbhg.highways.0.H.bias", "postnet.cbhg.highways.0.T.weight", "postnet.cbhg.highways.0.T.bias", "postnet.cbhg.highways.1.H.weight", "postnet.cbhg.highways.1.H.bias", "postnet.cbhg.highways.1.T.weight", "postnet.cbhg.highways.1.T.bias", "postnet.cbhg.highways.2.H.weight", "postnet.cbhg.highways.2.H.bias", "postnet.cbhg.highways.2.T.weight", "postnet.cbhg.highways.2.T.bias", "postnet.cbhg.highways.3.H.weight", "postnet.cbhg.highways.3.H.bias", "postnet.cbhg.highways.3.T.weight", "postnet.cbhg.highways.3.T.bias", "postnet.cbhg.gru.weight_ih_l0", "postnet.cbhg.gru.weight_hh_l0", "postnet.cbhg.gru.bias_ih_l0", "postnet.cbhg.gru.bias_hh_l0", "postnet.cbhg.gru.weight_ih_l0_reverse", "postnet.cbhg.gru.weight_hh_l0_reverse", "postnet.cbhg.gru.bias_ih_l0_reverse", "postnet.cbhg.gru.bias_hh_l0_reverse", "last_linear.0.weight", "last_linear.0.bias". 
    Unexpected key(s) in state_dict: "encoder.cbhg.conv1d_banks.0.conv1d.weight", "encoder.cbhg.conv1d_banks.0.bn.weight", "encoder.cbhg.conv1d_banks.0.bn.bias", "encoder.cbhg.conv1d_banks.0.bn.running_mean", "encoder.cbhg.conv1d_banks.0.bn.running_var", "encoder.cbhg.conv1d_banks.1.conv1d.weight", "encoder.cbhg.conv1d_banks.1.bn.weight", "encoder.cbhg.conv1d_banks.1.bn.bias", "encoder.cbhg.conv1d_banks.1.bn.running_mean", "encoder.cbhg.conv1d_banks.1.bn.running_var", "encoder.cbhg.conv1d_banks.2.conv1d.weight", "encoder.cbhg.conv1d_banks.2.bn.weight", "encoder.cbhg.conv1d_banks.2.bn.bias", "encoder.cbhg.conv1d_banks.2.bn.running_mean", "encoder.cbhg.conv1d_banks.2.bn.running_var", "encoder.cbhg.conv1d_banks.3.conv1d.weight", "encoder.cbhg.conv1d_banks.3.bn.weight", "encoder.cbhg.conv1d_banks.3.bn.bias", "encoder.cbhg.conv1d_banks.3.bn.running_mean", "encoder.cbhg.conv1d_banks.3.bn.running_var", "encoder.cbhg.conv1d_banks.4.conv1d.weight", "encoder.cbhg.conv1d_banks.4.bn.weight", "encoder.cbhg.conv1d_banks.4.bn.bias", "encoder.cbhg.conv1d_banks.4.bn.running_mean", "encoder.cbhg.conv1d_banks.4.bn.running_var", "encoder.cbhg.conv1d_banks.5.conv1d.weight", "encoder.cbhg.conv1d_banks.5.bn.weight", "encoder.cbhg.conv1d_banks.5.bn.bias", "encoder.cbhg.conv1d_banks.5.bn.running_mean", "encoder.cbhg.conv1d_banks.5.bn.running_var", "encoder.cbhg.conv1d_banks.6.conv1d.weight", "encoder.cbhg.conv1d_banks.6.bn.weight", "encoder.cbhg.conv1d_banks.6.bn.bias", "encoder.cbhg.conv1d_banks.6.bn.running_mean", "encoder.cbhg.conv1d_banks.6.bn.running_var", "encoder.cbhg.conv1d_banks.7.conv1d.weight", "encoder.cbhg.conv1d_banks.7.bn.weight", "encoder.cbhg.conv1d_banks.7.bn.bias", "encoder.cbhg.conv1d_banks.7.bn.running_mean", "encoder.cbhg.conv1d_banks.7.bn.running_var", "encoder.cbhg.conv1d_banks.8.conv1d.weight", "encoder.cbhg.conv1d_banks.8.bn.weight", "encoder.cbhg.conv1d_banks.8.bn.bias", "encoder.cbhg.conv1d_banks.8.bn.running_mean", "encoder.cbhg.conv1d_banks.8.bn.running_var", "encoder.cbhg.conv1d_banks.9.conv1d.weight", "encoder.cbhg.conv1d_banks.9.bn.weight", "encoder.cbhg.conv1d_banks.9.bn.bias", "encoder.cbhg.conv1d_banks.9.bn.running_mean", "encoder.cbhg.conv1d_banks.9.bn.running_var", "encoder.cbhg.conv1d_banks.10.conv1d.weight", "encoder.cbhg.conv1d_banks.10.bn.weight", "encoder.cbhg.conv1d_banks.10.bn.bias", "encoder.cbhg.conv1d_banks.10.bn.running_mean", "encoder.cbhg.conv1d_banks.10.bn.running_var", "encoder.cbhg.conv1d_banks.11.conv1d.weight", "encoder.cbhg.conv1d_banks.11.bn.weight", "encoder.cbhg.conv1d_banks.11.bn.bias", "encoder.cbhg.conv1d_banks.11.bn.running_mean", "encoder.cbhg.conv1d_banks.11.bn.running_var", "encoder.cbhg.conv1d_banks.12.conv1d.weight", "encoder.cbhg.conv1d_banks.12.bn.weight", "encoder.cbhg.conv1d_banks.12.bn.bias", "encoder.cbhg.conv1d_banks.12.bn.running_mean", "encoder.cbhg.conv1d_banks.12.bn.running_var", "encoder.cbhg.conv1d_banks.13.conv1d.weight", "encoder.cbhg.conv1d_banks.13.bn.weight", "encoder.cbhg.conv1d_banks.13.bn.bias", "encoder.cbhg.conv1d_banks.13.bn.running_mean", "encoder.cbhg.conv1d_banks.13.bn.running_var", "encoder.cbhg.conv1d_banks.14.conv1d.weight", "encoder.cbhg.conv1d_banks.14.bn.weight", "encoder.cbhg.conv1d_banks.14.bn.bias", "encoder.cbhg.conv1d_banks.14.bn.running_mean", "encoder.cbhg.conv1d_banks.14.bn.running_var", "encoder.cbhg.conv1d_banks.15.conv1d.weight", "encoder.cbhg.conv1d_banks.15.bn.weight", "encoder.cbhg.conv1d_banks.15.bn.bias", "encoder.cbhg.conv1d_banks.15.bn.running_mean", "encoder.cbhg.conv1d_banks.15.bn.running_var", 
"encoder.cbhg.conv1d_projections.0.conv1d.weight", "encoder.cbhg.conv1d_projections.0.bn.weight", "encoder.cbhg.conv1d_projections.0.bn.bias", "encoder.cbhg.conv1d_projections.0.bn.running_mean", "encoder.cbhg.conv1d_projections.0.bn.running_var", "encoder.cbhg.conv1d_projections.1.conv1d.weight", "encoder.cbhg.conv1d_projections.1.bn.weight", "encoder.cbhg.conv1d_projections.1.bn.bias", "encoder.cbhg.conv1d_projections.1.bn.running_mean", "encoder.cbhg.conv1d_projections.1.bn.running_var", "encoder.cbhg.pre_highway.weight", "encoder.cbhg.highways.0.H.weight", "encoder.cbhg.highways.0.H.bias", "encoder.cbhg.highways.0.T.weight", "encoder.cbhg.highways.0.T.bias", "encoder.cbhg.highways.1.H.weight", "encoder.cbhg.highways.1.H.bias", "encoder.cbhg.highways.1.T.weight", "encoder.cbhg.highways.1.T.bias", "encoder.cbhg.highways.2.H.weight", "encoder.cbhg.highways.2.H.bias", "encoder.cbhg.highways.2.T.weight", "encoder.cbhg.highways.2.T.bias", "encoder.cbhg.highways.3.H.weight", "encoder.cbhg.highways.3.H.bias", "encoder.cbhg.highways.3.T.weight", "encoder.cbhg.highways.3.T.bias", "encoder.cbhg.gru.weight_ih_l0", "encoder.cbhg.gru.weight_hh_l0", "encoder.cbhg.gru.bias_ih_l0", "encoder.cbhg.gru.bias_hh_l0", "encoder.cbhg.gru.weight_ih_l0_reverse", "encoder.cbhg.gru.weight_hh_l0_reverse", "encoder.cbhg.gru.bias_ih_l0_reverse", "encoder.cbhg.gru.bias_hh_l0_reverse", "decoder.stopnet.rnn.weight_ih", "decoder.stopnet.rnn.weight_hh", "decoder.stopnet.rnn.bias_ih", "decoder.stopnet.rnn.bias_hh", "postnet.conv1d_banks.0.conv1d.weight", "postnet.conv1d_banks.0.bn.weight", "postnet.conv1d_banks.0.bn.bias", "postnet.conv1d_banks.0.bn.running_mean", "postnet.conv1d_banks.0.bn.running_var", "postnet.conv1d_banks.1.conv1d.weight", "postnet.conv1d_banks.1.bn.weight", "postnet.conv1d_banks.1.bn.bias", "postnet.conv1d_banks.1.bn.running_mean", "postnet.conv1d_banks.1.bn.running_var", "postnet.conv1d_banks.2.conv1d.weight", "postnet.conv1d_banks.2.bn.weight", "postnet.conv1d_banks.2.bn.bias", "postnet.conv1d_banks.2.bn.running_mean", "postnet.conv1d_banks.2.bn.running_var", "postnet.conv1d_banks.3.conv1d.weight", "postnet.conv1d_banks.3.bn.weight", "postnet.conv1d_banks.3.bn.bias", "postnet.conv1d_banks.3.bn.running_mean", "postnet.conv1d_banks.3.bn.running_var", "postnet.conv1d_banks.4.conv1d.weight", "postnet.conv1d_banks.4.bn.weight", "postnet.conv1d_banks.4.bn.bias", "postnet.conv1d_banks.4.bn.running_mean", "postnet.conv1d_banks.4.bn.running_var", "postnet.conv1d_banks.5.conv1d.weight", "postnet.conv1d_banks.5.bn.weight", "postnet.conv1d_banks.5.bn.bias", "postnet.conv1d_banks.5.bn.running_mean", "postnet.conv1d_banks.5.bn.running_var", "postnet.conv1d_banks.6.conv1d.weight", "postnet.conv1d_banks.6.bn.weight", "postnet.conv1d_banks.6.bn.bias", "postnet.conv1d_banks.6.bn.running_mean", "postnet.conv1d_banks.6.bn.running_var", "postnet.conv1d_banks.7.conv1d.weight", "postnet.conv1d_banks.7.bn.weight", "postnet.conv1d_banks.7.bn.bias", "postnet.conv1d_banks.7.bn.running_mean", "postnet.conv1d_banks.7.bn.running_var", "postnet.conv1d_projections.0.conv1d.weight", "postnet.conv1d_projections.0.bn.weight", "postnet.conv1d_projections.0.bn.bias", "postnet.conv1d_projections.0.bn.running_mean", "postnet.conv1d_projections.0.bn.running_var", "postnet.conv1d_projections.1.conv1d.weight", "postnet.conv1d_projections.1.bn.weight", "postnet.conv1d_projections.1.bn.bias", "postnet.conv1d_projections.1.bn.running_mean", "postnet.conv1d_projections.1.bn.running_var", "postnet.pre_highway.weight", 
"postnet.highways.0.H.weight", "postnet.highways.0.H.bias", "postnet.highways.0.T.weight", "postnet.highways.0.T.bias", "postnet.highways.1.H.weight", "postnet.highways.1.H.bias", "postnet.highways.1.T.weight", "postnet.highways.1.T.bias", "postnet.highways.2.H.weight", "postnet.highways.2.H.bias", "postnet.highways.2.T.weight", "postnet.highways.2.T.bias", "postnet.highways.3.H.weight", "postnet.highways.3.H.bias", "postnet.highways.3.T.weight", "postnet.highways.3.T.bias", "postnet.gru.weight_ih_l0", "postnet.gru.weight_hh_l0", "postnet.gru.bias_ih_l0", "postnet.gru.bias_hh_l0", "postnet.gru.weight_ih_l0_reverse", "postnet.gru.weight_hh_l0_reverse", "postnet.gru.bias_ih_l0_reverse", "postnet.gru.bias_hh_l0_reverse", "last_linear.weight", "last_linear.bias". 
    size mismatch for embedding.weight: copying a param with shape torch.Size([149, 256]) from checkpoint, the shape in current model is torch.Size([129, 256]).
    size mismatch for decoder.attention_rnn.alignment_model.query_layer.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
    size mismatch for decoder.attention_rnn.alignment_model.query_layer.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for decoder.attention_rnn.alignment_model.annot_layer.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
    size mismatch for decoder.attention_rnn.alignment_model.annot_layer.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for decoder.attention_rnn.alignment_model.v.weight: copying a param with shape torch.Size([1, 256]) from checkpoint, the shape in current model is torch.Size([1, 128]).
    size mismatch for decoder.stopnet.linear.weight: copying a param with shape torch.Size([1, 400]) from checkpoint, the shape in current model is torch.Size([1, 656]).
vitaly-zdanevich commented 5 years ago

Tried on the dev-tacotron2 branch:

$ docker run -it --rm -p 5002:5002 mozilla-tts
 > Loading model ...
 | > model config:  /srv/app/config.json
 | > model file:  /srv/app/checkpoint_272976.pth.tar
 > Setting up Audio Processor...
 | > bits:None
 | > sample_rate:22050
 | > num_mels:80
 | > min_level_db:-100
 | > frame_shift_ms:12.5
 | > frame_length_ms:50
 | > ref_level_db:20
 | > num_freq:1025
 | > power:1.5
 | > preemphasis:0.98
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:False
 | > mel_fmin:0.0
 | > mel_fmax:8000.0
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > n_fft:2048
 | > hop_length:275
 | > win_length:1102
Traceback (most recent call last):
  File "server/server.py", line 16, in <module>
    config.model_config, config.use_cuda)
  File "/srv/app/server/synthesizer.py", line 26, in load_model
    self.model = Tacotron(config.embedding_size, self.ap.num_freq, self.ap.num_mels, config.r)
AttributeError: 'AttrDict' object has no attribute 'embedding_size'
erogol commented 5 years ago

The first part of your question is not really a TTS problem, but @tomzx, who wrote the Dockerfile, might be able to help you.

As for the second part, the model checkpoint does not match the code you are using to load it. You need to check out the right commit of TTS to run the corresponding model; the commits are given in the checkpoint table.
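
A minimal sketch of how to inspect such a mismatch yourself, assuming PyTorch and a checkpoint saved as a dict with a "model" entry (as the tracebacks above indicate):

import torch

# Load on CPU and list the parameter names and shapes stored in the checkpoint;
# compare them against the "Missing key(s)" / "Unexpected key(s)" in the error.
cp = torch.load("checkpoint_272976.pth.tar", map_location="cpu")
for name, tensor in cp["model"].items():
    print(name, tuple(tensor.shape))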

vitaly-zdanevich commented 5 years ago

OK, thank you. I tried commit db7f3d3 with the corresponding model and config from the table (downloaded from Google Drive and put in the root of the cloned repository), ran docker build -t mozilla-tts . and docker run -it --rm -p 5002:5002 mozilla-tts, and got:

 > Loading model ...
 | > model config:  /srv/app/config.json
 | > model file:  /srv/app/best_model.pth.tar
 > Setting up Audio Processor...
 | > fft size: 2048, hop length: 275, win length: 1102
 | > Audio Processor attributes.
   | > bits:None
   | > sample_rate:22050
   | > num_mels:80
   | > min_level_db:-100
   | > frame_shift_ms:12.5
   | > frame_length_ms:50
   | > ref_level_db:20
   | > num_freq:1025
   | > power:1.5
   | > preemphasis:0.98
   | > griffin_lim_iters:60
   | > signal_norm:True
   | > symmetric_norm:False
   | > mel_fmin:0
   | > mel_fmax:None
   | > max_norm:1.0
   | > clip_norm:True
   | > do_trim_silence:True
   | > n_fft:2048
   | > hop_length:275
   | > win_length:1102
 | > Number of characters : 256
Traceback (most recent call last):
  File "server/server.py", line 16, in <module>
    config.model_config, config.use_cuda)
  File "/srv/app/server/synthesizer.py", line 34, in load_model
    self.model.load_state_dict(cp['model'])
  File "/usr/local/lib/python3.6/dist-packages/torch-1.0.1.post2-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
    size mismatch for embedding.weight: copying a param with shape torch.Size([61, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
    size mismatch for encoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
    size mismatch for decoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 400]) from checkpoint, the shape in current model is torch.Size([256, 25]).
    size mismatch for decoder.proj_to_mel.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([25, 256]).
    size mismatch for decoder.proj_to_mel.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([25]).
    size mismatch for decoder.memory_init.weight: copying a param with shape torch.Size([1, 400]) from checkpoint, the shape in current model is torch.Size([1, 25]).
    size mismatch for decoder.stopnet.linear.weight: copying a param with shape torch.Size([1, 416]) from checkpoint, the shape in current model is torch.Size([1, 281]).
    size mismatch for postnet.cbhg.conv1d_banks.0.conv1d.weight: copying a param with shape torch.Size([128, 80, 1]) from checkpoint, the shape in current model is torch.Size([128, 5, 1]).
    size mismatch for postnet.cbhg.conv1d_banks.1.conv1d.weight: copying a param with shape torch.Size([128, 80, 2]) from checkpoint, the shape in current model is torch.Size([128, 5, 2]).
    size mismatch for postnet.cbhg.conv1d_banks.2.conv1d.weight: copying a param with shape torch.Size([128, 80, 3]) from checkpoint, the shape in current model is torch.Size([128, 5, 3]).
    size mismatch for postnet.cbhg.conv1d_banks.3.conv1d.weight: copying a param with shape torch.Size([128, 80, 4]) from checkpoint, the shape in current model is torch.Size([128, 5, 4]).
    size mismatch for postnet.cbhg.conv1d_banks.4.conv1d.weight: copying a param with shape torch.Size([128, 80, 5]) from checkpoint, the shape in current model is torch.Size([128, 5, 5]).
    size mismatch for postnet.cbhg.conv1d_banks.5.conv1d.weight: copying a param with shape torch.Size([128, 80, 6]) from checkpoint, the shape in current model is torch.Size([128, 5, 6]).
    size mismatch for postnet.cbhg.conv1d_banks.6.conv1d.weight: copying a param with shape torch.Size([128, 80, 7]) from checkpoint, the shape in current model is torch.Size([128, 5, 7]).
    size mismatch for postnet.cbhg.conv1d_banks.7.conv1d.weight: copying a param with shape torch.Size([128, 80, 8]) from checkpoint, the shape in current model is torch.Size([128, 5, 8]).
    size mismatch for postnet.cbhg.conv1d_projections.1.conv1d.weight: copying a param with shape torch.Size([80, 256, 3]) from checkpoint, the shape in current model is torch.Size([5, 256, 3]).
    size mismatch for postnet.cbhg.conv1d_projections.1.bn.weight: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
    size mismatch for postnet.cbhg.conv1d_projections.1.bn.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
    size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_mean: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
    size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_var: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
    size mismatch for postnet.cbhg.pre_highway.weight: copying a param with shape torch.Size([128, 80]) from checkpoint, the shape in current model is torch.Size([128, 5]).
    size mismatch for last_linear.0.weight: copying a param with shape torch.Size([1025, 256]) from checkpoint, the shape in current model is torch.Size([80, 256]).
    size mismatch for last_linear.0.bias: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([80]).
vitaly-zdanevich commented 5 years ago

I also tried commit 2810d57 from the same table, but got an error: pathspec '2810d57' did not match any file(s) known to git

vitaly-zdanevich commented 5 years ago

Hm, I also tried db7f3d3 without Docker, on a fresh Ubuntu 18.10, with the same model and config from the table (from the same row):

$ python3 server/server.py -c server/conf.json
/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/numba/errors.py:105: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
 > Loading model ...
 | > model config:  /home/vitaly_zdanevich/TTS/config.json
 | > model file:  /home/vitaly_zdanevich/TTS/best_model.pth.tar
 > Setting up Audio Processor...
 | > fft size: 2048, hop length: 275, win length: 1102
 | > Audio Processor attributes.
   | > bits:None
   | > sample_rate:22050
   | > num_mels:80
   | > min_level_db:-100
   | > frame_shift_ms:12.5
   | > frame_length_ms:50
   | > ref_level_db:20
   | > num_freq:1025
   | > power:1.5
   | > preemphasis:0.98
   | > griffin_lim_iters:60
   | > signal_norm:True
   | > symmetric_norm:False
   | > mel_fmin:0
   | > mel_fmax:None
   | > max_norm:1.0
   | > clip_norm:True
   | > do_trim_silence:True
   | > n_fft:2048
   | > hop_length:275
   | > win_length:1102
 | > Number of characters : 256
Traceback (most recent call last):
  File "server/server.py", line 16, in <module>
    config.model_config, config.use_cuda)
  File "/home/vitaly_zdanevich/TTS/server/synthesizer.py", line 34, in load_model
    self.model.load_state_dict(cp['model'])
  File "/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
        size mismatch for embedding.weight: copying a param with shape torch.Size([61, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
        size mismatch for encoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
        size mismatch for decoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 400]) from checkpoint, the shape in current model is torch.Size([256, 10]).
        size mismatch for decoder.proj_to_mel.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([10, 256]).
        size mismatch for decoder.proj_to_mel.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([10]).
        size mismatch for decoder.memory_init.weight: copying a param with shape torch.Size([1, 400]) from checkpoint, the shape in current model is torch.Size([1, 10]).
        size mismatch for decoder.stopnet.linear.weight: copying a param with shape torch.Size([1, 416]) from checkpoint, the shape in current model is torch.Size([1, 266]).
        size mismatch for postnet.cbhg.conv1d_banks.0.conv1d.weight: copying a param with shape torch.Size([128, 80, 1]) from checkpoint, the shape in current model is torch.Size([128, 2, 1]).
        size mismatch for postnet.cbhg.conv1d_banks.1.conv1d.weight: copying a param with shape torch.Size([128, 80, 2]) from checkpoint, the shape in current model is torch.Size([128, 2, 2]).
        size mismatch for postnet.cbhg.conv1d_banks.2.conv1d.weight: copying a param with shape torch.Size([128, 80, 3]) from checkpoint, the shape in current model is torch.Size([128, 2, 3]).
        size mismatch for postnet.cbhg.conv1d_banks.3.conv1d.weight: copying a param with shape torch.Size([128, 80, 4]) from checkpoint, the shape in current model is torch.Size([128, 2, 4]).
        size mismatch for postnet.cbhg.conv1d_banks.4.conv1d.weight: copying a param with shape torch.Size([128, 80, 5]) from checkpoint, the shape in current model is torch.Size([128, 2, 5]).
        size mismatch for postnet.cbhg.conv1d_banks.5.conv1d.weight: copying a param with shape torch.Size([128, 80, 6]) from checkpoint, the shape in current model is torch.Size([128, 2, 6]).
        size mismatch for postnet.cbhg.conv1d_banks.6.conv1d.weight: copying a param with shape torch.Size([128, 80, 7]) from checkpoint, the shape in current model is torch.Size([128, 2, 7]).
        size mismatch for postnet.cbhg.conv1d_banks.7.conv1d.weight: copying a param with shape torch.Size([128, 80, 8]) from checkpoint, the shape in current model is torch.Size([128, 2, 8]).
        size mismatch for postnet.cbhg.conv1d_projections.1.conv1d.weight: copying a param with shape torch.Size([80, 256, 3]) from checkpoint, the shape in current model is torch.Size([2, 256, 3]).
        size mismatch for postnet.cbhg.conv1d_projections.1.bn.weight: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for postnet.cbhg.conv1d_projections.1.bn.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_mean: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_var: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for postnet.cbhg.pre_highway.weight: copying a param with shape torch.Size([128, 80]) from checkpoint, the shape in current model is torch.Size([128, 2]).
        size mismatch for last_linear.0.weight: copying a param with shape torch.Size([1025, 256]) from checkpoint, the shape in current model is torch.Size([80, 256]).
        size mismatch for last_linear.0.bias: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([80]).
nmstoker commented 5 years ago

I'm also trying to use the server directly (i.e. not via Docker), with the model downloaded based on the table and the repo at db7f3d3, and I get the same issue (see output below).

However, I do manage to get the Benchmark.ipynb notebook working successfully. Even when I then use the same (working) config.json file with server.py there is still a problem, so I suspect the commit left the server folder in a state where it wouldn't work with that model, and perhaps this just wasn't noticed?

Will see if I can figure out which settings may be causing it.

python server/server.py -c server/conf.json 
/home/neil/.conda/envs/tts/lib/python3.6/site-packages/scikit_learn-0.20.0-py3.6-linux-x86_64.egg/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
 > Loading model ...
 | > model config:  /home/neil/main/Projects/TTS-models/queue-February-16-2019_03+16AM-90f0cd6/config.json
 | > model file:  /home/neil/main/Projects/TTS-models/queue-February-16-2019_03+16AM-90f0cd6/best_model.pth.tar
 > Setting up Audio Processor...
 | > fft size: 2048, hop length: 275, win length: 1102
 | > Audio Processor attributes.
   | > bits:None
   | > sample_rate:22050
   | > num_mels:80
   | > min_level_db:-100
   | > frame_shift_ms:12.5
   | > frame_length_ms:50
   | > ref_level_db:20
   | > num_freq:1025
   | > power:1.5
   | > preemphasis:0.98
   | > griffin_lim_iters:60
   | > signal_norm:True
   | > symmetric_norm:False
   | > mel_fmin:0
   | > mel_fmax:None
   | > max_norm:1.0
   | > clip_norm:True
   | > do_trim_silence:True
   | > n_fft:2048
   | > hop_length:275
   | > win_length:1102
 | > Number of characters : 256
Traceback (most recent call last):
  File "server/server.py", line 16, in <module>
    config.model_config, config.use_cuda)
  File "/home/neil/main/Projects/TTS/server/synthesizer.py", line 34, in load_model
    self.model.load_state_dict(cp['model'])
  File "/home/neil/.conda/envs/tts/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
        size mismatch for embedding.weight: copying a param with shape torch.Size([61, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
        size mismatch for encoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 1025]).
        size mismatch for decoder.prenet.layers.0.weight: copying a param with shape torch.Size([256, 400]) from checkpoint, the shape in current model is torch.Size([256, 10]).
        size mismatch for decoder.proj_to_mel.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([10, 256]).
        size mismatch for decoder.proj_to_mel.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([10]).
        size mismatch for decoder.memory_init.weight: copying a param with shape torch.Size([1, 400]) from checkpoint, the shape in current model is torch.Size([1, 10]).
        size mismatch for decoder.stopnet.linear.weight: copying a param with shape torch.Size([1, 416]) from checkpoint, the shape in current model is torch.Size([1, 266]).
        size mismatch for postnet.cbhg.conv1d_banks.0.conv1d.weight: copying a param with shape torch.Size([128, 80, 1]) from checkpoint, the shape in current model is torch.Size([128, 2, 1]).
        size mismatch for postnet.cbhg.conv1d_banks.1.conv1d.weight: copying a param with shape torch.Size([128, 80, 2]) from checkpoint, the shape in current model is torch.Size([128, 2, 2]).
        size mismatch for postnet.cbhg.conv1d_banks.2.conv1d.weight: copying a param with shape torch.Size([128, 80, 3]) from checkpoint, the shape in current model is torch.Size([128, 2, 3]).
        size mismatch for postnet.cbhg.conv1d_banks.3.conv1d.weight: copying a param with shape torch.Size([128, 80, 4]) from checkpoint, the shape in current model is torch.Size([128, 2, 4]).
        size mismatch for postnet.cbhg.conv1d_banks.4.conv1d.weight: copying a param with shape torch.Size([128, 80, 5]) from checkpoint, the shape in current model is torch.Size([128, 2, 5]).
        size mismatch for postnet.cbhg.conv1d_banks.5.conv1d.weight: copying a param with shape torch.Size([128, 80, 6]) from checkpoint, the shape in current model is torch.Size([128, 2, 6]).
        size mismatch for postnet.cbhg.conv1d_banks.6.conv1d.weight: copying a param with shape torch.Size([128, 80, 7]) from checkpoint, the shape in current model is torch.Size([128, 2, 7]).
        size mismatch for postnet.cbhg.conv1d_banks.7.conv1d.weight: copying a param with shape torch.Size([128, 80, 8]) from checkpoint, the shape in current model is torch.Size([128, 2, 8]).
        size mismatch for postnet.cbhg.conv1d_projections.1.conv1d.weight: copying a param with shape torch.Size([80, 256, 3]) from checkpoint, the shape in current model is torch.Size([2, 256, 3]).
        size mismatch for postnet.cbhg.conv1d_projections.1.bn.weight: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for postnet.cbhg.conv1d_projections.1.bn.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_mean: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for postnet.cbhg.conv1d_projections.1.bn.running_var: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for postnet.cbhg.pre_highway.weight: copying a param with shape torch.Size([128, 80]) from checkpoint, the shape in current model is torch.Size([128, 2]).
        size mismatch for last_linear.0.weight: copying a param with shape torch.Size([1025, 256]) from checkpoint, the shape in current model is torch.Size([80, 256]).
        size mismatch for last_linear.0.bias: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([80]).
nmstoker commented 5 years ago

EDIT: actually there may be more to do... this doesn't cause it to crash, but the output is gibberish :slightly_smiling_face:

It's this line in synthesizer.py that wasn't updated: https://github.com/mozilla/TTS/blob/db7f3d36e7768f9179d42a8f19b88c2c736d87eb/server/synthesizer.py#L26

If you replace L26 with:

        num_chars = 61  # hard-coded vocabulary size; see the note below
        self.model = Tacotron(num_chars, config.embedding_size,
                              config.audio['num_freq'], config.audio['num_mels'],
                              config.r, attn_windowing=False)

It should then work. (NB: the 61 is hard-coded for simplicity; the corresponding line in Benchmark.ipynb shows how it should really be derived, but then you need to bring in other variables too. A sketch follows below.)
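
For reference, a sketch of the non-hard-coded variant; the utils.text.symbols import path and the use_phonemes flag are assumptions based on how Benchmark.ipynb works at nearby commits, so adjust them to your checkout:

        from utils.text.symbols import phonemes, symbols  # assumed module path

        # Derive the vocabulary size from the symbol set instead of hard-coding 61.
        num_chars = len(phonemes) if getattr(config, 'use_phonemes', False) else len(symbols)
        self.model = Tacotron(num_chars, config.embedding_size,
                              config.audio['num_freq'], config.audio['num_mels'],
                              config.r, attn_windowing=False)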

vitaly-zdanevich commented 5 years ago

I tried the fix above from @nmstoker; it looks like the server finally starts, but it is not accessible:

$ python3 server/server.py -c server/conf.json
/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/numba/errors.py:105: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
 > Loading model ...
 | > model config:  /home/vitaly_zdanevich/TTS/config.json
 | > model file:  /home/vitaly_zdanevich/TTS/best_model.pth.tar
 > Setting up Audio Processor...
 | > fft size: 2048, hop length: 275, win length: 1102
 | > Audio Processor attributes.
   | > bits:None
   | > sample_rate:22050
   | > num_mels:80
   | > min_level_db:-100
   | > frame_shift_ms:12.5
   | > frame_length_ms:50
   | > ref_level_db:20
   | > num_freq:1025
   | > power:1.5
   | > preemphasis:0.98
   | > griffin_lim_iters:60
   | > signal_norm:True
   | > symmetric_norm:False
   | > mel_fmin:0
   | > mel_fmax:None
   | > max_norm:1.0
   | > clip_norm:True
   | > do_trim_silence:True
   | > n_fft:2048
   | > hop_length:275
   | > win_length:1102
 | > Number of characters : 61
 * Serving Flask app "server" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: on
INFO:werkzeug: * Running on http://0.0.0.0:8000/ (Press CTRL+C to quit)
INFO:werkzeug: * Restarting with stat
/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/numba/errors.py:105: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
 > Loading model ...
 | > model config:  /home/vitaly_zdanevich/TTS/config.json
 | > model file:  /home/vitaly_zdanevich/TTS/best_model.pth.tar
 > Setting up Audio Processor...
 | > fft size: 2048, hop length: 275, win length: 1102
 | > Audio Processor attributes.
   | > bits:None
   | > sample_rate:22050
   | > num_mels:80
   | > min_level_db:-100
   | > frame_shift_ms:12.5
   | > frame_length_ms:50
   | > ref_level_db:20
   | > num_freq:1025
   | > power:1.5
   | > preemphasis:0.98
   | > griffin_lim_iters:60
   | > signal_norm:True
   | > symmetric_norm:False
   | > mel_fmin:0
   | > mel_fmax:None
   | > max_norm:1.0
   | > clip_norm:True
   | > do_trim_silence:True
   | > n_fft:2048
   | > hop_length:275
   | > win_length:1102
 | > Number of characters : 61
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 220-380-468

To fetch the page I tried both a browser and curl:

$ curl http://35.230.162.189:8000
curl: (7) Failed to connect to 35.230.162.189 port 8000: Connection timed out
erogol commented 5 years ago

Check the default port in the config file.

vitaly-zdanevich commented 5 years ago

Yes, before running it I changed the port in my config from 5002 to 8000 (the same port python3 -m http.server uses).

vitaly-zdanevich commented 5 years ago

Sorry, my mistake: curl localhost:8000 returns some HTML...

vitaly-zdanevich commented 5 years ago

In my browser I see the page from your HTTP server. I tried the string hello and got some strange sound as output; then I tried another string and got INTERNAL SERVER ERROR:

INFO:werkzeug: * Debugger PIN: 220-380-468
INFO:werkzeug:127.0.0.1 - - [16/Apr/2019 17:00:13] "GET / HTTP/1.1" 200 -
INFO:werkzeug:46.216.60.228 - - [16/Apr/2019 17:01:31] "GET / HTTP/1.1" 200 -
INFO:werkzeug:46.216.60.228 - - [16/Apr/2019 17:01:33] "GET /favicon.ico HTTP/1.1" 404 -
 > Model input: hello
hello.
INFO:werkzeug:46.216.60.228 - - [16/Apr/2019 17:01:42] "GET /api/tts?text=hello HTTP/1.1" 200 -
 > Model input: life is just a simple game
life is just a simple game.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=238 error=59 : device-side assert triggered
INFO:werkzeug:46.216.60.228 - - [16/Apr/2019 17:02:13] "GET /api/tts?text=life%20is%20just%20a%20simple%20game HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 2309, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 2295, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1741, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/Flask-1.0.2-py3.6.egg/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/vitaly_zdanevich/TTS/server/server.py", line 28, in tts
    data = synthesizer.tts(text)
  File "/home/vitaly_zdanevich/TTS/server/synthesizer.py", line 63, in tts
    chars_var)
  File "/home/vitaly_zdanevich/TTS/models/tacotron.py", line 37, in forward
    encoder_outputs = self.encoder(inputs)
  File "/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vitaly_zdanevich/TTS/layers/tacotron.py", line 275, in forward
    return self.cbhg(inputs)
  File "/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vitaly_zdanevich/TTS/layers/tacotron.py", line 254, in forward
    return self.cbhg(x)
  File "/home/vitaly_zdanevich/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vitaly_zdanevich/TTS/layers/tacotron.py", line 218, in forward
    x = torch.cat(outs, dim=1)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:238
nmstoker commented 5 years ago

That sounds like it might be the gibberish I mentioned in my edit. I'm not in front of a computer at the moment, but I think the points @erogol made in #154 may help.
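
For context on the error above: a CUDA "device-side assert" is often caused by an out-of-range embedding lookup, e.g. an input character id that is >= the hard-coded num_chars of 61 (CUDA errors surface asynchronously, so the reported torch.cat line can be misleading). A minimal CPU illustration of the same failure mode, with hypothetical values:

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=61, embedding_dim=256)
ids = torch.tensor([60, 61])  # 61 is out of range for 61 embeddings
emb(ids)                      # IndexError on CPU; device-side assert on CUDA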