LearnedVector opened this issue 6 years ago
@MXGray: would you be willing to share your pre-trained model on the Nancy corpus?
@keithito @Mn0491
No problem - Here you go: https://github.com/keithito/tacotron/issues/15#issuecomment-342632496
@MXGray Thank you so much!
The one that you linked to doesn't seem to perform as well as the samples at https://keithito.github.io/audio-samples/. Are they the same model? Here is a sample of what the model output.
This is me using the default demo_server.py and pointing the checkpoint at the Nancy corpus model you provided.
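For reference, the command was roughly the following (the checkpoint path is just an example of where I unpacked the model):

```
python3 demo_server.py --checkpoint nancy_model/model.ckpt-250000
```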
Thanks again for posting the link, and thank you @keithito for this awesome project.
@Mn0491
Oh, my bad - That's the Tagalog model that I trained on top of the Nancy model. I'll upload the correct Nancy model later tonight when I'm in front of my laptop, and I'll post it here.
@MXGray Could you please upload the English model? The model at your Drive link doesn't generate English speech.
@Mn0491 @geneing Sorry guys, crazy holidays. :) Here you go - Happy 2018! https://drive.google.com/file/d/1c_O-Gha03_erKbilsFCvs9QJ8faJ7ou8/view?usp=sharing
@MXGray thank you! Happy 2018!
@MXGray Thank you. It works now.
@t3t3t3 Are you training a model on top of this? If so, after preprocessing your training data you'll get a max output length. Divide this by outputs_per_step, and use the result for max_iters in hparams.py ... Hope this helps. :)
Thanks! I think I messed up the parameters somewhere. I reinstalled the clean source code and it works now!
Excuse me, can you tell me how to download the Nancy corpus dataset from Blizzard 2011 on CSTR? I can't even find where to register. @MXGray @keithito
@begeekmyfriend Hello! You need to click "license", which redirects to the page below. Fill in the form and wait a few days for it to be approved. You'll get 2 emails in total:
http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/license.html
@gloriouskilka Thanks a lot. The Nancy corpus dataset seems better for training an English model.
@MXGray Thanks! Do you mind if I upload your trained model to a public GitHub repository? I'd like to make a Docker container for running Tacotron. (curl doesn't play well with Google Drive links)
@ArkaneCow No, I don't mind. It'll be very helpful. Please share the link here once it's up. Thanks. :)
@MXGray Thanks! I created a Docker image for running this repository: https://hub.docker.com/r/arkanecow/dockerfile-keithito-tacotron/
The Dockerfile is here: https://github.com/ArkaneCow/dockerfile-keithito-tacotron
The repository where the models are hosted is here: https://github.com/ArkaneCow/tacotron-models
@MXGray Could you please clarify, did you use the default parameters for training on Nancy corpus? Thanks in advance!
@MXGray Thanks for your contribution. I listened to the demo audio for both LJSpeech and Nancy, and found that Nancy sounds better.
I downloaded the files from the official Nancy website you provided, but I don't know how to handle them. Therefore, I downloaded the wav and text files from here:
https://github.com/barronalex/Tacotron/blob/master/download_data.sh
I converted the contents of prompts.data into the metadata.csv format, such as: APDC2-017-01|Children act prematurely.|Children act prematurely.
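For reference, a rough sketch of the conversion I used (assuming festival-style prompts.data lines such as ( APDC2-017-01 "Children act prematurely." ); the exact format may differ):

```python
# Rough conversion sketch; assumes festival-style prompts.data lines like:
# ( APDC2-017-01 "Children act prematurely." )
import re

with open('prompts.data', encoding='utf-8') as fin, \
        open('metadata.csv', 'w', encoding='utf-8') as fout:
    for line in fin:
        m = re.match(r'\(\s*(\S+)\s+"(.*)"\s*\)', line.strip())
        if m:
            utt_id, text = m.groups()
            # metadata.csv format used by this repo: id|raw text|normalized text
            fout.write('%s|%s|%s\n' % (utt_id, text, text))
```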
However, I got an error when running train.py:
Starting new training run at commit: None
Generated 32 batches of size 32 in 2.455 sec
Traceback (most recent call last):
  File "/home/chris/new_2018/u41_2_nancy/tacotron/tacotron/datasets/datafeeder.py", line 74, in run
    self._enqueue_next_group()
  File "/home/chris/new_2018/u41_2_nancy/tacotron/tacotron/datasets/datafeeder.py", line 96, in _enqueue_next_group
    self._session.run(self._enqueue_op, feed_dict=feed_dict)
  File "/home/chris/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/chris/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1096, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32, 665, 64) for Tensor 'datafeeder/mel_targets:0', which has shape '(?, ?, 80)'
Is there anything I should edit in the original code written by @keithito, given the differences between the LJSpeech and Nancy corpora?
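For what it's worth, the error suggests the preprocessed mel targets have 64 channels while the graph expects hparams.num_mels = 80. A quick check (the mel file name below is hypothetical; use any file your preprocessing step produced):

```python
# Sanity check: the last dimension of a preprocessed mel array should
# match hparams.num_mels (80 by default in this repo).
import numpy as np

mel = np.load('training/nancy-mel-00001.npy')  # hypothetical file name
print(mel.shape)  # the error above indicates 64 channels instead of 80
```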
@Quadraaa @DavidAksnes
All default parameters in hparams.py, except max_iters. This value should be set to the max output length divided by outputs_per_step. For example, after preprocessing the Nancy dataset, say you get 1605 as the max output length; the default outputs_per_step is 5, so:
1605 / 5 = 321 (this should be the value of max_iters)
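In code form, a minimal sketch (assuming the training/train.txt layout this repo's preprocessing writes, where the third pipe-separated field is the frame count):

```python
# Minimal sketch: derive max_iters from the preprocessed training metadata.
import math

outputs_per_step = 5  # default in hparams.py

with open('training/train.txt', encoding='utf-8') as f:
    max_frames = max(int(line.split('|')[2]) for line in f)

max_iters = math.ceil(max_frames / outputs_per_step)
print(max_iters)  # e.g. 1605 / 5 = 321 for the Nancy dataset
```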
Hope this helps!
Hi @MXGray,
I've been training on the Nancy corpus as you did, using the default parameters. I've trained to 233,000 steps, but whenever I synthesise a sentence it has lots of echoing afterwards, whereas your model doesn't generate echoes. I was wondering if you have any suggestions on how to fix this? Find an example below using the prompt "This is the example recording." c9aeb4d2-b929-4c46-bdef-d644a36280f3.wav.zip
Thanks
Hi @MXGray, thank you for sharing such a well-trained model. I used your model as a pretrained model for synthesizing in my own language. My data is about 2.2 hours from a single speaker, and most of the phonemes map to CMUDict phonemes. After 950K steps, the model can only align the first half of medium and long sentences; the second half is very bad, and it seems the model cannot learn to align the second part of the sentences. Why does this happen? How can I fix it?
Thanks
@keithito Do you have any idea about the problem I mentioned above?
@navidnadery I'm not sure why this would happen. Maybe there's a lack of long sentences in your training data? You can also try Location Sensitive Attention (or hybrid attention) to see if that yields a better result.
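Location-sensitive attention isn't in tf.contrib, so it would need a custom AttentionMechanism implementation, but as a quick (hypothetical) experiment you can swap in a monotonic variant that does ship with TF 1.x where the model builds its attention:

```python
# Sketch (TF 1.x): BahdanauMonotonicAttention is a drop-in alternative to
# the BahdanauAttention this repo uses by default.
import tensorflow as tf
from tensorflow.contrib.seq2seq import BahdanauMonotonicAttention

# Stand-in for the encoder outputs: [batch, time, depth].
encoder_outputs = tf.placeholder(tf.float32, [None, None, 256])

attention_mechanism = BahdanauMonotonicAttention(256, encoder_outputs)
```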
Hi guys!! Does anyone know if sample_rate in hparams.py needs to be changed to the sampling frequency of the sound files in the dataset? Like 16K for the Nancy dataset?
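For reference, a quick way to check a file's native rate (the wav path below is hypothetical):

```python
# Check the dataset's native sampling rate; sr=None tells librosa
# not to resample on load.
import librosa

wav, sr = librosa.load('Nancy/wavn/APDC2-017-01.wav', sr=None)  # hypothetical path
print(sr)  # compare against hparams.sample_rate (20000 by default in this repo)
```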
@Shikherneo2 you mean sample_rate?
It seems that, after the recent updates to the code, the Nancy pre-trained model can no longer be used. Does anyone have a similar issue?
I don't think you can use a checkpoint after the hparams (or the model code) have changed.
Can anyone share some code or directions on how to use this pre-trained model (with the same code as the one shared in this repo)?
I tried running this :
!python3 eval.py --checkpoint nancy_model/model.ckpt-250000
but I get the following error:

[...]
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key model/inference/decoder/output_projection_wrapper/multi_rnn_cell/cell_0/output_projection_wrapper/concat_output_and_attention_wrapper/decoder_prenet_wrapper/attention_wrapper/bahdanau_attention/attention_v not found in checkpoint
  [[node save/RestoreV2 (defined at /home/ec2-user/SageMaker/tacotron/synthesizer.py:24) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
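One way to see which variables the checkpoint actually contains, to compare against the graph the current code builds (TF 1.x):

```python
# List the variables stored in the checkpoint; names that differ from the
# current graph mean the code has diverged from what the model was trained with.
import tensorflow as tf

for name, shape in tf.train.list_variables('nancy_model/model.ckpt-250000'):
    print(name, shape)
```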
Is the Nancy corpus pre-trained model available anywhere for use? I think the one provided in the README is the LJ Speech-trained model.