Closed — ishandutta2007 closed this issue 5 years ago
@ishandutta2007 Hi, I guess it is caused by the librosa version. You can modify how the wave file is written to match your environment.
Thanks a lot @syang1993 for answering. I have been trying to reach out to you on multiple platforms for help on this thread, asking about models that people have already built. Not sure people look at older threads. It would be great if you could share at least the 200k-step model (the one you shared outputs of) so we can continue more iterations on top of it.
Hi, I'm so sorry that I'm now doing an internship in a company, I cannot get the pre-trained model (I trained it several months ago when I was doing visiting research in Singapore). You can train it by yourself, it may take about 3 days to get 200K steps.
Well, on our GTX 1080 it's taking longer by my estimate (maybe twice that). And it is not always about time: nowadays people in the ML world burn huge amounts of compute hours and money unnecessarily, when sharing could solve a lot of it. Can you share your email/LinkedIn/Twitter etc.? You seem to be really deep into speech synthesis, and keeping in touch may be useful for both of us.
I trained it on P40, which may be faster. Yes you are right, sharing can solve a lot. Maybe this is the purpose of Github :)
I'm not so familiar with LinkedIn, so I wasn't sure how to share my ID; here is the link: https://www.linkedin.com/in/yang-shan-182987119
So in China do you use Ushi or Mamai ? Let's see if I can connect via them too. :)
Thanks @syang1993 for the connect. I have triggered the run on our GTX 1080; it will take a month or so to reach 500-600 iterations. We need to get it close to Google's performance, or else it is unusable for real-life scenarios. If you have access to more powerful GPUs, it would be a great favour if you could train for more iterations and share the model with the community. So far there is no properly trained Tacotron with style transfer on the internet, so this would be the first one.
@ishandutta2007 Usually, Google tends to use a lot of GPUs to train such a model, and they use about 200 hours of data to get their performance. So I think it's hard to reproduce their results. By the way, one of my friends has started training a model using this repo; I can share it when it's finished.
No wonder why Elon Musk fears of Google colonising the world :D
@syang1993 what's the best communicator/instant messenger to keep in touch with you? We shouldn't be discussing stuff unrelated to this thread, so I will switch over to the models thread for further updates on this.
Do let me know what's best to reach you. Don't hesitate even if I need to install wechat or something. In India we use:
@ishandutta2007 We mostly use WeChat in China, and my WeChat ID is ys_think. I also use LinkedIn (not often) and Gmail: syang.mix@gmail.com
just hit the same error. @ishandutta2007 how did you get around this?
I solved it by changing save_wav() in util/audio.py from:
librosa.output.write_wav(path, wav.astype(np.int16), hparams.sample_rate)
to
librosa.output.write_wav(path, wav, hparams.sample_rate)
Running on the LJ dataset. Basically, this is the line where it breaks: https://github.com/syang1993/gst-tacotron/blob/b455ed21bf0c08e557dde0aaafaf40a1b4df5265/train.py#L115