@kan-bayashi No, I use https://github.com/Xilinx/brevitas (you need to expand the input to 4D for training, since this framework only supports conv2D :D, but it's enough :D). I also have a larger version of the MelGAN generator; I convert it to TensorFlow and use TensorRT to optimize it on the server side :D. It seems all we need to do now is improve the quality :)), the speed isn't a problem anymore :D
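In case it helps, here is a rough sketch of what I mean by expanding the input to 4D so that a conv2D-only framework can handle the 1D signal (a plain PyTorch illustration, not my actual code; in my setup the `nn.Conv2d` would be swapped for the framework's quantized 2D conv, e.g. brevitas's `QuantConv2d`):

```python
import torch
import torch.nn as nn

# Minimal sketch: emulate Conv1d with Conv2d by adding a dummy height dim,
# so a conv2D-only quantization framework can be applied to the generator.
class Conv1dAs2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        # kernel (1, k) and dilation (1, d) act only along the time axis
        self.conv = nn.Conv2d(
            in_ch, out_ch,
            kernel_size=(1, kernel_size),
            dilation=(1, dilation),
            padding=(0, (kernel_size - 1) * dilation // 2),
        )

    def forward(self, x):            # x: (batch, channels, time)
        x = x.unsqueeze(2)           # -> (batch, channels, 1, time)
        x = self.conv(x)             # 2D conv over the dummy 4D input
        return x.squeeze(2)          # -> (batch, out_channels, time)

# usage: same interface as Conv1d, but internally everything is 4D
layer = Conv1dAs2d(80, 256, kernel_size=7)
out = layer(torch.randn(1, 80, 100))   # -> (1, 256, 100)
```

The dummy height dimension makes the layer look like an image convolution while still only convolving along time, so the maths stays identical to the original Conv1d.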
It's been a great honor to follow your and other researchers' work on this repository. :) I am interested in a light version of the voice synthesis model, ideally for a mobile platform. It seems that you have successfully converted the model into a mobile version. Could you kindly share the model, or at least tell us about its performance? Thanks in advance.
@John-K92 Please check the demo page and the results section in the README. https://kan-bayashi.github.io/ParallelWaveGAN https://github.com/kan-bayashi/ParallelWaveGAN#results You can access the samples and model files there.
MelGAN is very light while keeping comparable quality. If you want to use it on mobile, the conversion notebook from PyTorch to TensorFlow will help you.
@John-K92 The model I'm using on the mobile device performs on par with the original MelGAN (I still use a float model, so the quality is the same).
Thank you for sharing your experience. Then, may I ask your opinion on text-to-mel (TTS) models, such as FastSpeech, Tacotron, etc., and how they could be applied on mobile platforms alongside MelGAN? As mentioned by @kan-bayashi and in your research, the mel-to-voice model seems light enough for mobile devices. However, would both models together still be light enough on mobile in an end-to-end TTS pipeline? Or would you recommend a different structure or process?
@John-K92 I just want to say that mel-to-voice is light enough, but it is still slower than my text2mel models :D. My text2mel model is fully convolutional; deploying an RNN on a mobile device is very hard, so Tacotron should be ruled out :D
I am currently trying to convert the model to TensorFlow and then to a mobile-friendly version (TensorRT or TensorFlow Lite): https://colab.research.google.com/github/kan-bayashi/ParallelWaveGAN/blob/master/notebooks/convert_melgan_from_pytorch_to_tensorflow.ipynb#scrollTo=XOK6AuWW9R8N&line=3&uniqifier=1 But there seems to be an error in the code you've provided. Have you pinned a specific TensorFlow version?
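For reference, the mobile step I'm aiming for afterwards is something like the standard TensorFlow Lite conversion below (a generic sketch with a stand-in Keras model, not the notebook's code):

```python
import tensorflow as tf

# Stand-in for the Keras MelGAN produced by the conversion notebook
# (placeholder model so the snippet is self-contained; in practice this
# would be the converted TFMelGANGenerator).
tf_melgan = tf.keras.Sequential([
    tf.keras.layers.Conv1D(1, kernel_size=7, padding="same",
                           input_shape=(100, 80)),   # (frames, mel bins)
])

# Standard TF2 -> TFLite conversion
converter = tf.lite.TFLiteConverter.from_keras_model(tf_melgan)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization
tflite_model = converter.convert()

with open("melgan.tflite", "wb") as f:
    f.write(tflite_model)
```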
What error?
When I run the code, the line "audio = TFMelGANGenerator(**config["generator_params"])(inputs)" raises an InaccessibleTensorError:
InaccessibleTensorError: The tensor 'Tensor("conv2d_346/dilation_rate:0", shape=(2,), dtype=int32)' cannot be accessed here: it is defined in another function or code block. Use return values, explicit Python locals or TensorFlow collections to access it. Defined in: FuncGraph(name=call, id=140430667155440); accessed from: FuncGraph(name=call, id=140430364789896).
Don't you get any error on your side?
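From the message, my understanding is that a tensor created inside one FuncGraph is being reused from another one. Here is a tiny standalone sketch of that failure pattern, just to illustrate what TensorFlow is complaining about (it has nothing to do with the notebook's actual layers):

```python
import tensorflow as tf

stash = []

@tf.function
def build():
    # this constant becomes a symbolic tensor that lives in build()'s FuncGraph
    stash.append(tf.constant([1, 2]))

@tf.function
def use():
    # trying to use a tensor that was defined in a different FuncGraph
    return stash[0] + 1

build()
use()  # typically fails with an "InaccessibleTensorError"-style cross-graph error
```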
I observed an interesting behaviour after 138K iterations where the discriminator dominated the training and the generator's losses exploded on both the train and validation sets. Do you have any idea why this happens and how to prevent it?
I am training on LJSpeech, and I basically use the same training schedule you released with the v2 config for LJSpeech (train the generator alone until 100K steps, then enable the discriminator).
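To be concrete, the schedule I mean looks roughly like the sketch below (illustrative pseudo-trainer, not the repo's code; the 100K threshold is the one from the v2 config, the names are mine):

```python
# Illustrative sketch of the v2-style schedule: train the generator alone for
# the first 100K steps, then enable the discriminator and the adversarial loss.
discriminator_train_start_steps = 100_000
total_steps = 400_000

def generator_step(use_adversarial_loss: bool):
    pass  # placeholder: reconstruction losses (+ adversarial loss once enabled)

def discriminator_step():
    pass  # placeholder: discriminator update on real vs. generated waveforms

for step in range(1, total_steps + 1):
    enable_disc = step > discriminator_train_start_steps
    generator_step(use_adversarial_loss=enable_disc)
    if enable_disc:
        discriminator_step()
```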
Here is the TensorBoard screenshot.