vtuber-plan / hifi-gan

A high-resolution implementation of the HiFi-GAN vocoder for voice conversion.
MIT License

TypeError: 'dict' object is not callable #1

Open dillfrescott opened 1 year ago

dillfrescott commented 1 year ago
      7 mel = mel_spectrogram_torch(wav, 2048, 256, 48000, 512, 2048, 0, None, False)
      8 mel = mel.cuda()
----> 9 out = hifigan(mel)
     10 
     11 wav_out = out.squeeze(0).cpu()

TypeError: 'dict' object is not callable
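
A `'dict' object is not callable` error at `hifigan(mel)` means the name `hifigan` is bound to a plain dictionary rather than a model object. One possible cause is that a checkpoint dictionary was loaded instead of a constructed generator. This is only an assumption, since the thread does not show how `hifigan` was created here. Below is a minimal, torch-free sketch of the failure and a guard against it; `fake_checkpoint` and `run_vocoder` are hypothetical names, not part of this repository.

```python
# Hypothetical stand-in for a loaded checkpoint: a dict of weights, not a model.
fake_checkpoint = {"conv_pre.weight": [0.0, 0.1]}


def run_vocoder(model, mel):
    """Call model(mel), failing early with a clearer message if model is not callable."""
    if not callable(model):
        raise TypeError(
            f"expected a callable model, got {type(model).__name__}; "
            "a dict of weights has to be loaded into a model object before use"
        )
    return model(mel)


try:
    # Mirrors the failing call: a dict is passed where a model is expected.
    run_vocoder(fake_checkpoint, mel=[1.0, 2.0])
except TypeError as err:
    message = str(err)
    print(message)
```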

dillfrescott commented 1 year ago

Also, I get this error when trying to just load your model from hub:

Using cache found in /root/.cache/torch/hub/vtuber-plan_hifi-gan_main
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-31-3ed915ea688c> in <module>
      1 import torch, torchaudio
      2 from hifigan.mel_processing import mel_spectrogram_torch
----> 3 hifigan = torch.hub.load("vtuber-plan/hifi-gan:main", "hifigan_48k")
      4 wav, sr = torchaudio.load("rec.wav")
      5 assert sr == 48000

3 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
   1603         if len(error_msgs) > 0:
   1604             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1605                                self.__class__.__name__, "\n\t".join(error_msgs)))
   1606         return _IncompatibleKeys(missing_keys, unexpected_keys)
   1607 

RuntimeError: Error(s) in loading state_dict for Generator:
    size mismatch for conv_pre.weight: copying a param with shape torch.Size([512, 256, 7]) from checkpoint, the shape in current model is torch.Size([512, 128, 7]).
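
For an ungrouped 1-D convolution, the weight shape is (out_channels, in_channels, kernel_size). The message above therefore says that the checkpoint's first layer expects 256 input channels, while the model built by the current code expects 128. The sketch below is one way to list such mismatches before calling `load_state_dict`. It works on plain name-to-shape mappings, and `find_shape_mismatches` is a hypothetical helper, not part of this repository.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Only compare parameters that exist on both sides.
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the error message above.
checkpoint_shapes = {"conv_pre.weight": (512, 256, 7)}
model_shapes = {"conv_pre.weight": (512, 128, 7)}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt, model) in mismatches.items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

With PyTorch, and assuming the checkpoint is a flat mapping of parameter names to tensors, each mapping could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`, once from the checkpoint and once from `model.state_dict()`.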

FrostMiKu commented 1 year ago

Hi @dillfrescott, the main branch is still under development. We'll let you know when we release a usable version.

dillfrescott commented 1 year ago

Oh, okay!

jstzwj commented 1 year ago

I just uploaded the weights and code for version 0.2.1, so the example in the readme works now. The network is still training, so the output of this version may not be high quality.

dillfrescott commented 1 year ago

okay!

dillfrescott commented 1 year ago

Quick question. Could I use this for singing voice conversion?

jstzwj commented 1 year ago

We did take the singing voice conversion task into account, so part of the training dataset is singing voice, such as JSUT-Song and RAVDESS-Song. It is possible to use this vocoder in a singing voice conversion model. In future versions, we may add more singing data to fit the task better.

dillfrescott commented 1 year ago

Oh cool! Is it okay if I go ahead and try to run inference? Or is it not going to be ready?

jstzwj commented 1 year ago

This repository contains only the vocoder code. The singing voice conversion model is in the vcvits repository, and that model is still in development.

dillfrescott commented 1 year ago

Oh okay