voicepaw / so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

"RuntimeError: Failed to load checkpoint" error occurs when using "svc train" #584

Open Nullthingxs opened 1 year ago

Nullthingxs commented 1 year ago
root@nvidia2:/data/so-vits-svc# svc train
[14:48:51] INFO     [14:48:51] Created a temporary directory at /tmp/tmpogn07duj                                                                                                                                                                                                                       instantiator.py:21
           INFO     [14:48:51] Writing /tmp/tmpogn07duj/_remote_module_non_scriptable.py                                                                                                                                                                                                               instantiator.py:76
[14:48:53] INFO     [14:48:53] Using strategy: auto                                                                                                                                                                                                                                                         train.py:88
INFO: GPU available: True (cuda), used: True
           INFO     [14:48:53] GPU available: True (cuda), used: True                                                                                                                                                                                                                                        setup.py:163
INFO: TPU available: False, using: 0 TPU cores
           INFO     [14:48:53] TPU available: False, using: 0 TPU cores                                                                                                                                                                                                                                      setup.py:166
INFO: IPU available: False, using: 0 IPUs
           INFO     [14:48:53] IPU available: False, using: 0 IPUs                                                                                                                                                                                                                                           setup.py:169
INFO: HPU available: False, using: 0 HPUs
           INFO     [14:48:53] HPU available: False, using: 0 HPUs                                                                                                                                                                                                                                           setup.py:172
           WARNING  [14:48:53] /root/anaconda3/envs/py390/lib/python3.9/site-packages/so_vits_svc_fork/modules/synthesizers.py:81: UserWarning: Unused arguments: {'n_layers_q': 3, 'use_spectral_norm': False}                                                                                           warnings.py:109
                      warnings.warn(f"Unused arguments: {kwargs}")                                                                                                                                                                                                                                                       

           INFO     [14:48:53] Decoder type: hifi-gan                                                                                                                                                                                                                                                 synthesizers.py:100
Traceback (most recent call last):
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/so_vits_svc_fork/train.py", line 336, in load
    _, _, _, epoch = utils.load_checkpoint(
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/so_vits_svc_fork/utils.py", line 240, in load_checkpoint
    checkpoint_dict = torch.load(f, map_location="cpu", weights_only=True)
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/torch/serialization.py", line 788, in load
    raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution.Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported operand 71

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/anaconda3/envs/py390/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/so_vits_svc_fork/__main__.py", line 129, in train
    train(
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/so_vits_svc_fork/train.py", line 108, in train
    model = VitsLightning(reset_optimizer=reset_optimizer, **hparams)
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/so_vits_svc_fork/train.py", line 175, in __init__
    self.load(reset_optimizer)
  File "/root/anaconda3/envs/py390/lib/python3.9/site-packages/so_vits_svc_fork/train.py", line 352, in load
    raise RuntimeError("Failed to load checkpoint") from e
RuntimeError: Failed to load checkpoint
root@nvidia2:/data/so-vits-svc# 
mattopeerenboom commented 1 year ago

Some kind of library mismatch. I found that reinstalling (torch and sovits) fixed it.
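
Before reinstalling, it can help to record which versions are currently installed so the before and after states can be compared. The snippet below is only a sketch using the standard library's `importlib.metadata`. The distribution names `torch` and `so-vits-svc-fork` are assumed to be the names the packages were installed under, and the helper names `installed_version` and `report` are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(names=("torch", "so-vits-svc-fork")):
    """Map each distribution name (assumed names, adjust as needed) to its version."""
    return {name: installed_version(name) for name in names}
```

Calling `print(report())` inside the same environment that runs `svc train` shows what is actually installed there. A `None` entry means that distribution was not found under that name.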

Nullthingxs commented 1 year ago

> Some kind of library mismatch. I found that reinstalling (torch and sovits) fixed it.

I tried reinstalling, but that didn't fix the problem.

Nullthingxs commented 1 year ago

> Some kind of library mismatch. I found that reinstalling (torch and sovits) fixed it.

I found a way to fix it in #410.

sonhm3029 commented 1 year ago

Did you find a way to resolve this issue?

GeoSynchron commented 1 year ago

I found the file that caused the problem. I had previously trained models with 4 different voices and had no problems. This time, I deleted the most recent checkpoint files (D_XXX and G_XXX) and then resumed training from the checkpoint before them. Training has since passed the problem point: in my case the bad file was 1667, and the previous checkpoint was at 1666. I don't know why this happened, but the D_XXX file was about 10 times smaller than the earlier checkpoints, which made me suspicious.
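
The size check described above can be automated. This is a rough sketch, not part of so-vits-svc-fork: it assumes the checkpoints are stored as `D_*.pth` and `G_*.pth` in a single directory, and the function name `find_suspicious_checkpoints` and the `ratio` threshold are made up for illustration. It flags any file that is much smaller than the median size of its group.

```python
from pathlib import Path
from statistics import median


def find_suspicious_checkpoints(model_dir, ratio=0.5):
    """Return checkpoint files much smaller than the median size of their group.

    Assumes checkpoints are named D_*.pth and G_*.pth (an assumption, adjust
    the patterns to match your setup). A file is flagged when its size is
    below `ratio` times the median size of the files sharing its prefix.
    """
    suspicious = []
    for pattern in ("D_*.pth", "G_*.pth"):
        files = list(Path(model_dir).glob(pattern))
        if len(files) < 2:
            continue  # nothing to compare against
        typical = median(p.stat().st_size for p in files)
        suspicious.extend(p for p in files if p.stat().st_size < ratio * typical)
    return sorted(suspicious)
```

For example, `find_suspicious_checkpoints("logs/44k")` (the directory name is an assumption) returns the paths worth inspecting. A flagged file and its D/G counterpart from the same step could then be moved aside so training resumes from the previous checkpoint, as described above. A small file is only a hint that something went wrong when the checkpoint was written, so check the file before deleting it.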

34j commented 1 year ago

198