voicepaw / so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

Zero progress at training #1090

Open sapozhnikov opened 8 months ago

sapozhnikov commented 8 months ago

Describe the bug

I'm trying to run SVC locally with GPU acceleration on a Radeon 5700XT. Training gets stuck at the very beginning: the GPU runs at 100%, but hours later the output still shows only 'Epoch 0/9999'.

To Reproduce

Installed into a fresh Conda environment, as described, with Python 3.10.13:

python -m pip install -U pip setuptools wheel
pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6
pip install -U so-vits-svc-fork

Additional context

radeontop shows 100% on the 'Graphics pipe' and 'Shader Interpolator' bars during the hubert and train stages. I tried different versions of PyTorch; the latest behaves the same, and older versions fail to open the model, I think.
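To separate a hang in PyTorch itself from a hang in SVC, a small probe along these lines might help. It is only a sketch: `GPU_PROBE` and `run_probe` are names made up for this example, and the 120-second limit is an arbitrary guess at "long enough". The probe runs one matmul in a child interpreter so that a stuck kernel shows up as a timeout instead of blocking the shell.

```python
import subprocess
import sys

# Tiny GPU job: one matmul plus a value read-back. On ROCm builds PyTorch
# still exposes the device through the torch.cuda API.
GPU_PROBE = """
import torch
print("torch", torch.__version__, "hip", torch.version.hip)
print("gpu visible:", torch.cuda.is_available())
x = torch.rand(1024, 1024, device="cuda")
y = (x @ x).sum().item()  # .item() waits for the kernel to finish
print("matmul ok:", y > 0)
"""


def run_probe(code, timeout_s=120.0):
    """Run `code` in a child interpreter and return (status, output).

    status is "ok", "failed" (non-zero exit code) or "timeout"
    (the child did not finish in time, which suggests a hang).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr
```

Calling `run_probe(GPU_PROBE)` inside the same Conda environment and printing the result would show whether a bare matmul completes. A "timeout" here would point at the PyTorch/ROCm stack, not at so-vits-svc-fork.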

Added to ~/.bashrc

export ROCM_PATH=/opt/rocm
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export PYTORCH_ROCM_ARCH="gfx1010"
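Since ~/.bashrc is only read by interactive shells, it may be worth confirming that these variables actually reach the Python process that runs svc. The sketch below prints them from inside Python. The helper names are hypothetical, and the mapping from an override such as 10.3.0 to a target name such as gfx1030 (major, then minor and stepping as hex digits) is my assumption about how the ROCm runtime reads that value.

```python
import os

# The three variables set in ~/.bashrc above.
ROCM_VARS = ("ROCM_PATH", "HSA_OVERRIDE_GFX_VERSION", "PYTORCH_ROCM_ARCH")


def override_to_gfx(version):
    """Turn an override value such as '10.3.0' into a name such as 'gfx1030'.

    Assumption: minor and stepping are rendered as single hex digits.
    """
    major, minor, step = (int(part) for part in version.split("."))
    return f"gfx{major}{minor:x}{step:x}"


def report_rocm_env(env=None):
    """Return one line per ROCm variable, as seen by this process."""
    env = os.environ if env is None else env
    lines = [f"{name}={env.get(name, '<unset>')}" for name in ROCM_VARS]
    override = env.get("HSA_OVERRIDE_GFX_VERSION")
    if override:
        # Extra line: the target name the override is assumed to stand for.
        lines.append(f"override corresponds to {override_to_gfx(override)}")
    return lines


if __name__ == "__main__":
    print("\n".join(report_rocm_env()))
```

Running this in the shell that launches svc shows at a glance whether any of the variables is `<unset>` there.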

System:
  Kernel: 6.7.4-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 13.2.1 clocksource: tsc
  Desktop: Cinnamon v: 6.0.4 tk: GTK v: 3.24.41 wm: Muffin v: 6.0.1 vt: 7 dm: LightDM v: 1.32.0
  Distro: EndeavourOS base: Arch Linux
CPU:
  Info: 14-core model: Intel Xeon E5-2680 v4 bits: 64 type: MT MCP smt: enabled arch: Broadwell rev: 1
  cache: L1: 896 KiB L2: 3.5 MiB L3: 35 MiB
Graphics:
  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] driver: amdgpu v: kernel arch: RDNA-1
  pcie: speed: 16 GT/s lanes: 16 ports: active: HDMI-A-1 empty: DP-1,DP-2,DP-3 bus-ID: 05:00.0 chip-ID: 1002:731f class-ID: 0300
Info:
  Memory: total: 32 GiB note: est. available: 31.2 GiB used: 3.16 GiB (10.1%)

svc output:

[22:59:42] INFO [22:59:42] Using strategy: auto train.py:98
INFO: GPU available: True (cuda), used: True
INFO [22:59:42] GPU available: True (cuda), used: True rank_zero.py:64
INFO: TPU available: False, using: 0 TPU cores
INFO [22:59:42] TPU available: False, using: 0 TPU cores rank_zero.py:64
INFO: IPU available: False, using: 0 IPUs
INFO [22:59:42] IPU available: False, using: 0 IPUs rank_zero.py:64
INFO: HPU available: False, using: 0 HPUs
INFO [22:59:42] HPU available: False, using: 0 HPUs rank_zero.py:64
WARNING [22:59:42] warnings.py:109
/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/so_vits_svc_fork/modules/synthesizers.py:81: UserWarning: Unused arguments: {'n_layers_q': 3, 'use_spectral_norm': False, 'pretrained': {'D_0.pth': 'https://huggingface.co/datasets/ms903/sovits4.0-768vec-layer12/resolve/main/sovits_768l12_pre_large_320k/clean_D_320000.pth', 'G_0.pth': 'https://huggingface.co/datasets/ms903/sovits4.0-768vec-layer12/resolve/main/sovits_768l12_pre_large_320k/clean_G_320000.pth'}}
  warnings.warn(f"Unused arguments: {kwargs}")

       INFO     [22:59:42] Decoder type: hifi-gan                                                                       synthesizers.py:100
       WARNING  [22:59:42]                                                                                                  warnings.py:109
                /home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28:                         
                UserWarning: torch.nn.utils.weight_norm is deprecated in favor of                                                          
                torch.nn.utils.parametrizations.weight_norm.                                                                               
                  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of                                                      
                torch.nn.utils.parametrizations.weight_norm.")

[22:59:44] WARNING [22:59:44] /home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/so_vits_svc_fork/utils.py:246: warnings.py:109 UserWarning: Keys not found in checkpoint state dict:['emb_g.weight']
warnings.warn(f"Keys not found in checkpoint state dict:" f"{not_in_from}")

       WARNING  [22:59:44] /home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/so_vits_svc_fork/utils.py:264:  warnings.py:109
                UserWarning: Shape mismatch: ['dec.cond.weight: torch.Size([512, 256, 1]) -> torch.Size([512, 768, 1])',                   
                'enc_q.enc.cond_layer.weight_v: torch.Size([6144, 256, 1]) -> torch.Size([6144, 768, 1])',                                 
                'flow.flows.0.enc.cond_layer.weight_v: torch.Size([1536, 256, 1]) -> torch.Size([1536, 768, 1])',                          
                'flow.flows.2.enc.cond_layer.weight_v: torch.Size([1536, 256, 1]) -> torch.Size([1536, 768, 1])',                          
                'flow.flows.4.enc.cond_layer.weight_v: torch.Size([1536, 256, 1]) -> torch.Size([1536, 768, 1])',                          
                'flow.flows.6.enc.cond_layer.weight_v: torch.Size([1536, 256, 1]) -> torch.Size([1536, 768, 1])',                          
                'f0_decoder.cond.weight: torch.Size([192, 256, 1]) -> torch.Size([192, 768, 1])']                                          
                  warnings.warn(                                                                                                           

       INFO     [22:59:44] Loaded checkpoint 'logs/44k/G_0.pth' (epoch 0)                                                      utils.py:307
       INFO     [22:59:44] Loaded checkpoint 'logs/44k/D_0.pth' (epoch 0)                                                      utils.py:307

INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO [22:59:44] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0] cuda.py:61
┏━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃   ┃ Name  ┃ Type                     ┃ Params ┃
┡━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0 │ net_g │ SynthesizerTrn           │ 45.6 M │
│ 1 │ netd  │ MultiPeriodDiscriminator │ 46.7 M │
└───┴───────┴──────────────────────────┴────────┘
Trainable params: 92.4 M
Non-trainable params: 0
Total params: 92.4 M
Total estimated model params size (MB): 369
WARNING [22:59:44] warnings.py:109
/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=27` in the `DataLoader` to improve performance.

[22:59:45] INFO [22:59:45] Setting current epoch to 0 train.py:311
INFO [22:59:45] Setting total batch idx to 0 train.py:327
INFO [22:59:45] Setting global step to 0 train.py:317
Epoch 0/9999 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/156 0:00:00 • -:--:-- 0.00it/s v_num: 0.000

Version

4.1.47

Platform

EndeavourOS (Arch Linux)


sapozhnikov commented 8 months ago

Well, it looks like PyTorch doesn't work with my GPU. The last compatible version of ROCm was 5.2, which doesn't work with SVC and produces:

INFO [02:10:15] Decoder type: hifi-gan synthesizers.py:100
Traceback (most recent call last):
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/so_vits_svc_fork/train.py", line 347, in load
    _, _, _, epoch = utils.load_checkpoint(
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/so_vits_svc_fork/utils.py", line 288, in load_checkpoint
    checkpoint_dict = torch.load(f, map_location="cpu", weights_only=True)
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
_pickle.UnpicklingError: Weights only load failed. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution.Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported operand 71

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user01/miniconda3/envs/sovits/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/so_vits_svc_fork/__main__.py", line 128, in train
    train(
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/so_vits_svc_fork/train.py", line 119, in train
    model = VitsLightning(reset_optimizer=reset_optimizer, **hparams)
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/so_vits_svc_fork/train.py", line 186, in __init__
    self.load(reset_optimizer)
  File "/home/user01/miniconda3/envs/sovits/lib/python3.10/site-packages/so_vits_svc_fork/train.py", line 363, in load
    raise RuntimeError("Failed to load checkpoint") from e
RuntimeError: Failed to load checkpoint
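
For anyone reading the "Unsupported operand 71" part: my guess (an assumption, not something the traceback states) is that 71 is a raw pickle opcode byte that the weights-only loader in this older PyTorch build does not accept. If so, the standard library can translate the number into an opcode name. The helper names below are made up for this sketch.

```python
import pickle
import pickletools


def opcode_name(byte_value):
    """Return the pickle opcode name for a raw opcode byte, or None."""
    info = pickletools.code2op.get(chr(byte_value))
    return info.name if info else None


def ops_in_pickle(obj, protocol=2):
    """List the opcode names used when pickling `obj`."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


print(opcode_name(71))     # BINFLOAT
print(ops_in_pickle(0.5))  # ['PROTO', 'BINFLOAT', 'STOP']
```

Under that reading, the checkpoint contains a plain Python float somewhere, and the restricted unpickler in that PyTorch version rejects the opcode used to store it.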