voicepaw / so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

No way to select which GPU to run on. Runs on none of them. #609

Closed · PlateGlassArmour closed this issue 1 year ago

PlateGlassArmour commented 1 year ago

I'm trying to train a model, and everything else seems to be working, but I haven't found any way to select which of my two GPUs to use, and it refuses to use either one for training.

I run Windows 10 Pro, and I have an RTX 3060 and an RTX 3090 installed.

When I run "nvcc --version" I get this. (base) PS C:\WINDOWS\system32> nvcc --version nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2023 NVIDIA Corporation Built on Mon_Apr__3_17:36:15_Pacific_Daylight_Time_2023 Cuda compilation tools, release 12.1, V12.1.105 Build cuda_12.1.r12.1/compiler.32688072_0

And when I run `nvidia-smi` I get this:

```
Wed May 10 15:53:07 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 531.14                 Driver Version: 531.14       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                      TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090       WDDM | 00000000:25:00.0 Off |                  N/A |
|  0%   38C    P8                6W / 350W|      0MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3060       WDDM | 00000000:26:00.0  On |                  N/A |
|  0%   52C    P8               17W / 170W|    325MiB / 12288MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
```

But when I try to train my model, it only uses the CPU and system RAM, not either GPU or its VRAM.

Am I just fucking something up? Is there actually a method to select which GPU to use that I'm just missing? I've tried updating just about everything, and nothing seems to help.
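
Both cards show up at the driver level above, so the next thing worth checking is whether the installed PyTorch build can see them at all. A minimal diagnostic sketch, assuming a standard PyTorch install (illustrative only, not a so-vits-svc-fork command):

```python
# Diagnostic sketch: what does the installed torch build report about CUDA?
import torch

print(torch.__version__)          # a version ending in "+cpu" usually means a CPU-only wheel
print(torch.version.cuda)         # None on CPU-only builds
print(torch.cuda.is_available())  # effectively what drives Lightning's "GPU available" line
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```

If `torch.cuda.is_available()` prints `False` while `nvidia-smi` works, the usual culprit is the torch wheel itself rather than the drivers.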

34j commented 1 year ago

Need more info (could you paste the output?)

PlateGlassArmour commented 1 year ago

Sure. Here's the output when I try to train. It seems to work fine, other than not detecting either of my GPUs:

```
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Try the new cross-platform PowerShell https://aka.ms/pscore6

PS C:\Users\Usr> cd C:\Users\Usr\Documents\GitHub\so-vits-svc-fork
PS C:\Users\Usr\Documents\GitHub\so-vits-svc-fork> svc train
[23:01:53] WARNING  [23:01:53] C:\Users\Usr\AppData\Local\Programs\Python\Python310\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: [WinError 127] The specified procedure could not be found  warnings.py:109
                    warn(f"Failed to load image Python extension: {e}")
[23:02:02] INFO     [23:02:02] Using strategy: auto  train.py:88
INFO: GPU available: False, used: False
           INFO     [23:02:02] GPU available: False, used: False  rank_zero.py:48
INFO: TPU available: False, using: 0 TPU cores
           INFO     [23:02:02] TPU available: False, using: 0 TPU cores  rank_zero.py:48
INFO: IPU available: False, using: 0 IPUs
           INFO     [23:02:02] IPU available: False, using: 0 IPUs  rank_zero.py:48
INFO: HPU available: False, using: 0 HPUs
           INFO     [23:02:02] HPU available: False, using: 0 HPUs  rank_zero.py:48
           WARNING  [23:02:02] C:\Users\Usr\AppData\Local\Programs\Python\Python310\lib\site-packages\so_vits_svc_fork\modules\synthesizers.py:81: UserWarning: Unused arguments: {'n_layers_q': 3, 'use_spectral_norm': False}  warnings.py:109
                    warnings.warn(f"Unused arguments: {kwargs}")
           INFO     [23:02:02] Decoder type: hifi-gan  synthesizers.py:100
[23:02:03] WARNING  [23:02:03] C:\Users\Usr\AppData\Local\Programs\Python\Python310\lib\site-packages\so_vits_svc_fork\utils.py:200: UserWarning: Keys not found in checkpoint state dict: ['emb_g.weight']  warnings.py:109
                    warnings.warn(f"Keys not found in checkpoint state dict:" f"{not_in_from}")
           INFO     [23:02:03] Loaded checkpoint 'logs\44k\G_0.pth' (epoch 0)  utils.py:261
           INFO     [23:02:03] Loaded checkpoint 'logs\44k\D_0.pth' (epoch 0)  utils.py:261
┌───┬───────┬──────────────────────────┬────────┐
│   │ Name  │ Type                     │ Params │
├───┼───────┼──────────────────────────┼────────┤
│ 0 │ net_g │ SynthesizerTrn           │ 45.2 M │
│ 1 │ net_d │ MultiPeriodDiscriminator │ 46.7 M │
└───┴───────┴──────────────────────────┴────────┘
Trainable params: 91.9 M
Non-trainable params: 0
Total params: 91.9 M
Total estimated model params size (MB): 367
[23:02:04] WARNING  [23:02:04] C:\Users\Usr\AppData\Local\Programs\Python\Python310\lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:430: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.  warnings.py:109
                    rank_zero_warn(
           WARNING  [23:02:04] C:\Users\Usr\AppData\Local\Programs\Python\Python310\lib\site-packages\lightning\pytorch\loops\fit_loop.py:280: PossibleUserWarning: The number of training batches (15) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.  warnings.py:109
                    rank_zero_warn(
           INFO     [23:02:04] Setting current epoch to 0  train.py:300
           INFO     [23:02:04] Setting total batch idx to 0  train.py:316
           INFO     [23:02:04] Setting global step to 0  train.py:306
Epoch 0/9999 ---------------------------------------- 0/15 0:00:00 • -:--:-- 0.00it/s v_num: 0
[23:02:07] WARNING  [23:02:07] C:\Users\Usr\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()  warnings.py:109
                    return self.fget.__get__(instance, owner)()
[the same TypedStorage warning repeats three more times at 23:02:07]
[23:03:00] WARNING  [23:03:00] C:\Users\Usr\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error. Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ..\aten\src\ATen\native\SpectralOps.cpp:867.)  warnings.py:109
                    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
Epoch 0/9999 ---------------------------------------- 0/15 0:00:00 • -:--:-- 0.00it/s v_num: 0
```
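
The `GPU available: False` line is Lightning reporting that torch sees no usable CUDA device, which matches the diagnostic above: with working drivers, that typically points at a CPU-only torch build rather than at the trainer. Separately, the generic CUDA way to pin a process to one card is the `CUDA_VISIBLE_DEVICES` environment variable (in PowerShell, `$env:CUDA_VISIBLE_DEVICES = "0"` before running `svc train`). A sketch of the same mechanism from Python; this is a general CUDA convention, not a documented so-vits-svc-fork option:

```python
# Sketch: restrict the process to a single GPU via CUDA_VISIBLE_DEVICES.
# Must be set before CUDA is initialized, i.e. before torch is imported.
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # make indices match nvidia-smi ordering
os.environ["CUDA_VISIBLE_DEVICES"] = "0"        # "0" is the RTX 3090 in the nvidia-smi output above

import torch

print(torch.cuda.is_available())  # True only on a CUDA-enabled torch build
print(torch.cuda.device_count())  # 1 once the mask takes effect
```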

34j commented 1 year ago

Duplicate of #372, #556, etc