voicepaw / so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

Error during realtime inference #293

Closed: kin0303 closed this issue 1 year ago

kin0303 commented 1 year ago

Describe the bug: I got an error during realtime inference. Here's the log:

2023-04-11 08:48:23.159152: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-11 08:48:24.231392: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[08:48:25] NumExpr defaulting to 2 threads.
[08:48:26] Version: 3.1.9
[08:48:26] auto_predict_f0 = True in realtime inference will cause unstable voice pitch, use with caution
Downloading checkpoint_best_legacy_500.pt: 100% 1.24G/1.24G [00:04<00:00, 274MiB/s]
[08:48:33] current directory is /content
[08:48:33] HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
[08:48:33] HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
[08:48:45] /usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/modules/synthesizers.py:81: UserWarning: Unused arguments: {'n_layers_q': 3, 'use_spectral_norm': False}
  warnings.warn(f"Unused arguments: {kwargs}")

[08:48:45] Decoder type: ms-istft
[08:48:59] Loaded checkpoint '/content/drive/MyDrive/Voice_Changer/drive/MyDrive/so-vits-svc-fork/logs/44k/G_5800.pth' (iteration 149)
[08:48:59] Creating realtime model...
[08:48:59] Device: 
Traceback (most recent call last):
  File "/usr/local/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/__main__.py", line 407, in vc
    realtime(
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/inference/main.py", line 158, in realtime
    f"Input Device: {devices[input_device]['name']}, Output Device: {devices[output_device]['name']}"
IndexError: tuple index out of range

To Reproduce: run the following command:

!svc vc -s "Athy" -m /content/drive/MyDrive/Voice_Changer/drive/MyDrive/so-vits-svc-fork/logs/44k/G_5800.pth -c /content/drive/MyDrive/Voice_Changer/drive/MyDrive/so-vits-svc-fork/logs/44k/config.json 

Additional context: Inference is run on Colab. There were no problems during training, but realtime inference fails with the error shown in the log.

Lordmau5 commented 1 year ago

From what I know, realtime inference is impossible on Colab, since it doesn't have any audio devices (and you can't have it use your local ones).
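As an illustration of why the traceback ends in `IndexError: tuple index out of range`: if the device list is empty, any index into it fails before a readable message can be printed. The sketch below is not the project's code. `describe_audio_devices` and `check_local_audio` are hypothetical names, and it assumes the device list comes from something like `sounddevice.query_devices()`. It only shows how a guard could turn the crash into a clearer error.

```python
from typing import Any, Mapping, Sequence


def describe_audio_devices(
    devices: Sequence[Mapping[str, Any]],
    input_device: int,
    output_device: int,
) -> str:
    """Return a device summary line, or raise a clear error if none exist."""
    # An empty device list is what a headless runtime such as Colab reports.
    if len(devices) == 0:
        raise RuntimeError(
            "No audio devices found. Realtime inference needs a local "
            "input and output device, which this runtime does not provide."
        )
    # Reject out-of-range indices (for example -1 when no default is set).
    for role, index in (("input", input_device), ("output", output_device)):
        if not 0 <= index < len(devices):
            raise RuntimeError(f"Invalid {role} device index: {index}")
    return (
        f"Input Device: {devices[input_device]['name']}, "
        f"Output Device: {devices[output_device]['name']}"
    )


def check_local_audio() -> str:
    """Hypothetical helper: query real devices with sounddevice.

    Assumption: sounddevice (and PortAudio) is installed. The import is
    done lazily so the guard above can be used and tested without it.
    """
    import sounddevice as sd

    devices = sd.query_devices()
    return describe_audio_devices(
        devices, sd.default.device[0], sd.default.device[1]
    )
```

On a local machine with a microphone and speakers, `check_local_audio()` would return the summary line. On a runtime without audio hardware it would raise the `RuntimeError` instead of an opaque `IndexError`.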

kin0303 commented 1 year ago

Can it only be done on a local computer?

Lordmau5 commented 1 year ago

At this point in time, unfortunately yes.

I am not aware of any way to connect or link your local audio devices with Colab, but maybe there will be something like that in the future.

kin0303 commented 1 year ago

Okay, thanks a lot for your reply.