ylacombe / finetune-hf-vits

Finetune VITS and MMS using HuggingFace's tools
MIT License
115 stars 25 forks

Stuck at: Steps: 0%| | 0/20 [00:00<?, ?it/s] #37

Closed muhammadsaadgondal closed 1 month ago

muhammadsaadgondal commented 2 months ago

My model is stuck at 0% and I can't understand why.

[INFO|configuration_utils.py:472] 2024-08-07 04:19:10,786 >> Configuration saved in C:\Users\MUHAMM~1\AppData\Local\Temp\tmp68z04aah\config.json
[INFO|modeling_utils.py:2765] 2024-08-07 04:19:11,656 >> Model weights saved in C:\Users\MUHAMM~1\AppData\Local\Temp\tmp68z04aah\model.safetensors
[INFO|configuration_utils.py:731] 2024-08-07 04:19:11,668 >> loading configuration file C:\Users\MUHAMM~1\AppData\Local\Temp\tmp68z04aah\config.json
[INFO|configuration_utils.py:800] 2024-08-07 04:19:11,680 >> Model config VitsConfig {
  "_name_or_path": "ylacombe/mms-tts-guj-train",
  "activation_dropout": 0.1,
  "architectures": ["VitsDiscriminator"],
  "attention_dropout": 0.1,
  "depth_separable_channels": 2,
  "depth_separable_num_layers": 3,
  "discriminator_kernel_size": 5,
  "discriminator_period_channels": [1, 32, 128, 512, 1024],
  "discriminator_periods": [2, 3, 5, 7, 11],
  "discriminator_scale_channels": [1, 16, 64, 256, 1024],
  "discriminator_stride": 3,
  "duration_predictor_dropout": 0.5,
  "duration_predictor_filter_channels": 256,
  "duration_predictor_flow_bins": 10,
  "duration_predictor_kernel_size": 3,
  "duration_predictor_num_flows": 4,
  "duration_predictor_tail_bound": 5.0,
  "ffn_dim": 768,
  "ffn_kernel_size": 3,
  "flow_size": 192,
  "hidden_act": "relu",
  "hidden_dropout": 0.1,
  "hidden_size": 192,
  "hop_length": 256,
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-05,
  "layerdrop": 0.1,
  "leaky_relu_slope": 0.1,
  "model_type": "vits",
  "noise_scale": 0.667,
  "noise_scale_duration": 0.8,
  "num_attention_heads": 2,
  "num_hidden_layers": 6,
  "num_speakers": 1,
  "posterior_encoder_num_wavenet_layers": 16,
  "prior_encoder_num_flows": 4,
  "prior_encoder_num_wavenet_layers": 4,
  "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
  "resblock_kernel_sizes": [3, 7, 11],
  "sampling_rate": 16000,
  "segment_size": 8192,
  "speaker_embedding_size": 0,
  "speaking_rate": 1.0,
  "spectrogram_bins": 513,
  "torch_dtype": "float32",
  "transformers_version": "4.43.4",
  "upsample_initial_channel": 512,
  "upsample_kernel_sizes": [16, 16, 4, 4],
  "upsample_rates": [8, 8, 2, 2],
  "use_bias": true,
  "use_stochastic_duration_prediction": true,
  "vocab_size": 60,
  "wavenet_dilation_rate": 1,
  "wavenet_dropout": 0.0,
  "wavenet_kernel_size": 5,
  "window_size": 4
}
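For reference, a minimal sketch of how the same configuration can be inspected programmatically with the `transformers` `VitsConfig` class shown in the log; it assumes the `ylacombe/mms-tts-guj-train` checkpoint referenced above is reachable, and it only reads the config rather than reproducing the training run:

```python
# Minimal sketch: load and inspect the VitsConfig that the trainer logged above.
# Assumes the "ylacombe/mms-tts-guj-train" checkpoint from the log is accessible.
from transformers import VitsConfig

config = VitsConfig.from_pretrained("ylacombe/mms-tts-guj-train")
print(config.model_type)      # "vits"
print(config.sampling_rate)   # 16000 in the dump above
print(config.vocab_size)      # 60
print(config.num_speakers)    # 1
```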

[INFO|modeling_utils.py:3641] 2024-08-07 04:19:11,728 >> loading weights file C:\Users\MUHAMM~1\AppData\Local\Temp\tmp68z04aah\model.safetensors
[INFO|modeling_utils.py:4473] 2024-08-07 04:19:11,842 >> All model checkpoint weights were used when initializing VitsDiscriminator.

[INFO|modeling_utils.py:4481] 2024-08-07 04:19:11,844 >> All the weights of VitsDiscriminator were initialized from the model checkpoint at C:\Users\MUHAMM~1\AppData\Local\Temp\tmp68z04aah. If your task is similar to the task the model of the checkpoint was trained on, you can already use VitsDiscriminator for predictions without further training.
wandb: Currently logged in as: saadgondal203 (saadgondal203-comsats-university-islamabad). Use wandb login --relogin to force relogin
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in M:\VoiceCloning\finetune-hf-vits\wandb\run-20240807_041919-bd4vl385
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run still-deluge-3
wandb: View project at https://wandb.ai/saadgondal203-comsats-university-islamabad/mms_gujarati_finetuning
wandb: View run at https://wandb.ai/saadgondal203-comsats-university-islamabad/mms_gujarati_finetuning/runs/bd4vl385
08/07/2024 04:19:21 - INFO - __main__ - Running training
08/07/2024 04:19:21 - INFO - __main__ - Num examples = 110
08/07/2024 04:19:21 - INFO - __main__ - Num Epochs = 200
08/07/2024 04:19:21 - INFO - __main__ - Instantaneous batch size per device = 16
08/07/2024 04:19:21 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 16
08/07/2024 04:19:21 - INFO - __main__ - Gradient Accumulation steps = 1
08/07/2024 04:19:21 - INFO - __main__ - Total optimization steps = 1400
Steps: 0%| | 0/1400 [00:00<?, ?it/s]
C:\Users\Muhammad Saad\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\torch\functional.py:666: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:878.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "C:\Users\Muhammad Saad\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\Scripts\accelerate.exe__main__.py", line 7, in
  File "C:\Users\Muhammad Saad\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "C:\Users\Muhammad Saad\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "C:\Users\Muhammad Saad\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Muhammad Saad\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\python.exe', 'run_vits_finetuning.py', './training_config_examples/finetune_mms.json']' returned non-zero exit status 3221225477.
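Two numbers in this log can be sanity-checked with a minimal sketch (assuming the trainer computes total steps as ceil(num_examples / batch_size) * num_epochs, which matches the figures above): the 1400 optimization steps follow from 110 examples at batch size 16 over 200 epochs, and exit status 3221225477 is a Windows NTSTATUS code, meaning the worker process crashed rather than raising a Python exception.

```python
# Minimal sketch: decode the numbers printed in the log and traceback above.
# Assumes total optimization steps = ceil(num_examples / batch_size) * num_epochs.
import math

steps_per_epoch = math.ceil(110 / 16)   # 110 examples, batch size 16 -> 7 steps per epoch
print(steps_per_epoch * 200)            # 1400, matching "Total optimization steps = 1400"

# The subprocess exit status is a Windows NTSTATUS value, not a Python return code.
print(hex(3221225477))                  # 0xc0000005 = STATUS_ACCESS_VIOLATION (hard crash)
```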

muhammadsaadgondal commented 1 month ago

Could low disk space be the problem? How much space do we need to run the finetuning?