yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License

mismatch in the number of channels #137

Closed Devintundo closed 11 months ago

Devintundo commented 11 months ago

I'm trying to fine-tune the model on custom data. Running the following command fails with the channel-mismatch error shown in the traceback below: accelerate launch --mixed_precision=fp16 --num_processes=1 train_finetune_accelerate.py --config_path ./Configs/config_ft.yml

Also, I could not install espeak-ng on Amazon Linux 2 in a Jupyter notebook. Any pointers would be appreciated.

  File "/home/ec2-user/SageMaker/StyleTTS2/train_finetune_accelerate.py", line 714, in <module>
    main()
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ec2-user/SageMaker/StyleTTS2/train_finetune_accelerate.py", line 403, in main
    y_rec_gt_pred = model.decoder(en, F0_real, N_real, s)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/SageMaker/StyleTTS2/Modules/hifigan.py", line 458, in forward
    F0 = self.F0_conv(F0_curve.unsqueeze(1))
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [1, 1, 3], expected input[1, 358, 1] to have 1 channels, but got 358 channels instead
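
For anyone reading the final error line, here is a minimal sketch of the shape bookkeeping that could produce it. It is plain Python and does not call PyTorch. The helper names (`unsqueeze_shape`, `conv1d_view`, `describe`) are hypothetical. The assumption, which nothing in this thread confirms, is that `F0_curve` reached the decoder as a 1-D tensor of length 358, for example because a `squeeze()` upstream dropped a batch dimension of size 1. `conv1d` reads a 3-D input as (batch, channels, length) and a 2-D input as unbatched (channels, length), so after `unsqueeze(1)` a 1-D input would be read as 358 channels of length 1.

```python
def unsqueeze_shape(shape, dim):
    """Shape after inserting a size-1 axis at index `dim` (dim >= 0)."""
    return shape[:dim] + (1,) + shape[dim:]


def conv1d_view(shape):
    """Return the (batch, channels, length) shape conv1d would work with.

    A 3-D input is taken as-is; a 2-D input is treated as unbatched
    (channels, length) and gets a leading batch axis of size 1.
    """
    if len(shape) == 3:
        return shape
    if len(shape) == 2:
        return (1,) + shape
    raise ValueError(f"conv1d expects 2-D or 3-D input, got {len(shape)}-D")


def describe(f0_shape, in_channels=1):
    """Hypothetical helper: shape seen by the conv, and whether channels match."""
    seen = conv1d_view(unsqueeze_shape(f0_shape, 1))
    return seen, seen[1] == in_channels


# Batched F0 curve, e.g. 2 utterances of 358 frames: channels match.
print(describe((2, 358)))   # ((2, 1, 358), True)

# Assumed failure case: the batch axis is gone, so F0 is 1-D.
print(describe((358,)))     # ((1, 358, 1), False)
```

The second result has the same `[1, 358, 1]` shape that the error reports. If this assumption holds, checking that each replica receives a batch larger than 1 would be a reasonable first thing to try.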

yl4579 commented 11 months ago

#115