keonlee9420 / Cross-Speaker-Emotion-Transfer

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Error using the pretrained model #9

Open · jrings opened this issue 2 years ago

jrings commented 2 years ago

I'm trying to run synthesize.py with the pretrained model, like so:

python3 synthesize.py --text "This sentence is a test" --speaker_id Actor_01 --emotion_id neutral --restore_step 450000  --dataset RAVDESS --mode single

but I get layer-size mismatch errors:

Traceback (most recent call last):
  File "synthesize.py", line 206, in <module>
    model = get_model(args, configs, device, train=False,
  File "/home/jrings/diviai/installs/Cross-Speaker-Emotion-Transfer/utils/model.py", line 27, in get_model
    model.load_state_dict(model_dict, strict=False)
  File "<...>/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for XSpkEmoTrans:
    size mismatch for emotion_emb.etl.embed: copying a param with shape torch.Size([8, 64]) from checkpoint, the shape in current model is torch.Size([9, 64]).
    size mismatch for duratin_predictor.lconv_stack.0.conv_layer.weight: copying a param with shape torch.Size([2, 1, 3]) from checkpoint, the shape in current model is torch.Size([2, 3]).
    size mismatch for decoder.lconv_stack.0.conv_layer.weight: copying a param with shape torch.Size([8, 1, 15]) from checkpoint, the shape in current model is torch.Size([8, 15]).
    size mismatch for decoder.lconv_stack.1.conv_layer.weight: copying a param with shape torch.Size([8, 1, 15]) from checkpoint, the shape in current model is torch.Size([8, 15]).
    size mismatch for decoder.lconv_stack.2.conv_layer.weight: copying a param with shape torch.Size([8, 1, 15]) from checkpoint, the shape in current model is torch.Size([8, 15]).
    size mismatch for decoder.lconv_stack.3.conv_layer.weight: copying a param with shape torch.Size([8, 1, 15]) from checkpoint, the shape in current model is torch.Size([8, 15]).
    size mismatch for decoder.lconv_stack.4.conv_layer.weight: copying a param with shape torch.Size([8, 1, 15]) from checkpoint, the shape in current model is torch.Size([8, 15]).
    size mismatch for decoder.lconv_stack.5.conv_layer.weight: copying a param with shape torch.Size([8, 1, 15]) from checkpoint, the shape in current model is torch.Size([8, 15]).
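
The mismatches fall into two groups: the emotion-embedding table has 8 rows in the checkpoint but 9 in the current model, and every lconv weight in the checkpoint carries an extra singleton dimension (e.g. [8, 1, 15] vs [8, 15]). To see the full picture at once, one can diff the checkpoint's tensor shapes against a freshly built model before restoring weights. A minimal diagnostic sketch follows; the "model" key inside the .pth.tar file is an assumption based on how this repo's training script appears to save checkpoints:

import torch

def diff_state_dicts(model, ckpt_path):
    """Print every parameter whose shape differs between checkpoint and model."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Assumed layout: {"model": state_dict, ...}; fall back to a bare state_dict.
    state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
    model_state = model.state_dict()
    for name, tensor in state.items():
        if name not in model_state:
            print(f"only in checkpoint: {name} {tuple(tensor.shape)}")
        elif model_state[name].shape != tensor.shape:
            print(f"mismatch: {name} checkpoint={tuple(tensor.shape)} "
                  f"model={tuple(model_state[name].shape)}")
    for name in model_state.keys() - state.keys():
        print(f"only in model: {name}")

Running this on the 450000-step checkpoint against the freshly constructed XSpkEmoTrans (before any load_state_dict call) should list exactly the parameters from the traceback above.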
hathubkhn commented 2 years ago

Hello jrings, did you generate the speaker embedding vectors successfully, and did you copy the pretrained model into the correct location as described in the guide?

LifeOfCodeDesigner commented 1 year ago

I'm running into the same problem. Have you solved it?
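
For what it's worth: strict=False in load_state_dict only forgives missing or unexpected keys; PyTorch still raises on shape mismatches, which is why the load in utils/model.py fails anyway. The 8-vs-9 row mismatch on emotion_emb.etl.embed suggests the current config defines one more emotion than the checkpoint was trained with, so comparing the emotion list in the RAVDESS config against the pretrained model's settings is the first thing to try. The extra singleton dimension on the lconv weights looks like a code revision in the lightweight-conv layer between training time and the current code. If reverting the config or code isn't an option, a best-effort adaptation before loading can at least restore the compatible weights. This is a sketch under those assumptions, not a confirmed fix (the "model" key in the checkpoint is also assumed):

import torch

def load_with_adaptation(model, ckpt_path):
    """Squeeze away stray singleton dims; drop keys that still don't fit."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
    model_state = model.state_dict()
    for name, tensor in list(state.items()):
        target = model_state.get(name)
        if target is None or target.shape == tensor.shape:
            continue
        if tensor.dim() > 1 and tensor.squeeze(1).shape == target.shape:
            # e.g. lconv weights saved as [8, 1, 15] by an older code revision
            state[name] = tensor.squeeze(1)
        else:
            # e.g. the 8- vs 9-row emotion table: padding or truncating it would
            # silently change which emotion ID maps to which row, so skip it.
            print(f"dropping {name}: {tuple(tensor.shape)} vs {tuple(target.shape)}")
            del state[name]
    model.load_state_dict(state, strict=False)
    return model

Any parameter dropped this way stays randomly initialized, so the emotion table in particular would still need a config fix (or fine-tuning) to be usable.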