keonlee9420 / PortaSpeech

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech
MIT License
329 stars 36 forks

small(320000.pth.tar) weights incompatibility #29

Open ironmann250 opened 1 year ago

ironmann250 commented 1 year ago

`2022-11-11 22:31:08.004017: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll

Device of PortaSpeech: cpu
Traceback (most recent call last):
  File "synthesize.py", line 153, in <module>
    model = get_model(args, configs, device, train=False)
  File "D:\projects\PortaSpeech\utils\model.py", line 21, in get_model
    model.load_state_dict(ckpt["model"])
  File "C:\ProgramData\Miniconda3\envs\tts_env\lib\site-packages\torch\nn\modules\module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PortaSpeech:
Missing key(s) in state_dict: "linguistic_encoder.phoneme_encoder.attn_layers.3.emb_rel_k", "linguistic_encoder.phoneme_encoder.attn_layers.3.emb_rel_v", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_q.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_q.bias", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_k.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_k.bias", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_v.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_v.bias", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_o.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_o.bias", "linguistic_encoder.phoneme_encoder.norm_layers_1.3.gamma", "linguistic_encoder.phoneme_encoder.norm_layers_1.3.beta", "linguistic_encoder.phoneme_encoder.ffn_layers.3.conv.weight", "linguistic_encoder.phoneme_encoder.ffn_layers.3.conv.bias", "linguistic_encoder.phoneme_encoder.norm_layers_2.3.gamma", "linguistic_encoder.phoneme_encoder.norm_layers_2.3.beta", "linguistic_encoder.word_encoder.attn_layers.3.emb_rel_k", "linguistic_encoder.word_encoder.attn_layers.3.emb_rel_v", "linguistic_encoder.word_encoder.attn_layers.3.conv_q.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_q.bias", "linguistic_encoder.word_encoder.attn_layers.3.conv_k.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_k.bias", "linguistic_encoder.word_encoder.attn_layers.3.conv_v.weight", 
"linguistic_encoder.word_encoder.attn_layers.3.conv_v.bias", "linguistic_encoder.word_encoder.attn_layers.3.conv_o.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_o.bias", "linguistic_encoder.word_encoder.norm_layers_1.3.gamma", "linguistic_encoder.word_encoder.norm_layers_1.3.beta", "linguistic_encoder.word_encoder.ffn_layers.3.conv.weight", "linguistic_encoder.word_encoder.ffn_layers.3.conv.bias", "linguistic_encoder.word_encoder.norm_layers_2.3.gamma", "linguistic_encoder.word_encoder.norm_layers_2.3.beta", "variational_generator.flow.flows.0.enc.in_layers.3.bias", "variational_generator.flow.flows.0.enc.in_layers.3.weight_g", "variational_generator.flow.flows.0.enc.in_layers.3.weight_v", "variational_generator.flow.flows.0.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.0.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.0.enc.res_skip_layers.3.weight_v", "variational_generator.flow.flows.2.enc.in_layers.3.bias", "variational_generator.flow.flows.2.enc.in_layers.3.weight_g", "variational_generator.flow.flows.2.enc.in_layers.3.weight_v", "variational_generator.flow.flows.2.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.2.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.2.enc.res_skip_layers.3.weight_v", "variational_generator.flow.flows.4.enc.in_layers.3.bias", "variational_generator.flow.flows.4.enc.in_layers.3.weight_g", "variational_generator.flow.flows.4.enc.in_layers.3.weight_v", "variational_generator.flow.flows.4.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.4.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.4.enc.res_skip_layers.3.weight_v", "variational_generator.flow.flows.6.enc.in_layers.3.bias", "variational_generator.flow.flows.6.enc.in_layers.3.weight_g", "variational_generator.flow.flows.6.enc.in_layers.3.weight_v", "variational_generator.flow.flows.6.enc.res_skip_layers.3.bias", 
"variational_generator.flow.flows.6.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.6.enc.res_skip_layers.3.weight_v", "variational_generator.dec_wn.in_layers.3.bias", "variational_generator.dec_wn.in_layers.3.weight_g", "variational_generator.dec_wn.in_layers.3.weight_v", "variational_generator.dec_wn.res_skip_layers.3.bias", "variational_generator.dec_wn.res_skip_layers.3.weight_g", "variational_generator.dec_wn.res_skip_layers.3.weight_v", "postnet.flows.24.logs", "postnet.flows.24.bias", "postnet.flows.25.weight", "postnet.flows.26.start.bias", "postnet.flows.26.start.weight_g", "postnet.flows.26.start.weight_v", "postnet.flows.26.end.weight", "postnet.flows.26.end.bias", "postnet.flows.26.cond_layer.bias", "postnet.flows.26.cond_layer.weight_g", "postnet.flows.26.cond_layer.weight_v", "postnet.flows.26.wn.in_layers.0.bias", "postnet.flows.26.wn.in_layers.0.weight_g", "postnet.flows.26.wn.in_layers.0.weight_v", "postnet.flows.26.wn.in_layers.1.bias", "postnet.flows.26.wn.in_layers.1.weight_g", "postnet.flows.26.wn.in_layers.1.weight_v", "postnet.flows.26.wn.in_layers.2.bias", "postnet.flows.26.wn.in_layers.2.weight_g", "postnet.flows.26.wn.in_layers.2.weight_v", "postnet.flows.26.wn.res_skip_layers.0.bias", "postnet.flows.26.wn.res_skip_layers.0.weight_g", "postnet.flows.26.wn.res_skip_layers.0.weight_v", "postnet.flows.26.wn.res_skip_layers.1.bias", "postnet.flows.26.wn.res_skip_layers.1.weight_g", "postnet.flows.26.wn.res_skip_layers.1.weight_v", "postnet.flows.26.wn.res_skip_layers.2.bias", "postnet.flows.26.wn.res_skip_layers.2.weight_g", "postnet.flows.26.wn.res_skip_layers.2.weight_v", "postnet.flows.27.logs", "postnet.flows.27.bias", "postnet.flows.28.weight", "postnet.flows.29.start.bias", "postnet.flows.29.start.weight_g", "postnet.flows.29.start.weight_v", "postnet.flows.29.end.weight", "postnet.flows.29.end.bias", "postnet.flows.29.cond_layer.bias", "postnet.flows.29.cond_layer.weight_g", "postnet.flows.29.cond_layer.weight_v", 
"postnet.flows.29.wn.in_layers.0.bias", "postnet.flows.29.wn.in_layers.0.weight_g", "postnet.flows.29.wn.in_layers.0.weight_v", "postnet.flows.29.wn.in_layers.1.bias", "postnet.flows.29.wn.in_layers.1.weight_g", "postnet.flows.29.wn.in_layers.1.weight_v", "postnet.flows.29.wn.in_layers.2.bias", "postnet.flows.29.wn.in_layers.2.weight_g", "postnet.flows.29.wn.in_layers.2.weight_v", "postnet.flows.29.wn.res_skip_layers.0.bias", "postnet.flows.29.wn.res_skip_layers.0.weight_g", "postnet.flows.29.wn.res_skip_layers.0.weight_v", "postnet.flows.29.wn.res_skip_layers.1.bias", "postnet.flows.29.wn.res_skip_layers.1.weight_g", "postnet.flows.29.wn.res_skip_layers.1.weight_v", "postnet.flows.29.wn.res_skip_layers.2.bias", "postnet.flows.29.wn.res_skip_layers.2.weight_g", "postnet.flows.29.wn.res_skip_layers.2.weight_v", "postnet.flows.30.logs", "postnet.flows.30.bias", "postnet.flows.31.weight", "postnet.flows.32.start.bias", "postnet.flows.32.start.weight_g", "postnet.flows.32.start.weight_v", "postnet.flows.32.end.weight", "postnet.flows.32.end.bias", "postnet.flows.32.cond_layer.bias", "postnet.flows.32.cond_layer.weight_g", "postnet.flows.32.cond_layer.weight_v", "postnet.flows.32.wn.in_layers.0.bias", "postnet.flows.32.wn.in_layers.0.weight_g", "postnet.flows.32.wn.in_layers.0.weight_v", "postnet.flows.32.wn.in_layers.1.bias", "postnet.flows.32.wn.in_layers.1.weight_g", "postnet.flows.32.wn.in_layers.1.weight_v", "postnet.flows.32.wn.in_layers.2.bias", "postnet.flows.32.wn.in_layers.2.weight_g", "postnet.flows.32.wn.in_layers.2.weight_v", "postnet.flows.32.wn.res_skip_layers.0.bias", "postnet.flows.32.wn.res_skip_layers.0.weight_g", "postnet.flows.32.wn.res_skip_layers.0.weight_v", "postnet.flows.32.wn.res_skip_layers.1.bias", "postnet.flows.32.wn.res_skip_layers.1.weight_g", "postnet.flows.32.wn.res_skip_layers.1.weight_v", "postnet.flows.32.wn.res_skip_layers.2.bias", "postnet.flows.32.wn.res_skip_layers.2.weight_g", "postnet.flows.32.wn.res_skip_layers.2.weight_v", 
"postnet.flows.33.logs", "postnet.flows.33.bias", "postnet.flows.34.weight", "postnet.flows.35.start.bias", "postnet.flows.35.start.weight_g", "postnet.flows.35.start.weight_v", "postnet.flows.35.end.weight", "postnet.flows.35.end.bias", "postnet.flows.35.cond_layer.bias", "postnet.flows.35.cond_layer.weight_g", "postnet.flows.35.cond_layer.weight_v", "postnet.flows.35.wn.in_layers.0.bias", "postnet.flows.35.wn.in_layers.0.weight_g", "postnet.flows.35.wn.in_layers.0.weight_v", "postnet.flows.35.wn.in_layers.1.bias", "postnet.flows.35.wn.in_layers.1.weight_g", "postnet.flows.35.wn.in_layers.1.weight_v", "postnet.flows.35.wn.in_layers.2.bias", "postnet.flows.35.wn.in_layers.2.weight_g", "postnet.flows.35.wn.in_layers.2.weight_v", "postnet.flows.35.wn.res_skip_layers.0.bias", "postnet.flows.35.wn.res_skip_layers.0.weight_g", "postnet.flows.35.wn.res_skip_layers.0.weight_v", "postnet.flows.35.wn.res_skip_layers.1.bias", "postnet.flows.35.wn.res_skip_layers.1.weight_g", "postnet.flows.35.wn.res_skip_layers.1.weight_v", "postnet.flows.35.wn.res_skip_layers.2.bias", "postnet.flows.35.wn.res_skip_layers.2.weight_g", "postnet.flows.35.wn.res_skip_layers.2.weight_v". size mismatch for linguistic_encoder.abs_position_enc: copying a param with shape torch.Size([1, 1001, 128]) from checkpoint, the shape in current model is torch.Size([1, 1001, 192]). size mismatch for linguistic_encoder.kv_position_enc: copying a param with shape torch.Size([1, 1001, 128]) from checkpoint, the shape in current model is torch.Size([1, 1001, 192]). size mismatch for linguistic_encoder.q_position_enc: copying a param with shape torch.Size([1, 1001, 128]) from checkpoint, the shape in current model is torch.Size([1, 1001, 192]). size mismatch for linguistic_encoder.src_emb.weight: copying a param with shape torch.Size([361, 128]) from checkpoint, the shape in current model is torch.Size([361, 192]). 
size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). 
size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). 
size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). 
size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.0.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.0.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.1.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.1.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.2.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.2.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.0.conv.weight: copying a param with shape torch.Size([128, 128, 3]) from checkpoint, the shape in current model is torch.Size([192, 192, 5]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.0.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.1.conv.weight: copying a param with shape torch.Size([128, 128, 3]) from checkpoint, the shape in current model is torch.Size([192, 192, 5]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.1.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). 
size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.2.conv.weight: copying a param with shape torch.Size([128, 128, 3]) from checkpoint, the shape in current model is torch.Size([192, 192, 5]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.2.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.0.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.0.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.1.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.1.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.2.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.2.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). 
size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). 
size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.2.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from...`
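Reading the log, the checkpoint tensors are consistently narrower (128 vs. 192 channels, kernel size 3 vs. 5) and the model expects an extra layer index (`attn_layers.3`) that the checkpoint lacks. That pattern suggests, though the thread does not confirm it, that the model was built from a different config than the one the checkpoint was trained with. A minimal sketch of how such a mismatch could be summarized before calling `load_state_dict` follows. The helper name `diff_state_shapes` is hypothetical and not part of this repo, and the shapes are toy values copied from the error message rather than read from a real checkpoint.

```python
def diff_state_shapes(ckpt_shapes, model_shapes):
    """Compare two {parameter name: shape tuple} mappings.

    Returns (missing, unexpected, mismatched), using the same meaning
    as the PyTorch error text: "missing" keys are expected by the model
    but absent from the checkpoint.
    """
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    mismatched = {
        k: (ckpt_shapes[k], model_shapes[k])
        for k in ckpt_shapes
        if k in model_shapes and ckpt_shapes[k] != model_shapes[k]
    }
    return missing, unexpected, mismatched


# Toy shapes taken from the log above (not loaded from a real file).
ckpt_shapes = {
    "linguistic_encoder.src_emb.weight": (361, 128),
    "linguistic_encoder.phoneme_encoder.ffn_layers.0.conv.weight": (128, 128, 3),
}
model_shapes = {
    "linguistic_encoder.src_emb.weight": (361, 192),
    "linguistic_encoder.phoneme_encoder.ffn_layers.0.conv.weight": (192, 192, 5),
    # Assumption: layer 3 has the same shape as layers 0-2 in the log.
    "linguistic_encoder.phoneme_encoder.attn_layers.3.emb_rel_k": (1, 9, 96),
}

missing, unexpected, mismatched = diff_state_shapes(ckpt_shapes, model_shapes)
print("missing:", missing)
print("unexpected:", unexpected)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With PyTorch available, the two mappings could be built as `{k: tuple(v.shape) for k, v in ckpt["model"].items()}` and `{k: tuple(v.shape) for k, v in model.state_dict().items()}`, where `ckpt["model"]` is the entry the traceback shows being loaded.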

Frei2 commented 6 months ago

@ironmann250 Hello! May I ask whether you trained this model (PortaSpeech) on Windows?

ironmann250 commented 1 month ago

Yes, Windows 10.