innnky / so-vits-svc

A singing voice conversion model based on VITS and SoftVC
GNU Affero General Public License v3.0

hubert model structure mismatch #121

Closed buptorange closed 1 year ago

buptorange commented 1 year ago

Hello! First of all, thank you for your excellent work!

When I tried to preprocess data with 4.0 and ran python preprocess_hubert_f0.py, I found that the downloaded ContentVec checkpoint checkpoint_best_legacy_500.pt could not be loaded into the HubertSoft model. I saw that you switched the feature-input model to ContentVec; is there a new way to load the model?

The error message is below. I also tried loading only the checkpoint's "model" OrderedDict, but it still did not match.

Traceback (most recent call last):
  File "preprocess_hubert_f0.py", line 99, in <module>
    hmodel = utils.get_hubert_model(0 if torch.cuda.is_available() else None)
  File "/root/桌面/so-vits-svc-4.0/utils.py", line 44, in get_hubert_model
    hubert_soft = hubert_model.hubert_soft("hubert/checkpoint_best_legacy_500.pt")
  File "/root/桌面/so-vits-svc-4.0/hubert/hubert_model.py", line 220, in hubert_soft
    hubert.load_state_dict(checkpoint)
  File "/root/anaconda3/envs/svc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for HubertSoft:
    Missing key(s) in state_dict: "masked_spec_embed", "feature_extractor.conv0.weight", "feature_extractor.norm0.weight", "feature_extractor.norm0.bias", "feature_extractor.conv1.weight", "feature_extractor.conv2.weight", "feature_extractor.conv3.weight", "feature_extractor.conv4.weight", "feature_extractor.conv5.weight", "feature_extractor.conv6.weight", "feature_projection.norm.weight", "feature_projection.norm.bias", "feature_projection.projection.weight", "feature_projection.projection.bias", "positional_embedding.conv.bias", "positional_embedding.conv.weight_g", "positional_embedding.conv.weight_v", "norm.weight", "norm.bias", "encoder.layers.0.self_attn.in_proj_weight", "encoder.layers.0.self_attn.in_proj_bias", "encoder.layers.0.self_attn.out_proj.weight", "encoder.layers.0.self_attn.out_proj.bias", "encoder.layers.0.linear1.weight", "encoder.layers.0.linear1.bias", "encoder.layers.0.linear2.weight", "encoder.layers.0.linear2.bias", "encoder.layers.0.norm1.weight", "encoder.layers.0.norm1.bias", "encoder.layers.0.norm2.weight", "encoder.layers.0.norm2.bias", "encoder.layers.1.self_attn.in_proj_weight", "encoder.layers.1.self_attn.in_proj_bias", "encoder.layers.1.self_attn.out_proj.weight", "encoder.layers.1.self_attn.out_proj.bias", "encoder.layers.1.linear1.weight", "encoder.layers.1.linear1.bias", "encoder.layers.1.linear2.weight", "encoder.layers.1.linear2.bias", "encoder.layers.1.norm1.weight", "encoder.layers.1.norm1.bias", "encoder.layers.1.norm2.weight", "encoder.layers.1.norm2.bias", "encoder.layers.2.self_attn.in_proj_weight", "encoder.layers.2.self_attn.in_proj_bias", "encoder.layers.2.self_attn.out_proj.weight", "encoder.layers.2.self_attn.out_proj.bias", "encoder.layers.2.linear1.weight", "encoder.layers.2.linear1.bias", "encoder.layers.2.linear2.weight", "encoder.layers.2.linear2.bias", "encoder.layers.2.norm1.weight", "encoder.layers.2.norm1.bias", "encoder.layers.2.norm2.weight", "encoder.layers.2.norm2.bias", "encoder.layers.3.self_attn.in_proj_weight", "encoder.layers.3.self_attn.in_proj_bias", "encoder.layers.3.self_attn.out_proj.weight", "encoder.layers.3.self_attn.out_proj.bias", "encoder.layers.3.linear1.weight", "encoder.layers.3.linear1.bias", "encoder.layers.3.linear2.weight", "encoder.layers.3.linear2.bias", "encoder.layers.3.norm1.weight", "encoder.layers.3.norm1.bias", "encoder.layers.3.norm2.weight", "encoder.layers.3.norm2.bias", "encoder.layers.4.self_attn.in_proj_weight", "encoder.layers.4.self_attn.in_proj_bias", "encoder.layers.4.self_attn.out_proj.weight", "encoder.layers.4.self_attn.out_proj.bias", "encoder.layers.4.linear1.weight", "encoder.layers.4.linear1.bias", "encoder.layers.4.linear2.weight", "encoder.layers.4.linear2.bias", "encoder.layers.4.norm1.weight", "encoder.layers.4.norm1.bias", "encoder.layers.4.norm2.weight", "encoder.layers.4.norm2.bias", "encoder.layers.5.self_attn.in_proj_weight", "encoder.layers.5.self_attn.in_proj_bias", "encoder.layers.5.self_attn.out_proj.weight", "encoder.layers.5.self_attn.out_proj.bias", "encoder.layers.5.linear1.weight", "encoder.layers.5.linear1.bias", "encoder.layers.5.linear2.weight", "encoder.layers.5.linear2.bias", "encoder.layers.5.norm1.weight", "encoder.layers.5.norm1.bias", "encoder.layers.5.norm2.weight", "encoder.layers.5.norm2.bias", "encoder.layers.6.self_attn.in_proj_weight", "encoder.layers.6.self_attn.in_proj_bias", "encoder.layers.6.self_attn.out_proj.weight", "encoder.layers.6.self_attn.out_proj.bias", "encoder.layers.6.linear1.weight", "encoder.layers.6.linear1.bias", "encoder.layers.6.linear2.weight", "encoder.layers.6.linear2.bias", "encoder.layers.6.norm1.weight", "encoder.layers.6.norm1.bias", "encoder.layers.6.norm2.weight", "encoder.layers.6.norm2.bias", "encoder.layers.7.self_attn.in_proj_weight", "encoder.layers.7.self_attn.in_proj_bias", "encoder.layers.7.self_attn.out_proj.weight", "encoder.layers.7.self_attn.out_proj.bias", "encoder.layers.7.linear1.weight", "encoder.layers.7.linear1.bias", "encoder.layers.7.linear2.weight", "encoder.layers.7.linear2.bias", "encoder.layers.7.norm1.weight", "encoder.layers.7.norm1.bias", "encoder.layers.7.norm2.weight", "encoder.layers.7.norm2.bias", "encoder.layers.8.self_attn.in_proj_weight", "encoder.layers.8.self_attn.in_proj_bias", "encoder.layers.8.self_attn.out_proj.weight", "encoder.layers.8.self_attn.out_proj.bias", "encoder.layers.8.linear1.weight", "encoder.layers.8.linear1.bias", "encoder.layers.8.linear2.weight", "encoder.layers.8.linear2.bias", "encoder.layers.8.norm1.weight", "encoder.layers.8.norm1.bias", "encoder.layers.8.norm2.weight", "encoder.layers.8.norm2.bias", "encoder.layers.9.self_attn.in_proj_weight", "encoder.layers.9.self_attn.in_proj_bias", "encoder.layers.9.self_attn.out_proj.weight", "encoder.layers.9.self_attn.out_proj.bias", "encoder.layers.9.linear1.weight", "encoder.layers.9.linear1.bias", "encoder.layers.9.linear2.weight", "encoder.layers.9.linear2.bias", "encoder.layers.9.norm1.weight", "encoder.layers.9.norm1.bias", "encoder.layers.9.norm2.weight", "encoder.layers.9.norm2.bias", "encoder.layers.10.self_attn.in_proj_weight", "encoder.layers.10.self_attn.in_proj_bias", "encoder.layers.10.self_attn.out_proj.weight", "encoder.layers.10.self_attn.out_proj.bias", "encoder.layers.10.linear1.weight", "encoder.layers.10.linear1.bias", "encoder.layers.10.linear2.weight", "encoder.layers.10.linear2.bias", "encoder.layers.10.norm1.weight", "encoder.layers.10.norm1.bias", "encoder.layers.10.norm2.weight", "encoder.layers.10.norm2.bias", "encoder.layers.11.self_attn.in_proj_weight", "encoder.layers.11.self_attn.in_proj_bias", "encoder.layers.11.self_attn.out_proj.weight", "encoder.layers.11.self_attn.out_proj.bias", "encoder.layers.11.linear1.weight", "encoder.layers.11.linear1.bias", "encoder.layers.11.linear2.weight", "encoder.layers.11.linear2.bias", "encoder.layers.11.norm1.weight", "encoder.layers.11.norm1.bias", "encoder.layers.11.norm2.weight", "encoder.layers.11.norm2.bias", "proj.weight", "proj.bias", "label_embedding.weight".
    Unexpected key(s) in state_dict: "args", "cfg", "model", "criterion", "optimizer_history", "task_state", "extra_state", "last_optimizer_state".

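(For reference, a reading of the error itself: the unexpected keys "args", "cfg", "model", "criterion", and so on are the top-level entries of a fairseq training checkpoint, so checkpoint_best_legacy_500.pt is a whole fairseq checkpoint dict rather than a bare HubertSoft state_dict. A minimal, illustrative way to confirm this is sketched below; it assumes torch plus whatever packages the checkpoint was pickled with (fairseq/omegaconf) are importable, and the path is only an example.)

```python
import torch

# Load the raw checkpoint file without touching any model class.
ckpt = torch.load("hubert/checkpoint_best_legacy_500.pt", map_location="cpu")

# A fairseq training checkpoint is a dict of sections, not a flat weight map:
# expect entries such as 'args', 'cfg', 'model', 'criterion', 'optimizer_history', ...
print(list(ckpt.keys()))

# The actual weights sit under ckpt["model"], but they follow fairseq's
# HuBERT/ContentVec module layout, which does not match this repo's HubertSoft
# parameter names (e.g. "encoder.layers.0.self_attn.in_proj_weight"), so
# hubert.load_state_dict(ckpt["model"]) still fails to line up.
print(list(ckpt["model"].keys())[:10])
```
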
innnky commented 1 year ago

I'm not sure which version you are using, but in 4.0 this was switched to loading ContentVec a long time ago: https://github.com/innnky/so-vits-svc/blob/ac86f0dc0734cd71afbbc0056d4c651277d19444/utils.py#L182-L192
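
(The linked lines load the checkpoint through fairseq rather than through HubertSoft.load_state_dict. Roughly, that kind of loader looks like the sketch below; this is based on fairseq's standard checkpoint API, not copied from the repo, and the function name and device handling are illustrative. It assumes fairseq is installed.)

```python
import torch
from fairseq import checkpoint_utils

def load_contentvec(path="hubert/checkpoint_best_legacy_500.pt", device=None):
    # ContentVec ships as a fairseq checkpoint, so it is loaded with
    # fairseq's checkpoint utilities instead of HubertSoft.load_state_dict().
    models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
        [path], suffix=""
    )
    model = models[0]
    model.eval()
    if device is not None:
        model = model.to(device)
    return model
```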

innnky commented 1 year ago

Judging by the line numbers in your traceback, File "preprocess_hubert_f0.py", line 99, hmodel = utils.get_hubert_model(0 if torch.cuda.is_available() else None), this is clearly the 3.0 code.

buptorange commented 1 year ago

Thank you so much, I forgot to switch branches... If possible, please feel free to delete this silly question 😓