qinzhuowu / NumS2T

ACL21 Math Word Problem Solving with Explicit Numerical Values

Pretrained models #3

Closed wlkdb closed 2 years ago

wlkdb commented 3 years ago

Hello! Are the models currently in the models folder trained on Math23K? Would it be possible to release the pretrained model trained on Ape210K as well? Thanks~

qinzhuowu commented 3 years ago

Uploaded to models_APE_char_0401.


wlkdb commented 2 years ago

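Hello, when I use run_seq2tree_APE to load models_APE_char_0401, I get the parameter-mismatch error below. Do some configuration parameters need to be changed, or could you update the corresponding code?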

RuntimeError: Error(s) in loading state_dict for EncoderSeq:
Missing key(s) in state_dict: "category_embedding.weight", "gat_1.fc.weight", "gat_1.attn_fc.weight", "gat_dense.weight", "gat_dense.bias", "gcn.graph.0.gc1.weight", "gcn.graph.0.gc1.bias", "gcn.graph.0.gc2.weight", "gcn.graph.0.gc2.bias", "gcn.graph.1.gc1.weight", "gcn.graph.1.gc1.bias", "gcn.graph.1.gc2.weight", "gcn.graph.1.gc2.bias", "gcn.graph.2.gc1.weight", "gcn.graph.2.gc1.bias", "gcn.graph.2.gc2.weight", "gcn.graph.2.gc2.bias", "gcn.graph.3.gc1.weight", "gcn.graph.3.gc1.bias", "gcn.graph.3.gc2.weight", "gcn.graph.3.gc2.bias", "gcn.feed_foward.w_1.weight", "gcn.feed_foward.w_1.bias", "gcn.feed_foward.w_2.weight", "gcn.feed_foward.w_2.bias", "gcn.norm.a_2", "gcn.norm.b_2", "attention_0.a1", "attention_0.a2", "attention_0.W.weight", "attention_1.a1", "attention_1.a2", "attention_1.W.weight", "attention_2.a1", "attention_2.a2", "attention_2.W.weight", "attention_3.a1", "attention_3.a2", "attention_3.W.weight", "attention_4.a1", "attention_4.a2", "attention_4.W.weight", "attention_5.a1", "attention_5.a2", "attention_5.W.weight", "attention_6.a1", "attention_6.a2", "attention_6.W.weight", "attention_7.a1", "attention_7.a2", "attention_7.W.weight", "pos_embedding.position_encoding.weight", "encoder_layers.attention.a", "encoder_layers.attention.b", "encoder_layers.attention.linear_k.weight", "encoder_layers.attention.linear_k.bias", "encoder_layers.attention.linear_v.weight", "encoder_layers.attention.linear_v.bias", "encoder_layers.attention.linear_q.weight", "encoder_layers.attention.linear_q.bias", "encoder_layers.attention.linear_x.weight", "encoder_layers.attention.linear_x.bias", "encoder_layers.feed_forward.w1.weight", "encoder_layers.feed_forward.w1.bias", "encoder_layers.feed_forward.w2.weight", "encoder_layers.feed_forward.w2.bias", "encoder_layers.norm.a_2", "encoder_layers.norm.b_2", "encoder_layers2.attention.a", "encoder_layers2.attention.b", "encoder_layers2.attention.linear_k.weight", "encoder_layers2.attention.linear_k.bias", "encoder_layers2.attention.linear_v.weight", "encoder_layers2.attention.linear_v.bias", "encoder_layers2.attention.linear_q.weight", "encoder_layers2.attention.linear_q.bias", "encoder_layers2.attention.linear_x.weight", "encoder_layers2.attention.linear_x.bias", "encoder_layers2.feed_forward.w1.weight", "encoder_layers2.feed_forward.w1.bias", "encoder_layers2.feed_forward.w2.weight", "encoder_layers2.feed_forward.w2.bias", "encoder_layers2.norm.a_2", "encoder_layers2.norm.b_2".
size mismatch for embedding.weight: copying a param with shape torch.Size([3409, 300]) from checkpoint, the shape in current model is torch.Size([3243, 300]).
size mismatch for gru_pade.weight_ih_l0: copying a param with shape torch.Size([1536, 300]) from checkpoint, the shape in current model is torch.Size([768, 300]).
size mismatch for gru_pade.weight_hh_l0: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
size mismatch for gru_pade.bias_ih_l0: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for gru_pade.bias_hh_l0: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for gru_pade.weight_ih_l0_reverse: copying a param with shape torch.Size([1536, 300]) from checkpoint, the shape in current model is torch.Size([768, 300]).
size mismatch for gru_pade.weight_hh_l0_reverse: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
size mismatch for gru_pade.bias_ih_l0_reverse: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for gru_pade.bias_hh_l0_reverse: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for gru_pade.weight_ih_l1: copying a param with shape torch.Size([1536, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]).
size mismatch for gru_pade.weight_hh_l1: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
size mismatch for gru_pade.bias_ih_l1: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for gru_pade.bias_hh_l1: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for gru_pade.weight_ih_l1_reverse: copying a param with shape torch.Size([1536, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]).
size mismatch for gru_pade.weight_hh_l1_reverse: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
size mismatch for gru_pade.bias_ih_l1_reverse: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for gru_pade.bias_hh_l1_reverse: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
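The shapes in this error already hint at which sizes differ. Below is a minimal sketch of that arithmetic, assuming that `embedding` is a `torch.nn.Embedding` (weight shape `(vocab_size, embedding_size)`) and that `gru_pade` is a `torch.nn.GRU` (its `weight_hh_l*` has shape `(3 * hidden_size, hidden_size)`). Both are inferred from the parameter names and the factor of 3, not confirmed from the repository. The names `REPORTED`, `gru_hidden_size`, and `infer_sizes` are hypothetical helpers for illustration.

```python
# Shapes copied from the error message: name -> (checkpoint shape, current model shape).
REPORTED = {
    "embedding.weight": ((3409, 300), (3243, 300)),
    "gru_pade.weight_ih_l0": ((1536, 300), (768, 300)),
    "gru_pade.weight_hh_l0": ((1536, 512), (768, 256)),
    "gru_pade.weight_ih_l1": ((1536, 1024), (768, 512)),
}


def gru_hidden_size(weight_hh_shape):
    """Return hidden_size from a GRU weight_hh shape of (3 * hidden_size, hidden_size)."""
    rows, cols = weight_hh_shape
    if rows != 3 * cols:
        raise ValueError(f"{weight_hh_shape} does not look like a GRU weight_hh shape")
    return cols


def infer_sizes(side):
    """Infer sizes for side=0 (checkpoint) or side=1 (current model)."""
    vocab_size, embedding_size = REPORTED["embedding.weight"][side]
    hidden_size = gru_hidden_size(REPORTED["gru_pade.weight_hh_l0"][side])
    # Layer 1 of a bidirectional GRU receives 2 * hidden_size input features.
    layer1_inputs = REPORTED["gru_pade.weight_ih_l1"][side][1]
    return {
        "vocab_size": vocab_size,
        "embedding_size": embedding_size,
        "hidden_size": hidden_size,
        "bidirectional": layer1_inputs == 2 * hidden_size,
    }


print("checkpoint:   ", infer_sizes(0))
print("current model:", infer_sizes(1))
```

Under these assumptions, the checkpoint corresponds to a hidden size of 512 and a 3409-entry vocabulary, while the script built a model with a hidden size of 256 and a 3243-entry vocabulary. The missing keys (`gat_*`, `gcn.*`, `attention_*`, `encoder_layers*`) are modules that the current code defines but the checkpoint does not contain.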

qinzhuowu commented 2 years ago

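The parameters have probably changed, because I kept working on this project afterwards. I tested a complete copy of the project and put it under NumS2T_APE_0521. After you add data and hownet, running run_seq2tree_test.py should give the same results as the included output file. If it raises an error or the results differ, it is probably an environment issue. If a file is corrupted, it may have been damaged while uploading through GitHub Desktop, and I can send it to you via an external link. Once run_seq2tree_test.py runs without problems, run_seq2tree.py is the main program that includes the training part. Feel free to contact me if you run into any other problems.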
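For anyone who wants to check whether a checkpoint matches a given version of the code before loading it, here is a minimal sketch that compares parameter names and shapes. It is not part of NumS2T; `diff_state_shapes` is a hypothetical helper that works on plain `{name: shape}` dictionaries. With PyTorch these could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`, assuming the saved file is a plain state_dict, which is not confirmed here.

```python
def diff_state_shapes(checkpoint_shapes, model_shapes):
    """Compare two {parameter name: shape tuple} mappings.

    Returns (missing, unexpected, mismatched):
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint that the model does not define
      mismatched - {name: (checkpoint shape, model shape)} for differing shapes
    """
    missing = sorted(k for k in model_shapes if k not in checkpoint_shapes)
    unexpected = sorted(k for k in checkpoint_shapes if k not in model_shapes)
    mismatched = {
        k: (checkpoint_shapes[k], model_shapes[k])
        for k in checkpoint_shapes
        if k in model_shapes and checkpoint_shapes[k] != model_shapes[k]
    }
    return missing, unexpected, mismatched


# A few entries taken from the error in this thread.
checkpoint = {
    "embedding.weight": (3409, 300),
    "gru_pade.bias_ih_l0": (1536,),
}
model = {
    "embedding.weight": (3243, 300),
    "gru_pade.bias_ih_l0": (768,),
    # Placeholder shape: the log lists this key as missing but gives no shape.
    "gcn.norm.a_2": (1,),
}

missing, unexpected, mismatched = diff_state_shapes(checkpoint, model)
print("missing:", missing)
print("unexpected:", unexpected)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

An empty result on all three suggests the checkpoint and the code were produced from the same configuration. Anything else points to a version or hyperparameter difference like the one described above.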


wlkdb commented 2 years ago

The test runs successfully with the newly uploaded code. Thank you very much!