Closed by Charles-ux-bit 1 year ago
We ended up not using the role embeds. These parameters do not actually participate in the computation, so they will not affect decoding quality.
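Since the `role_embeds` weights are unused, the "newly initialized" warning is harmless and can simply be silenced. A minimal sketch, assuming the checkpoint is loaded through Hugging Face `transformers` (whose `transformers.modeling_utils` logger emits this warning), using only the standard library:

```python
import logging

# The "Some weights ... are newly initialized" message is emitted at
# WARNING level by the transformers.modeling_utils logger; raising that
# logger's threshold to ERROR hides it without touching other logs.
logging.getLogger("transformers.modeling_utils").setLevel(logging.ERROR)
```

This only suppresses the message; the loaded weights are unchanged either way.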
OK, thank you very much. However, in actual use (loading thu-coai/EVA1.0, thu-coai/EVA2.0-base, thu-coai/EVA2.0-large, and thu-coai/EVA2.0-xlarge via the --load argument), the multi-turn dialogue quality is not particularly good. Below are some examples. Is this expected? Thanks.
EVA 1.0 (all default parameters in arguments.py)
EVA 2.0 large (default parameters in arguments.py, except the number of beams for beam search was changed to 5)
EVA 1.0's behavior is expected. This is a training-data issue: the data contains a fair amount of e-commerce dialogue, so the reply to "您好" ("hello") is very likely to be in a customer-service style. Try a question like "今天天气怎么样" ("how is the weather today"). This portion of the training data was removed for 2.0.
EVA 2.0 large does not look right; replies are normally not this long. Try some other prompts, or increase the length penalty a bit.
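For intuition on how a length penalty interacts with reply length, here is a toy, self-contained sketch of length-normalized beam scoring. This is illustrative Python, not EVA's actual implementation; the direction convention of the length-penalty flag in arguments.py may differ from the Hugging Face-style normalization shown here, so check the repo before tuning.

```python
def normalized_score(sum_logprob: float, length: int, alpha: float) -> float:
    # Simple length normalization: divide the summed log-probability
    # by length ** alpha (a Hugging Face-style convention).
    return sum_logprob / (length ** alpha)

short_hyp = (-4.0, 4)    # (total log-prob, token count)
long_hyp = (-7.0, 10)

# alpha = 0: no normalization, the short hypothesis scores higher;
# alpha = 1: full normalization, the long hypothesis scores higher.
for alpha in (0.0, 1.0):
    s = normalized_score(*short_hyp, alpha)
    l = normalized_score(*long_hyp, alpha)
    print(alpha, "long wins" if l > s else "short wins")
```

The point is that the normalization exponent shifts which beam the search returns, which is why tuning it (together with num_beams) changes how verbose the replies are.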
OK. After setting num_beams to 1, the output looks normal. Could you give some guidance on the best decoding configuration for the EVA models? Also, on knowledge-oriented QA the model does not seem very strong; for example, asking "北京在哪里" ("Where is Beijing") gets the reply "北京在心里" ("Beijing is in the heart"). Am I right that the model was trained from scratch on dialogue data, and that the data does not contain much knowledge-QA material? Many thanks!
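With num_beams set to 1, decoding falls back to greedy search or sampling, and for open-domain chat a sampling-style decode such as top-p (nucleus) sampling is a common choice. Below is a toy, self-contained sketch of the top-p filtering step; the distribution and token strings are made up, and whether EVA's arguments.py exposes a matching flag is an assumption to verify against the repo.

```python
def top_p_filter(probs: dict, p: float) -> dict:
    """Keep the highest-probability tokens whose cumulative mass first
    reaches p, then renormalize them into a sampling distribution."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for tok, pr in items:
        kept.append((tok, pr))
        total += pr
        if total >= p:
            break
    return {tok: pr / total for tok, pr in kept}

# Dummy next-token distribution (made-up numbers, for illustration).
dist = {"a": 0.5, "b": 0.2, "c": 0.15, "d": 0.1, "e": 0.05}
filtered = top_p_filter(dist, p=0.8)
# "a", "b", "c" survive (cumulative 0.85 >= 0.8) and are renormalized;
# the next token is then sampled from `filtered`.
```

Truncating the tail like this avoids the degenerate repetition beam search can produce in chit-chat, at the cost of some determinism.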
OK, thank you very much.
As shown in the screenshot, I ran into the following issue when loading the Hugging Face models. Will it affect decoding quality? The same thing also happens in eva_interactive.py. Thanks.

```
Some weights of EVAModel were not initialized from the model checkpoint at thu-coai/EVA1.0 and are newly initialized: ['decoder.role_embeds.weight', 'role_embeds.weight', 'encoder.role_embeds.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of EVAModel were not initialized from the model checkpoint at thu-coai/EVA2.0-base and are newly initialized: ['decoder.role_embeds.weight', 'role_embeds.weight', 'encoder.role_embeds.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of EVAModel were not initialized from the model checkpoint at thu-coai/EVA2.0-large and are newly initialized: ['decoder.role_embeds.weight', 'role_embeds.weight', 'encoder.role_embeds.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference
```