Closed by Charles-ux-bit 1 year ago
We ended up not using the role embeds. These parameters do not actually participate in the computation, so they will not affect decoding quality.
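Since the `role_embeds` weights are unused, the "newly initialized" warning is harmless and can simply be silenced. A minimal sketch, assuming the checkpoint is loaded through Hugging Face `transformers` (whose `transformers.modeling_utils` logger emits this warning), using only the standard library:

```python
import logging

# The "Some weights ... are newly initialized" message is emitted at
# WARNING level by the transformers.modeling_utils logger; raising that
# logger's threshold to ERROR hides it without touching other logs.
logging.getLogger("transformers.modeling_utils").setLevel(logging.ERROR)
```

This only suppresses the message; the loaded weights are unchanged either way.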
OK, thank you very much. However, in actual use (loading thu-coai/EVA1.0, thu-coai/EVA2.0-base, thu-coai/EVA2.0-large, and thu-coai/EVA2.0-xlarge via the --load argument), the multi-turn dialogue quality is not particularly good. Below are some examples. Is this expected? Thanks.
EVA 1.0 (all default parameters in arguments.py)
EVA 2.0 large (default parameters in arguments.py, except the number of beams for beam search was changed to 5)
EVA 1.0's behavior is expected. This is a training-data issue: the data contains a fair amount of e-commerce dialogue, so the reply to "您好" ("hello") is very likely to be in a customer-service style. Try a question like "今天天气怎么样" ("how is the weather today"). This portion of the training data was removed for 2.0.
EVA 2.0 large does not look right; replies are normally not this long. Try some other prompts, or increase the length penalty a bit.
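For intuition on how a length penalty interacts with reply length, here is a toy, self-contained sketch of length-normalized beam scoring. This is illustrative Python, not EVA's actual implementation; the direction convention of the length-penalty flag in arguments.py may differ from the Hugging Face-style normalization shown here, so check the repo before tuning.

```python
def normalized_score(sum_logprob: float, length: int, alpha: float) -> float:
    # Simple length normalization: divide the summed log-probability
    # by length ** alpha (a Hugging Face-style convention).
    return sum_logprob / (length ** alpha)

short_hyp = (-4.0, 4)    # (total log-prob, token count)
long_hyp = (-7.0, 10)

# alpha = 0: no normalization, the short hypothesis scores higher;
# alpha = 1: full normalization, the long hypothesis scores higher.
for alpha in (0.0, 1.0):
    s = normalized_score(*short_hyp, alpha)
    l = normalized_score(*long_hyp, alpha)
    print(alpha, "long wins" if l > s else "short wins")
```

The point is that the normalization exponent shifts which beam the search returns, which is why tuning it (together with num_beams) changes how verbose the replies are.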
OK. After setting num_beams to 1, the output looks normal. Could you give some guidance on the best decoding configuration for the EVA models? Also, on knowledge-oriented QA the model does not seem very strong; for example, asking "北京在哪里" ("Where is Beijing") gets the reply "北京在心里" ("Beijing is in the heart"). Am I right that the model was trained from scratch on dialogue data, and that the data does not contain much knowledge-QA material? Many thanks!
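With num_beams set to 1, decoding falls back to greedy search or sampling, and for open-domain chat a sampling-style decode such as top-p (nucleus) sampling is a common choice. Below is a toy, self-contained sketch of the top-p filtering step; the distribution and token strings are made up, and whether EVA's arguments.py exposes a matching flag is an assumption to verify against the repo.

```python
def top_p_filter(probs: dict, p: float) -> dict:
    """Keep the highest-probability tokens whose cumulative mass first
    reaches p, then renormalize them into a sampling distribution."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for tok, pr in items:
        kept.append((tok, pr))
        total += pr
        if total >= p:
            break
    return {tok: pr / total for tok, pr in kept}

# Dummy next-token distribution (made-up numbers, for illustration).
dist = {"a": 0.5, "b": 0.2, "c": 0.15, "d": 0.1, "e": 0.05}
filtered = top_p_filter(dist, p=0.8)
# "a", "b", "c" survive (cumulative 0.85 >= 0.8) and are renormalized;
# the next token is then sampled from `filtered`.
```

Truncating the tail like this avoids the degenerate repetition beam search can produce in chit-chat, at the cost of some determinism.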
OK, thank you very much.
As shown in the screenshot, I ran into the following issue when loading the Hugging Face models. Will it affect decoding quality? The same thing also happens in eva_interactive.py. Thanks.

```
Some weights of EVAModel were not initialized from the model checkpoint at thu-coai/EVA1.0 and are newly initialized: ['decoder.role_embeds.weight', 'role_embeds.weight', 'encoder.role_embeds.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of EVAModel were not initialized from the model checkpoint at thu-coai/EVA2.0-base and are newly initialized: ['decoder.role_embeds.weight', 'role_embeds.weight', 'encoder.role_embeds.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of EVAModel were not initialized from the model checkpoint at thu-coai/EVA2.0-large and are newly initialized: ['decoder.role_embeds.weight', 'role_embeds.weight', 'encoder.role_embeds.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference
```