eval results make no sense

findlet39 commented 1 month ago

When I run inference with your pretrained weights, the generated sentences are just a jumble of unrelated words, and I can't tell why. My steps: I downloaded the bert-base-uncased model files locally and updated the corresponding model path; I downloaded the visual encoder model and the pretrained weights from the links you provided, put each in a local folder, and changed the corresponding paths in coco_eval.py. I ran inference on both my own photos and the COCO 2014 dataset, and in both cases the output is a jumble of words.

P.S. The following warning is printed at runtime:

Some weights of the model checkpoint at D:\pyproject\LAVIS-main\models\blip_2\bert-base-uncased were not used when initializing BertLowModel: ['bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.output.dense.bias', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias',
'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.6.attention.output.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.query.weight', 
'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.11.attention.self.query.bias', 'cls.predictions.bias']

This IS expected if you are initializing BertLowModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertLowModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some of the weights were not loaded. Is that what causes this problem?

wanghua-lei commented 1 month ago

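It is normal for these weights not to be loaded: the encoder only loads the first 6 BERT layers, so the last 6 are skipped. The decoder, on the other hand, loads all 12 BERT layers, yet the author's decoder does not seem to pass through the last 6 layers. I'm not sure whether I've understood this correctly.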
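One way to check that the warning only concerns the upper layers is to group the unused keys by layer index. The sketch below uses only the standard library; `summarize_unused_keys` is a hypothetical helper written for this thread, and the sample holds just a few keys copied from the warning rather than the full list.

```python
import re
from collections import defaultdict


def summarize_unused_keys(keys):
    """Count unused keys per encoder layer; other keys are grouped by their first two name parts."""
    layer_pattern = re.compile(r"^bert\.encoder\.layer\.(\d+)\.")
    layers = defaultdict(int)
    other = defaultdict(int)
    for key in keys:
        match = layer_pattern.match(key)
        if match:
            layers[int(match.group(1))] += 1
        else:
            # e.g. "bert.pooler.dense.bias" -> "bert.pooler"
            other[".".join(key.split(".")[:2])] += 1
    return dict(layers), dict(other)


# A few keys copied from the warning above (not the full list).
sample = [
    "bert.encoder.layer.6.attention.self.query.weight",
    "bert.encoder.layer.11.intermediate.dense.weight",
    "bert.pooler.dense.bias",
    "cls.predictions.bias",
    "cls.seq_relationship.weight",
]
layers, other = summarize_unused_keys(sample)
print(sorted(layers))  # encoder layer indices that appear among the unused keys
print(sorted(other))   # non-encoder modules that appear among the unused keys
```

If the smallest layer index reported for the full list is 6, the first six layers were all loaded, which matches the explanation above.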

My inference results look the same: lots of meaningless words, stray "." tokens, repeated phrases, or the wrong subject. I don't know what causes it.

findlet39 commented 1 month ago

Ding~ I've received your email.

wangyuchi369 commented 1 month ago

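@wanghua-lei @findlet39 Hello! Thanks for your questions. First, regarding loading the pretrained weights, @wanghua-lei is right: for ease of implementation I load all 12 layers of parameters into the decoder, but the forward pass does not go through the first six layers. As for the garbled output, we ran some tests and suspect that you either followed a suggestion from an earlier issue or kept the default in the config, which sets var_dilate (variance dilate) to 9. The ckpt we provide was trained with var_dilate set to 4, so at inference time it has never seen noise that large.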

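To address this, we added a parameter "var_dilate_val" in a new commit, which separates the variance used at training time from the one used at test time. We recommend setting "var_dilate_val" to the value used during training, or simply to 1. (At the very least, make sure val_var < train_var.)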
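The constraint can be expressed as a small sanity check before running evaluation. This is a minimal sketch, not code from the repository: `check_var_dilate` is a hypothetical helper, and it treats a value equal to the training one as acceptable, since matching the training value is the recommended setting.

```python
def check_var_dilate(var_dilate, var_dilate_val):
    """Return a warning string if the evaluation dilation exceeds the training one, else None."""
    if var_dilate_val > var_dilate:
        return (
            f"var_dilate_val={var_dilate_val} exceeds training var_dilate={var_dilate}: "
            "the checkpoint has not seen noise this large"
        )
    return None


# Values mentioned in this thread: the released ckpt used 4, the problematic setting was 9.
print(check_var_dilate(var_dilate=4, var_dilate_val=9))  # warning string
print(check_var_dilate(var_dilate=4, var_dilate_val=4))  # None
print(check_var_dilate(var_dilate=4, var_dilate_val=1))  # None
```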

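Our test results are below for reference: we also get the message about weights not being loaded, and the output is normal. If this turns out not to be the problem, feel free to keep commenting.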

wanghua-lei commented 1 month ago

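@wangyuchi369 Thanks for the answer. One more question: in the training part, an all-zero vector is concatenated along the dim dimension. What is the purpose of this? Is it to introduce the zero information so that these special tokens are more easily predicted as all zeros? Could the dim dimension be left unchanged and the vectors simply added instead?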

wangyuchi369 commented 1 month ago

@wanghua-lei Hi, this is the self-conditioning technique; please see https://arxiv.org/pdf/2208.04202
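As a rough illustration of self-conditioning as described in the linked paper: the denoiser receives the noisy input concatenated with its own previous estimate of the clean signal, and zeros stand in when no estimate is available. The sketch below uses plain Python lists instead of tensors; `build_denoiser_input`, `training_input`, `toy_denoiser`, and the default `p_self_cond=0.5` are illustrative assumptions, not the repository's code.

```python
import random


def build_denoiser_input(x_t, x0_prev=None):
    """Concatenate each noisy token vector with the previous x0 estimate along the feature dim.

    x_t: list of token vectors, shape [seq_len, dim].
    x0_prev: previous estimate with the same shape, or None to use all zeros.
    Returns a list of shape [seq_len, 2 * dim].
    """
    if x0_prev is None:
        x0_prev = [[0.0] * len(tok) for tok in x_t]
    return [tok + est for tok, est in zip(x_t, x0_prev)]


def training_input(x_t, denoise_fn, p_self_cond=0.5, rng=random):
    """With probability p_self_cond, condition on a first-pass estimate; otherwise on zeros."""
    x0_prev = None
    if rng.random() < p_self_cond:
        # The first pass sees zeros as its estimate. In a real framework the result
        # would be detached so that no gradient flows through this pass.
        x0_prev = denoise_fn(build_denoiser_input(x_t))
    return build_denoiser_input(x_t, x0_prev)


def toy_denoiser(inp):
    """Hypothetical stand-in for the network: halves the noisy part and ignores the estimate."""
    dim = len(inp[0]) // 2
    return [[0.5 * v for v in row[:dim]] for row in inp]


x_t = [[1.0, 2.0], [3.0, 4.0]]
print(build_denoiser_input(x_t))                        # zeros fill the estimate slots
print(training_input(x_t, toy_denoiser, p_self_cond=1.0))  # estimate slots hold the first-pass output
```

Concatenation keeps the noisy input and the estimate in separate channels, so the all-zero case reads as "no estimate yet" rather than being mixed into the input values as it would be with addition.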

wangyuchi369 / LaDiC

eval results make no sense #3