Closed guihonghao closed 1 month ago
This is due to pixel values being padded when batch_size > 1.
Pull the latest code; this issue should already be fixed there.
But my batch_size is set to 1.
My guess is that the input_ids were truncated because the maximum length was too short. Try increasing the --max_length
parameter.
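To illustrate why truncation could produce this kind of mismatch, here is a minimal sketch in plain Python. It is not the repository's code: the placeholder id `IMG_CONTEXT_ID`, the helper names, the 500 text tokens, and the layout (text first, image placeholders last) are all assumptions made for illustration. Only the 256 tokens per tile, the 15 tiles, and the length 3628 come from the numbers reported in this thread.

```python
# Hypothetical stand-ins; none of these names come from the actual codebase.
IMG_CONTEXT_ID = -1      # illustrative id for an image placeholder token
TOKENS_PER_TILE = 256    # per-tile token count, from the figures in this thread


def build_input_ids(num_text_tokens, num_tiles):
    """Text tokens followed by one placeholder per image feature row."""
    text = list(range(num_text_tokens))
    image = [IMG_CONTEXT_ID] * (num_tiles * TOKENS_PER_TILE)
    return text + image


def count_placeholders(input_ids):
    """Number of positions that expect an image feature row."""
    return sum(1 for token in input_ids if token == IMG_CONTEXT_ID)


num_tiles = 15
input_ids = build_input_ids(num_text_tokens=500, num_tiles=num_tiles)
vit_rows = num_tiles * TOKENS_PER_TILE  # rows in the flattened image features

max_length = 3628
truncated = input_ids[:max_length]  # what a too-small max_length would do

print(count_placeholders(input_ids), vit_rows)  # counts match before truncation
print(count_placeholders(truncated), vit_rows)  # fewer placeholders afterwards
```

If the sequence is cut after the image features have already been computed, the placeholders that remain can no longer hold every feature row, which would be consistent with the error described here.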
fixed
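The error is raised at this piece of code. I printed a few lines of results: inputs_embeds should contain both the text tokens and the image tokens, so the first dimension of vit_embeds.reshape(-1, vit_embeds.shape[-1]) should be smaller than the first dimension of inputs_embeds, e.g. 6 × 256 = 1536 < 2549. But in the example above, 15 × 256 = 3840 > 3628: the first dimension of vit_embeds is larger than the first dimension of inputs_embeds, which causes the error. What is the reason for this?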
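The arithmetic in this comment can be reproduced with a small check. This is only a sketch: the helper `image_tokens_fit` is hypothetical, and reading the factors 6 and 15 as the number of image tiles is an assumption. The other numbers are the ones quoted in this thread.

```python
TOKENS_PER_TILE = 256  # per-tile token count, from the figures in this thread


def image_tokens_fit(num_tiles, seq_len, tokens_per_tile=TOKENS_PER_TILE):
    """Return (rows, fits): flattened vit_embeds rows and whether rows <= seq_len."""
    rows = num_tiles * tokens_per_tile
    return rows, rows <= seq_len


# (assumed tile count, first dimension of inputs_embeds) for the two reported cases
for num_tiles, seq_len in [(6, 2549), (15, 3628)]:
    rows, fits = image_tokens_fit(num_tiles, seq_len)
    print(f"{num_tiles} tiles -> {rows} rows, seq_len={seq_len}, fits={fits}")
```

The first case fits and the second does not, so a check like this before the embedding step would flag the failing sample early.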