Richar-Du opened this issue 1 year ago · Status: Open

According to the Hugging Face transformers documentation, beam-search multinomial sampling can be enabled by setting `num_beams > 1` and `do_sample=True`. However, this is not supported in LAVIS. If I set `num_beams=4`, `num_return_sequences=4`, and `do_sample=True` simultaneously, I get an error during generation. When generating the first token, the sizes are normal: both `query_layer` and `key_layer` are `torch.Size([64, 12, 5, 64])`. However, when generating the second token, the size of `key_layer` becomes `torch.Size([4, 12, 577, 64])`. (Here, 5 is my prompt length and 12 is the number of attention heads.) So I think there may be something wrong in the image-captioning path. Could you figure out where the error is? Thanks in advance :)

Hi @Richar-Du,
I think something fishy might be going on. I will look into this. Thanks for raising it.

I ran into the same problem because my version of transformers was incorrect. You may want to check whether your transformers version is lower than 4.27.