[Question] `image_features` not matched to input text

In the code snippet above, I notice that the value of `cur_image_idx` doesn't change within a single batch. This implies that `cur_image_features` remains identical for all images within the same batch, which seems unusual. Could you confirm whether this is the intended behavior?
My second point of confusion concerns the line `for j in range(5):` and the expression `j*16`. Based on the settings used in the Resampler, I would expect `image_features` to have shape `[batch_size*8, 64, 5120]`. Can you clarify why the image features are selected using `for j in range(5):` and `j*16`?
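To make the second question concrete, here is a minimal sketch of the index arithmetic as I understand it. It assumes the loop takes consecutive windows `[j*16, (j+1)*16)` along the token dimension; that slice form, and the names `NUM_QUERIES`, `WINDOW`, `NUM_WINDOWS`, and `window_bounds`, are my own assumptions for illustration and are not taken from the repository.

```python
# Sizes taken from the question; the slicing scheme itself is an assumption.
NUM_QUERIES = 64   # tokens per image expected from the Resampler output
WINDOW = 16        # stride implied by the expression j*16
NUM_WINDOWS = 5    # iterations implied by range(5)


def window_bounds(num_windows=NUM_WINDOWS, window=WINDOW):
    """Return (start, stop) index pairs for consecutive windows of `window` tokens."""
    return [(j * window, (j + 1) * window) for j in range(num_windows)]


bounds = window_bounds()
covered = bounds[-1][1]                    # stop index of the last window
overflow = max(0, covered - NUM_QUERIES)   # tokens requested beyond the 64 available

print(bounds)
print(covered, overflow)
```

Under these assumptions, five windows of 16 tokens would span 80 positions, while a 64-token dimension only has room for four. That is why I suspect `j*16` indexes a different dimension or a differently shaped tensor than the one I expect.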

thunlp / LLaVA-UHD