https://github.com/thunlp/LLaVA-UHD/blob/69e75d0cc6bc4d6000045f08f94852d2d465cd91/llava_uhd/train/llava-uhd/adapt_llava.py#L169-L173
In the code snippet above, I notice that the value of `cur_image_idx` doesn't change within a single batch. This implies that `cur_image_features` stays the same for every image in the batch, which seems unusual. Could you confirm whether this is the intended behavior?

Another point of confusion concerns the line `for j in range(5):` and the expression `j*16`. Based on the settings used in the Resampler, I would expect `image_features` to have shape `[batch_size*8, 64, 5120]`. Could you clarify why the image features are selected using `for j in range(5):` and `j*16`?
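To make both concerns concrete, here is a small sketch of the index arithmetic. It assumes 8 slices per image, 64 Resampler queries and a hidden size of 5120 (the shape I expected for `image_features`). The helper names `rows_for_image` and `indices_used_if_idx_is_fixed` are hypothetical, and this is not the repository's code.

```python
# Hypothetical sketch of the shape/index reasoning in this issue.
# All names and sizes are assumptions for illustration; this is NOT the
# code from adapt_llava.py.

BATCH_SIZE = 2
SLICES_PER_IMAGE = 8   # assumed from the expected [batch_size*8, 64, 5120]
NUM_QUERIES = 64       # assumed Resampler query count
HIDDEN = 5120          # assumed LLM hidden size

feature_shape = (BATCH_SIZE * SLICES_PER_IMAGE, NUM_QUERIES, HIDDEN)


def rows_for_image(i: int) -> list:
    """Rows of the first dimension one would expect image i to own."""
    start = i * SLICES_PER_IMAGE
    return list(range(start, start + SLICES_PER_IMAGE))


def indices_used_if_idx_is_fixed(num_images: int) -> list:
    """Index read for each image when cur_image_idx is never incremented."""
    cur_image_idx = 0
    return [cur_image_idx for _ in range(num_images)]


# Offsets produced by `for j in range(5): ... j*16`.
loop_offsets = [j * 16 for j in range(5)]

print(loop_offsets)                     # [0, 16, 32, 48, 64]
print(rows_for_image(1))                # [8, 9, 10, 11, 12, 13, 14, 15]
print(indices_used_if_idx_is_fixed(3))  # [0, 0, 0]
# Is the largest offset a valid index into dim 0 / dim 1?
print(max(loop_offsets) < feature_shape[0],
      max(loop_offsets) < feature_shape[1])  # False False
```

Under these assumed sizes, the last offset (64) lies past the end of both the first dimension (16 rows) and the query dimension (64 queries), and a fixed `cur_image_idx` makes every image read the same row. That is why the current indexing looks surprising to me.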