thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Meaning of '8' and '4' #13

Open phellonchen opened 5 months ago

phellonchen commented 5 months ago
for j in range(8):
    cur_image_features = image_features[cur_image_idx+j*4]
    cur_new_input_embeds.append(cur_image_features)
    cur_new_labels.append(torch.full((cur_image_features.shape[0],), IGNORE_INDEX, device=cur_labels.device, dtype=cur_labels.dtype))

What do the '8' and '4' mean here?

zyddnys commented 5 months ago

I hit the same issue. With only one input image, image_features has shape [8, 64, 5120], so cur_image_idx+j*4 goes out of bounds along the first dimension.
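
To make the out-of-bounds claim concrete, here is a small sketch that only enumerates the indices the loop would read. It assumes cur_image_idx is 0 when the loop starts and takes the first dimension (8) from the shape reported above. The helper name accessed_indices is made up for illustration and is not part of the repo.

```python
def accessed_indices(cur_image_idx, num_iters=8, stride=4):
    """Indices read by image_features[cur_image_idx + j*stride] for j in range(num_iters)."""
    return [cur_image_idx + j * stride for j in range(num_iters)]


num_features = 8  # first dimension of the reported shape [8, 64, 5120]

# Assumption: cur_image_idx is 0 at the start of the loop.
indices = accessed_indices(cur_image_idx=0)

# Any index >= num_features would raise an IndexError on the real tensor.
out_of_bounds = [i for i in indices if i >= num_features]

print("indices read: ", indices)
print("out of bounds:", out_of_bounds)
```

Under that assumption only j = 0 and j = 1 stay in range, so the hard-coded 8 and 4 seem to expect a first dimension of at least 29.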