mustafaadogan opened 9 months ago
Hi Mustafa, have you solved this problem?
I tackled the same scoring problem, but zero-shot performance on some benchmarks was poor, sometimes even worse than random chance. Here is the code I used:
```python
import torch


def calculate_perplexity(self, prompt, vision_x):
    """
    Calculate the perplexity score given a prompt and vision data.

    Parameters:
    - prompt (str): The input prompt.
    - vision_x (torch.Tensor): Tensor containing vision data.

    Returns:
    float: Model score (perplexity; lower means the model finds the prompt more likely).
    """
    if self.model is None:
        raise AttributeError('Model is not initialized. Call load_model first!')

    lang_x = self.tokenizer(
        [prompt],
        return_tensors="pt",
    )

    with torch.no_grad():
        model_output = self.model(
            vision_x=vision_x.to(self.device),
            lang_x=lang_x["input_ids"].to(self.device),
            attention_mask=lang_x["attention_mask"].to(self.device),
        )

    logits = model_output.logits[0]  # (seq_len, vocab_size)
    true_labels = lang_x["input_ids"][0].to(self.device)  # (seq_len,)

    # Causal LM: the logits at position t predict token t + 1, so drop the
    # last logit row and the first label to align predictions with targets.
    shift_logits = logits[:-1].float()  # float32 avoids fp16/bf16 overflow
    shift_labels = true_labels[1:]

    # Calculate cross-entropy loss (self.crit is assumed to be
    # torch.nn.CrossEntropyLoss() with the default mean reduction)
    loss = self.crit(shift_logits, shift_labels)

    # Calculate perplexity
    perplexity = loss.mean().exp()
    return float(perplexity)
```
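Perplexity is lower-is-better: a caption the model finds more likely gets a smaller score. When these scores drive zero-shot multiple choice, the prediction should be the candidate with the minimum perplexity; picking the maximum would systematically choose the least likely caption, which is one thing worth ruling out when accuracy drops below chance. A minimal sketch, assuming a hypothetical `score_fn` that wraps something like the `calculate_perplexity` method above for a fixed image:

```python
from typing import Callable, Dict, Sequence


def pick_caption(captions: Sequence[str], score_fn: Callable[[str], float]) -> str:
    """Return the caption with the LOWEST perplexity.

    score_fn is a hypothetical callable mapping a prompt to its perplexity,
    e.g. lambda p: scorer.calculate_perplexity(p, vision_x).
    """
    if not captions:
        raise ValueError("need at least one caption")
    scores: Dict[str, float] = {c: score_fn(c) for c in captions}
    # Lower perplexity means the model assigns the caption higher probability.
    return min(scores, key=scores.__getitem__)
```

If two captions receive exactly the same score, `min` returns the first one, so identical scores silently turn into a fixed-position guess.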
Thanks, Mustafa!!!
I'm currently working with OpenFlamingo, calculating perplexity scores for sentence-image pairs. I've run into an issue where two captions (one true and one false) receive the same perplexity score even though one of them is incorrect.
I've implemented a perplexity calculation method in Python using PyTorch. It extracts the logits from the model output, takes the true labels from the input token IDs, and computes perplexity from the probabilities the model assigns to those labels.
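The calculation described here can be checked without the model in the loop. A minimal sketch in plain Python, assuming `logits` holds one row of unnormalized scores per input position (as in `model_output.logits[0]`) and `input_ids` is the tokenized text; the function names are hypothetical. The key detail is the one-position shift: in a causal language model the row at position t scores the token at position t + 1, so the first token has no prediction and the last row is unused.

```python
import math
from typing import List, Sequence


def log_softmax(row: Sequence[float]) -> List[float]:
    """Log-softmax of one logit row, subtracting the max for stability."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def perplexity(logits: Sequence[Sequence[float]], input_ids: Sequence[int]) -> float:
    """Perplexity of input_ids under per-position logits from a causal LM.

    logits[t] is the predicted distribution for the token at position t + 1,
    so logits[:-1] are scored against input_ids[1:].
    """
    if len(logits) != len(input_ids) or len(input_ids) < 2:
        raise ValueError("need one logit row per token and at least two tokens")
    nll = 0.0
    for row, target in zip(logits[:-1], input_ids[1:]):
        nll -= log_softmax(row)[target]  # negative log-likelihood of the target
    return math.exp(nll / (len(input_ids) - 1))  # exp of the mean NLL
```

As a quick sanity check, uniform logits over a vocabulary of size V give a perplexity of V.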
I've checked that the token IDs are indexed correctly, and the perplexity calculation appears to be set up correctly. However, the perplexity scores come out as nan, and I suspect a problem with the softmax probabilities or numerical instability.
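One likely source of nan values is taking a softmax and then the log of the resulting probability: with a large logit gap the probability underflows to exactly 0, its log is -inf, and later arithmetic such as inf - inf yields nan. Half-precision logits make this more likely, since fp16 overflows to inf above roughly 65504. Working in log space with the log-sum-exp trick avoids the underflow; in PyTorch, `torch.nn.functional.log_softmax` does this, and `torch.nn.functional.cross_entropy` applies it internally. The sketch below, in plain Python with hypothetical function names, contrasts the two approaches.

```python
import math
from typing import Sequence


def naive_log_prob(row: Sequence[float], target: int) -> float:
    """Softmax first, then log: the probability can underflow to 0.0."""
    exps = [math.exp(x) for x in row]
    p = exps[target] / sum(exps)
    # math.log(0.0) raises in plain Python; torch.log would return -inf.
    return math.log(p) if p > 0.0 else float("-inf")


def stable_log_prob(row: Sequence[float], target: int) -> float:
    """Log-softmax via log-sum-exp: subtract the max before exponentiating."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[target] - log_z
```

For the row `[0.0, -800.0]`, the naive version returns -inf for the second token while the stable version returns the finite value -800.0.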
To avoid nan values, I added the following code block:
This time, I get the same score for both captions.
Example captions:
True caption: Breakfast items including juice are on the table.
False caption: Breakfast items including juice are off the table.
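These two captions differ in a single word, so when perplexity is averaged over every token of the prompt, including any shared prefix such as image tokens or instruction text, the one-token difference is diluted and the two scores can look nearly identical. A minimal sketch of scoring only the caption tokens, assuming you already have the shifted per-target log-probabilities (for instance from a log-softmax over the model logits) and know where the caption begins; `masked_perplexity` and the `caption_start` convention are hypothetical:

```python
import math
from typing import Sequence


def masked_perplexity(target_log_probs: Sequence[float], caption_start: int) -> float:
    """Perplexity over target positions >= caption_start only.

    target_log_probs[i] is log p(token i+1 | tokens 0..i), i.e. the already
    shifted per-target log-probabilities; caption_start indexes into that list.
    """
    scored = target_log_probs[caption_start:]
    if not scored:
        raise ValueError("no caption tokens to score")
    # exp of the mean negative log-likelihood over the caption tokens
    return math.exp(-sum(scored) / len(scored))
```

With identical prefixes and equal caption lengths the ranking is the same either way; restricting the score to the caption mainly makes the gap visible and stops the prefix from affecting the comparison when captions tokenize to different lengths.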