shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
MIT License

CHAIR hallucination evaluation #30

Closed running-alpaca closed 1 month ago

running-alpaca commented 2 months ago

Hello! I encountered some problems when using the code to reproduce the CHAIR metric in the paper. When I set max_new_tokens to 64, I obtained CHAIRs=19.4 and CHAIRi=6.4, which differs somewhat from the CHAIRs=14.2 and CHAIRi=5.2 reported in the paper. When max_new_tokens is set to 512, the results are close to those in the paper. In both experiments, the only thing I changed was max_new_tokens in the model.generate call in chair_eval.py (from 512 to 64). So I would like to ask how to reproduce the max_new_tokens=64 result from the paper. Do I need to change any other parameters or code?

shikiw commented 2 months ago

Hi,

Since the model usually generates more than 64 tokens, max_new_tokens=64 may truncate the last sentence. We therefore discard the last sentence if it is incomplete. We recommend applying this step when running chair_eval.py.
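The discard step could be sketched as a small post-processing function like the one below (the function name and punctuation heuristic are illustrative, not from the repo): if the generated caption does not end with sentence-ending punctuation, drop everything after the last complete sentence before computing CHAIR.

```python
def drop_incomplete_sentence(caption: str) -> str:
    """Discard a trailing fragment left by max_new_tokens truncation.

    If the caption already ends with sentence-ending punctuation, it is
    returned unchanged; otherwise text after the last complete sentence
    is removed. If no complete sentence exists, the caption is kept as-is.
    """
    caption = caption.strip()
    if caption.endswith((".", "!", "?")):
        return caption
    # Position of the last sentence terminator, or -1 if there is none.
    cut = max(caption.rfind("."), caption.rfind("!"), caption.rfind("?"))
    return caption[: cut + 1] if cut != -1 else caption
```

For example, a caption truncated mid-sentence such as `"A cat on a sofa. Next to it there is a"` would be reduced to `"A cat on a sofa."` before evaluation.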

running-alpaca commented 2 months ago

Thank you for your reply! I will try this later!