shmsw25 / FActScore

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
https://arxiv.org/abs/2305.14251
MIT License

Generation details for models under data/labeled #31

Open caiqizh opened 10 months ago

caiqizh commented 10 months ago

Thank you for the excellent work!

I have a question regarding the generation of outputs in the data/labeled files. Specifically, I'm curious about the parameters and prompts you used during this process. I've noticed that my generated text (e.g. from ChatGPT) is much longer than the content in your file. Could you please provide information on the settings you employed, such as temperature, max_tokens, and prompts, when generating the biographies? Your assistance in this matter would be greatly appreciated.

Thank you in advance!

shmsw25 commented 10 months ago

Hi @caiqizh, thank you for your interest in our work.

Here is the prompt we used for ChatGPT: [image]

Here are two hyperparameters:

Using different max_tokens should not affect the generations unless a generation exceeds max_tokens, which never happened in our case. Given this, I think it is possible that you are seeing much longer responses due to internal changes in ChatGPT (if it is not due to a difference in the prompt).
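For readers trying to reproduce this setup, here is a minimal sketch of how the sampling hyperparameters under discussion (temperature and max_tokens) would be passed to the OpenAI chat API. The prompt wording, model name, entity, and parameter values below are illustrative assumptions, not the authors' exact settings (those are in the image above):

```python
def build_request(entity: str, temperature: float = 0.7, max_tokens: int = 512) -> dict:
    """Assemble chat-completion parameters for a biography request.

    The prompt template and default values here are assumptions for
    illustration; substitute the settings from the authors' reply.
    """
    return {
        "model": "gpt-3.5-turbo",  # assumed model name
        "messages": [
            {"role": "user", "content": f"Tell me a bio of {entity}."}
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


# With a configured client (requires an API key), the call would look like:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(**build_request("Some Entity"))
#   print(resp.choices[0].message.content)
```

Note that max_tokens only truncates the response when the model would otherwise generate more tokens than the limit; it does not make the model aim for longer or shorter outputs, which is why the maintainer points to model-side changes (or prompt differences) as the likely cause of longer biographies.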

Let me know if you have any further questions. Thanks.