shmsw25 / FActScore

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
https://arxiv.org/abs/2305.14251
MIT License

Generation details for models under data/labeled #31

Open caiqizh opened 10 months ago

caiqizh commented 10 months ago

Thank you for the excellent work!

I have a question regarding the generation of outputs in the data/labeled files. Specifically, I'm curious about the parameters and prompts you used during this process. I've noticed that my generated text (e.g. from ChatGPT) is much longer than the content in your file. Could you please provide information on the settings you employed, such as temperature, max_tokens, and prompts, when generating the biographies? Your assistance in this matter would be greatly appreciated.

Thank you in advance!

shmsw25 commented 10 months ago

Hi @caiqizh, thank you for your interest in our work.

Here is the prompt we used for ChatGPT: [image]

Here are two hyperparameters:

Using different max_tokens should not affect the generations unless a generation exceeds max_tokens, which never happened in our case. Given this, I think it is possible that you are seeing much longer responses due to internal changes in ChatGPT (if it is not due to a difference in the prompt).
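For readers trying to reproduce this setup, here is a minimal sketch of how the sampling hyperparameters under discussion (temperature and max_tokens) would be passed to the OpenAI chat API. The prompt wording, model name, entity, and parameter values below are illustrative assumptions, not the authors' exact settings (those are in the image above):

```python
def build_request(entity: str, temperature: float = 0.7, max_tokens: int = 512) -> dict:
    """Assemble chat-completion parameters for a biography request.

    The prompt template and default values here are assumptions for
    illustration; substitute the settings from the authors' reply.
    """
    return {
        "model": "gpt-3.5-turbo",  # assumed model name
        "messages": [
            {"role": "user", "content": f"Tell me a bio of {entity}."}
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


# With a configured client (requires an API key), the call would look like:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(**build_request("Some Entity"))
#   print(resp.choices[0].message.content)
```

Note that max_tokens only truncates the response when the model would otherwise generate more tokens than the limit; it does not make the model aim for longer or shorter outputs, which is why the maintainer points to model-side changes (or prompt differences) as the likely cause of longer biographies.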

Let me know if you have any further questions. Thanks.