teacherpeterpan / ProgramFC

Codes for ACL 2023 Paper "Fact-Checking Complex Claims with Program-Guided Reasoning"
MIT License

Results on GPT-4 are lower than the results presented in the paper? #2

Open hustcxx opened 10 months ago

hustcxx commented 10 months ago

Great job. I have some questions for the authors.

  1. I ran the code with GPT-4 as the program generator (N=1, gold), using the same parameter settings, but the macro-F1 on FEVEROUS is lower than the text-davinci-003 result presented on GitHub. FEVEROUS with GPT-4: 91.05; FEVEROUS with text-davinci-003: 92.32 (presented on GitHub). This result is very confusing.
  2. Are the results reported in the paper and on GitHub computed on the full dataset or on a partially sampled subset?
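
For reference when comparing the two numbers, here is a minimal sketch of macro-F1 taken as the unweighted mean of per-class F1 scores. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the repository's evaluation script, which may compute the metric differently.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        raise ValueError("no examples to score")

    scores = []
    for label in labels:
        # Per-class counts of true positives, false positives, false negatives.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, only to show the call.
gold = ["supports", "supports", "refutes", "refutes"]
pred = ["supports", "refutes", "refutes", "refutes"]
print(round(100 * macro_f1(gold, pred), 2))  # prints 73.33
```

Because the score is an average over the examples actually evaluated, a run on a sampled subset and a run on the full dataset can differ by a point or so from sampling alone, which is why the second question matters for interpreting the gap.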