Hello, I noticed that in the instruction-following and coding domains, the get_gpt4_score function uses gpt4_turbo_generate for scoring, but in the math domain it uses gpt4_generate. What is the reason for this setup?
We found that GPT-4 scored clearly better than GPT-4 Turbo on the math task, while there was little difference between the two on the instruction-following (IF) and coding tasks. So we used GPT-4 Turbo for IF and coding to save cost, and kept GPT-4 for math.
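For readers who want to see the idea in code, here is a minimal sketch of per-domain judge selection. It is not the repository's actual implementation: the two generate functions are stubs that only borrow the names mentioned above, and JUDGE_BY_DOMAIN, pick_judge, and score are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


def gpt4_generate(prompt: str) -> str:
    """Stub standing in for a call to the GPT-4 judge (assumption)."""
    return f"[gpt4] {prompt}"


def gpt4_turbo_generate(prompt: str) -> str:
    """Stub standing in for a call to the cheaper GPT-4 Turbo judge (assumption)."""
    return f"[gpt4-turbo] {prompt}"


# Hypothetical mapping: math keeps GPT-4, the other two domains use
# GPT-4 Turbo to save cost, as described in the answer above.
JUDGE_BY_DOMAIN: Dict[str, Callable[[str], str]] = {
    "math": gpt4_generate,
    "instruction-following": gpt4_turbo_generate,
    "coding": gpt4_turbo_generate,
}


def pick_judge(domain: str) -> Callable[[str], str]:
    """Return the judge function configured for a domain."""
    try:
        return JUDGE_BY_DOMAIN[domain]
    except KeyError:
        raise ValueError(f"unknown domain: {domain!r}") from None


def score(domain: str, prompt: str) -> str:
    """Send the scoring prompt to the judge chosen for this domain."""
    return pick_judge(domain)(prompt)
```

Keeping the choice in one table makes it easy to switch a domain to a different judge if the quality or cost trade-off changes.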