ijyliu closed 11 months ago
This is very hard for me to evaluate, as it requires being able to judge outputs in a foreign language.
What I am proposing is similar to "pivot prompting" (https://arxiv.org/pdf/2301.08745.pdf), except there is a task performed in the middle.
Possibly return to this, but only if we have another group member who is good at evaluating foreign-language tasks.
Or if we find some way to automate the evaluation, though that seems dicey.
Is the model already translating under the hood?
Does splitting a foreign-language task into three steps (translate, perform the task, translate back) improve performance?
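To make the comparison concrete, here is a rough sketch of the two conditions: running the task directly on the foreign-language prompt versus translate, task, translate back. Everything in it is an assumption made for illustration: the function names, the `(text, source_lang, target_lang)` signature for the translator, and English (`"en"`) as the pivot language. The translator and task are passed in as plain callables, so any model call could be plugged in.

```python
from typing import Callable

# Hypothetical signatures, not tied to any particular model or library:
#   Translate: (text, source_lang, target_lang) -> translated text
#   Task:      (prompt) -> model output
Translate = Callable[[str, str, str], str]
Task = Callable[[str], str]


def direct(prompt: str, task: Task) -> str:
    """Baseline: run the task on the foreign-language prompt as-is."""
    return task(prompt)


def translate_task_translate(
    prompt: str,
    lang: str,
    task: Task,
    translate: Translate,
    pivot: str = "en",  # assumed pivot language
) -> str:
    """Translate into the pivot language, run the task, translate back."""
    pivot_prompt = translate(prompt, lang, pivot)
    pivot_output = task(pivot_prompt)
    return translate(pivot_output, pivot, lang)


# Toy stand-ins so the sketch runs without a model; they only tag the text.
def toy_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


def toy_task(text: str) -> str:
    return text.upper()


if __name__ == "__main__":
    print(direct("hola", toy_task))
    print(translate_task_translate("hola", "es", toy_task, toy_translate))
```

The outputs of `direct` and `translate_task_translate` on the same prompts are what would need to be judged side by side, which is exactly the part that requires someone who can evaluate the target language.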