Open · lukasberglund opened this issue 9 months ago
Hi! I enjoyed reading your paper, and I appreciate that you provided all your code. I suspect that GPT-4 would do a lot better on some of the questions (for example, the accessibility questions) if you gave it a few-shot prompt (e.g., a five-shot prompt). Did you try this out at all? If so, how well did the models do?
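For concreteness, here is a minimal sketch of the kind of five-shot prompt I have in mind, using the OpenAI Python SDK (>=1.0) chat completions API. The example stories, answers, and the `build_messages` helper are hypothetical placeholders for illustration, not items or code from your benchmark:

```python
# Sketch of a five-shot prompt for an accessibility-style question.
# All questions/answers below are made-up placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Five hypothetical solved examples shown to the model before the test item.
FEW_SHOT_EXAMPLES = [
    ("Anna puts her keys in the drawer and leaves. Bob moves them to the shelf. "
     "Does Anna know where her keys are?", "No"),
    ("Carla watches Dan hide the chocolate in the cupboard. "
     "Does Carla know where the chocolate is?", "Yes"),
    ("Emma's phone rings in another room while she is wearing headphones. "
     "Does Emma know her phone rang?", "No"),
    ("Felix reads the meeting time in an email sent to the whole team. "
     "Does Felix know when the meeting is?", "Yes"),
    ("Grace's flight gate changes after she falls asleep in the lounge. "
     "Does Grace know the new gate?", "No"),
]

def build_messages(test_question: str) -> list[dict]:
    """Assemble a chat prompt with five worked examples before the test item."""
    messages = [{"role": "system",
                 "content": "Answer each question with Yes or No."}]
    for question, answer in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": test_question})
    return messages

response = client.chat.completions.create(
    model="gpt-4",
    messages=build_messages(
        "Henry's package is delivered while he is at work. "
        "Does Henry know the package has arrived?"),
    temperature=0,
)
print(response.choices[0].message.content)
```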
Hello, thanks for your interest in our paper, and a great question! Indeed, GPT-4 will do a lot better if you give it few-shot examples. However, giving few-shot examples for theory-of-mind questions pushes the model to rely directly on lower-level processes (e.g., shortcut pattern matching). This violates the "mentalizing" criterion for ToM validation that we discuss in the paper. Hope this helps!