Closed: linyongnan closed this issue 1 year ago
Hi, I appreciate your excellent work! I want to ask about the hotpotQA setting. It appears that your framework relies on the oracle evaluator (gold label) to determine whether self-reflection should occur. I am curious about how this framework operates during inference time when the gold label is unavailable. Thank you once again.
I guess we have to assume that, at inference time, the virtual environment holds the oracle label and tells the agent whether its answer is correct. If so, the framework cannot be applied to realistic scenarios where no one knows the answer. I recall this limitation being discussed somewhere but can't find it now. Is this understanding correct? @noahshinn024 Thanks.
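To make the concern concrete, here is a minimal sketch of the loop as I understand it from the discussion above. This is an assumption for illustration, not the repository's actual code: `act`, `reflect`, and `oracle_gated_loop` are hypothetical names, and exact-match comparison stands in for whatever evaluator the environment really uses.

```python
from typing import Callable, List, Tuple


def oracle_gated_loop(
    question: str,
    gold: str,
    act: Callable[[str, List[str]], str],   # hypothetical: produces an answer
    reflect: Callable[[str, str], str],     # hypothetical: produces a reflection
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Retry with self-reflection only while the oracle says the answer is wrong."""
    reflections: List[str] = []
    answer = ""
    for _ in range(max_trials):
        answer = act(question, reflections)
        # The gold label is consulted here, which is the point under discussion:
        # without it, the loop has no signal for when to stop or reflect.
        if answer.strip().lower() == gold.strip().lower():
            break
        reflections.append(reflect(question, answer))
    return answer, reflections
```

The only place `gold` appears is the stopping check, so removing the oracle means replacing that one condition with some other signal.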
It appears that the authors present results without ground truth (GT) in Figure 4 (a). It would be helpful to know the specific settings used in this experiment, e.g., do all agents go into the next reflection round? If all agents do enter the next reflection round, is it possible that the success rate does not always increase with the number of reflections? Thanks.
For evaluation purposes only, maybe we could force agents to run for a fixed number of rounds regardless of whether their answers are correct, and take the final round's answer as the inference result. Just a guess. I hope the authors can share more details here.
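A rough sketch of that fixed-round idea, under my own assumptions: `act`, `reflect`, and `fixed_round_loop` are hypothetical names, and nothing here is confirmed to match the authors' setup for Figure 4 (a).

```python
from typing import Callable, List, Tuple


def fixed_round_loop(
    question: str,
    act: Callable[[str, List[str]], str],   # hypothetical: produces an answer
    reflect: Callable[[str, str], str],     # hypothetical: produces a reflection
    num_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Run a fixed number of rounds with no gold label; return the last answer."""
    reflections: List[str] = []
    answers: List[str] = []
    for round_idx in range(num_rounds):
        answer = act(question, reflections)
        answers.append(answer)
        # Reflect after every round except the last, regardless of correctness.
        if round_idx < num_rounds - 1:
            reflections.append(reflect(question, answer))
    return answers[-1], answers
```

Keeping the per-round answers makes it possible to score every round afterwards and check whether accuracy actually rises with more reflections, since an agent that was right in an early round may reflect its way to a wrong answer.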
Thanks for the comments and questions!
Happy to write further on these points if needed
Is it acceptable to use the GT answer at inference time? That seems strange.