li-aolong opened this issue 2 days ago
The primary distinctions between our work and Self-Verification lie in the following aspects:
Self-Verification employs Best-of-N decoding: the LLM generates multiple candidate solutions, each candidate is scored by a scoring function, and the highest-scoring one is selected as the final answer. However, Best-of-N fails whenever the correct answer is not among the sampled candidates. In contrast, ProCo uses an iterative verify-then-correct framework that progressively identifies and corrects potentially erroneous responses until it arrives at the correct one. This avoids repeating previous mistakes and incrementally improves response quality.
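For concreteness, a minimal sketch of the two control flows is given below. The names `generate`, `score`, `verify`, and `correct` are hypothetical stand-ins for prompted LLM calls; they are not taken from either paper.

```python
from typing import Callable, List

def best_of_n(question: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 5) -> str:
    """Best-of-N (Self-Verification style): sample N solutions and keep the
    highest-scoring one. If no sample contains the correct answer, it cannot
    be recovered."""
    candidates: List[str] = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda sol: score(question, sol))

def verify_then_correct(question: str,
                        generate: Callable[[str], str],
                        verify: Callable[[str, str], bool],
                        correct: Callable[[str, str], str],
                        max_iters: int = 5) -> str:
    """Iterative verify-then-correct (ProCo style, as I read it): generate one
    solution, verify it, and if the check fails, ask for a corrected solution
    that avoids the previously identified mistake."""
    solution = generate(question)
    for _ in range(max_iters):
        if verify(question, solution):
            break
        solution = correct(question, solution)
    return solution
```

The structural point is that Best-of-N fixes its candidate pool up front, whereas the iterative loop conditions each new attempt on the failure of the previous one.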
Self-Verification randomly selects which conditions to mask during verification. By contrast, ProCo introduces a key condition identification method, which improves the accuracy of the substitute verification process.
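To illustrate what key condition identification changes relative to random masking, here is a toy sketch. The regex extraction and the word-overlap heuristic are my own placeholders and do not reproduce ProCo's actual identification procedure.

```python
import random
import re

def extract_conditions(question: str) -> list[str]:
    """Pull out numeric phrases such as '16 apples' as candidate conditions to mask.
    A real implementation would use the LLM itself; this regex is only illustrative."""
    return re.findall(r"\d+(?:\.\d+)?\s+\w+", question)

def mask_condition(question: str, condition: str) -> str:
    """Replace the value of the chosen condition with the placeholder X."""
    value = condition.split()[0]
    return question.replace(value, "X", 1)

def choose_condition(question: str, key: bool = True) -> str:
    conditions = extract_conditions(question)
    if not key:
        return random.choice(conditions)  # random masking (Self-Verification)
    # Hypothetical heuristic standing in for ProCo's key condition identification:
    # prefer the condition whose words overlap most with the final question clause.
    tail = question.split("?")[0].split(".")[-1].lower()
    return max(conditions, key=lambda c: sum(w in tail for w in c.lower().split()))

question = "Tom had 16 apples and bought 5 more. How many apples does Tom have now?"
print(mask_condition(question, choose_condition(question)))
# e.g. "Tom had X apples and bought 5 more. How many apples does Tom have now?"
```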
Notably, the differences between Self-Verification and ProCo are substantial, as reflected in their methodologies and effectiveness. Additionally, I have not read this paper.
The core method of ProCo, substitute verification, is identical to the method in "Large Language Models are Better Reasoners with Self-Verification" by Weng et al. (2023).
The main difference lies in how the initial answer is produced: Weng et al. (2023) generate multiple candidate answers and mask the original question for verification, eventually selecting one final answer, whereas ProCo generates one answer at a time and iterates multiple times to arrive at the final answer. Additionally, both ProCo and Weng et al. (2023) use two types of question masking. Weng et al. (2023) call them True-False Item Verification and Condition Mask Verification, and ProCo's classification is essentially the same; a toy rendering of the two styles is sketched below.
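As a toy rendering of the two masking styles (the prompt wording is my own paraphrase, not copied from either paper):

```python
def true_false_item_prompt(question: str, candidate_answer: str) -> str:
    """True-False Item Verification: treat the candidate answer as a given
    condition and ask the model whether the resulting statement holds."""
    return (
        f"{question}\n"
        f"Proposed answer: {candidate_answer}\n"
        "Treat the proposed answer as a given condition. "
        "Is the resulting statement true or false? Answer 'true' or 'false'."
    )

def condition_mask_prompt(masked_question: str, candidate_answer: str) -> str:
    """Condition Mask Verification (ProCo's substitute verification): one original
    condition is replaced by X, the candidate answer is supplied as a known fact,
    and the model is asked to recover X; a mismatch with the masked value flags an error."""
    return (
        f"{masked_question}\n"
        f"It is known that the answer to the original question is {candidate_answer}.\n"
        "What is the value of X?"
    )

masked = "Tom had X apples and bought 5 more. How many apples does Tom have now?"
print(condition_mask_prompt(masked, "21"))
```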
Weng et al. (2023):
ProCo:
Questions: