Open hahuyhoang411 opened 1 day ago
Paper summary:
Structured responses significantly improve the model’s systematic reasoning ability. To achieve this, they design <SUMMARY>, <CAPTION>, <REASONING>, <CONCLUSION>
tags to help the model recognize the current stage of reasoning, and create the LLaVA-o1-100k dataset by using GPT-4o to generate stage-level reasoning.
Inference time scaling: Unlike previous methods like best-of-N search and sentence-level beam search, we propose a novel stage-level beam search method. Specifically, we generate multiple responses for each stage (marked by tags) and select the best one to proceed to the next stage.
Problem Statement
Current LLM development is moving toward structured output. It's proved to improve model performance in various tasks. Also when training with structured output, we can explore further into training long context which we haven't trained.
Idea
Reference: https://arxiv.org/pdf/2411.10440