janhq / ichigo

Local realtime voice AI
Apache License 2.0

idea: Training Ichigo on Structured Output #126

Open hahuyhoang411 opened 1 day ago

hahuyhoang411 commented 1 day ago

Problem Statement

LLM development is moving toward structured output, which has been shown to improve model performance on a variety of tasks. Training with structured output would also let us explore long-context training, which we haven't done yet.

Idea

Reference: https://arxiv.org/pdf/2411.10440

hahuyhoang411 commented 1 day ago

Paper summary:

Structured responses significantly improve the model's systematic reasoning ability. To achieve this, the authors design four tags, <SUMMARY>, <CAPTION>, <REASONING>, and <CONCLUSION>, which help the model recognize its current stage of reasoning. They then build the LLaVA-o1-100k dataset by using GPT-4o to generate stage-level reasoning.
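If we try this for Ichigo, training targets would need to follow the tag layout strictly. A minimal sketch of formatting and validating such a target, assuming we reuse the paper's four tags unchanged and require them in that fixed order (`format_response` and `parse_response` are hypothetical helpers, not part of Ichigo or the paper's code):

```python
import re

# Hypothetical: the four stage tags from the LLaVA-o1 summary, in the order
# a well-formed response is expected to emit them.
STAGES = ["SUMMARY", "CAPTION", "REASONING", "CONCLUSION"]

def format_response(stages: dict[str, str]) -> str:
    """Wrap each stage's text in its <TAG>...</TAG> pair, in canonical order."""
    return "\n".join(f"<{tag}>{stages[tag]}</{tag}>" for tag in STAGES)

def parse_response(text: str) -> dict[str, str]:
    """Extract stage contents; raise ValueError if a stage is missing or out of order."""
    # Non-greedy match of <TAG>content</TAG>; the backreference \1 forces the
    # closing tag to match the opening one. Nested tags are not supported.
    found = [(m.group(1), m.group(2).strip())
             for m in re.finditer(r"<(\w+)>(.*?)</\1>", text, flags=re.DOTALL)]
    tags = [tag for tag, _ in found]
    if tags != STAGES:
        raise ValueError(f"expected stages {STAGES}, got {tags}")
    return dict(found)
```

A validator like this could filter generated training samples so that only responses with all four stages, in order, reach the fine-tuning set.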

Inference-time scaling: unlike earlier methods such as best-of-N search and sentence-level beam search, the authors propose a novel stage-level beam search. For each stage (marked by its tags), the model generates multiple candidate responses, and the best one is selected before proceeding to the next stage.
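The stage-level search loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `generate` and `score` are hypothetical callables standing in for the model's sampler and for whatever criterion picks "the best" candidate (the paper's own selection method may differ), and only one candidate is kept per stage, as the summary describes.

```python
import itertools
from typing import Callable

STAGES = ["SUMMARY", "CAPTION", "REASONING", "CONCLUSION"]

# Hypothetical: generate(prefix, stage) samples one completion for `stage`,
# already wrapped in its <STAGE>...</STAGE> tags, conditioned on `prefix`.
Generator = Callable[[str, str], str]
# Hypothetical: score(prefix, candidate) returns a higher-is-better quality estimate.
Scorer = Callable[[str, str], float]

def stage_level_search(prompt: str, generate: Generator, score: Scorer, n: int = 4) -> str:
    """Sample n candidates per stage, keep the best, and continue from it."""
    prefix = prompt
    for stage in STAGES:
        candidates = [generate(prefix, stage) for _ in range(n)]
        best = max(candidates, key=lambda c: score(prefix, c))
        prefix += best  # the chosen stage becomes context for the next stage
    return prefix[len(prompt):]

# Toy usage: the "model" emits increasing integers and the scorer prefers larger ones.
counter = itertools.count()
toy_generate = lambda prefix, stage: f"<{stage}>{next(counter)}</{stage}>"
toy_score = lambda prefix, c: int(c.split(">")[1].split("<")[0])
print(stage_level_search("Q: ", toy_generate, toy_score))
# prints <SUMMARY>3</SUMMARY><CAPTION>7</CAPTION><REASONING>11</REASONING><CONCLUSION>15</CONCLUSION>
```

Compared with best-of-N, which scores only complete responses, this commits to a stage before sampling the next one, so a weak early stage is discarded before later stages build on it.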