ArcticBeat05 closed 1 month ago
https://arxiv.org/pdf/2312.02179 - Training Chain-of-Thought via Latent-Variable Inference by Google
https://arxiv.org/pdf/2402.05808 - Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
https://arxiv.org/pdf/2401.08967 - REFT: Reasoning with REinforced Fine-Tuning by ByteDance Research
merged