usail-hkust / LLMTSCS

Official code for article "LLMLight: Large Language Models as Traffic Signal Control Agents".

Is the fine-tuning in this work instruction fine-tuning? #21

Open ShuhongDai opened 1 month ago

ShuhongDai commented 1 month ago

Great work! May I ask whether the fine-tuning in this work is instruction fine-tuning?

Gungnir2099 commented 3 weeks ago

Thanks for your interest. Instruction fine-tuning is the broader concept. Our proposed imitation fine-tuning follows the instruction fine-tuning paradigm, but its specific purpose is to make the LLM learn the best decisions made by GPT-4. Critic-guided policy refinement, in turn, is designed to encourage the LLM to make better decisions. Please see our paper for the detailed design.
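
For readers unfamiliar with the terminology, here is a minimal sketch of what "imitation fine-tuning in the instruction fine-tuning paradigm" can look like in general. It is not taken from the LLMTSCS repository. The record schema (`instruction`/`input`/`output`), the instruction wording, and the helper names `build_sample` and `response_nll` are all assumptions made for illustration. The idea is that a teacher's decision (here, GPT-4's) becomes the target response, and the loss is the negative log-likelihood over the response tokens only.

```python
import math
from typing import Dict, List


def build_sample(observation: str, teacher_response: str) -> Dict[str, str]:
    """Pack one teacher decision into an instruction-tuning record.

    The schema and instruction text are hypothetical, not the repository's.
    """
    return {
        "instruction": "You are a traffic signal control agent. "
                       "Choose the next signal phase.",
        "input": observation,        # textual description of the intersection
        "output": teacher_response,  # the teacher's reasoning and decision
    }


def response_nll(token_logprobs: List[float], response_mask: List[int]) -> float:
    """Mean negative log-likelihood over response tokens only.

    token_logprobs[i] is the student's log-probability of the i-th target
    token; response_mask[i] is 1 for response tokens and 0 for prompt tokens.
    """
    if len(token_logprobs) != len(response_mask):
        raise ValueError("token_logprobs and response_mask must align")
    n_response = sum(response_mask)
    if n_response == 0:
        raise ValueError("at least one response token is required")
    total = sum(lp for lp, m in zip(token_logprobs, response_mask) if m)
    return -total / n_response


# One prompt token (masked out) followed by two response tokens.
loss = response_nll([math.log(0.5), math.log(0.25), math.log(0.5)], [0, 1, 1])
```

Masking out the prompt tokens means the student is only trained to reproduce the teacher's answer, which is what makes this imitation. Improving on the teacher requires a separate signal, which is the role the authors assign to critic-guided policy refinement.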

xingxindrst commented 2 weeks ago

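"Thanks for your interest. Instruction fine-tuning is a broader concept. On the one hand, although our proposed imitation fine-tuning follows the paradigm of instruction fine-tuning, its purpose is to make the LLM learn the best decisions made by GPT-4. On the other hand, critic-guided policy refinement is designed to encourage the LLM to make better decisions. You can take a look at our paper for detailed designs."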

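Does this mean the LLM learns the form of GPT-4's decisions, that is, producing its output in the YOUR_CHOICE format, while actually optimizing the decisions requires the critic-guided policy refinement?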