
awesome-llm-planning-reasoning

About

Welcome to the Awesome LLMs Planning Reasoning repository! This collection is dedicated to exploring the rapidly evolving field of Large Language Models (LLMs) and their capabilities in planning and reasoning.

Overview

As LLMs continue to perform well on Natural Language Understanding (NLU) and Natural Language Generation (NLG), researchers are increasingly assessing their abilities beyond traditional NLP tasks. One of the most promising and challenging areas of study is how well LLMs perform tasks that require planning and reasoning. These capabilities are essential for applying LLMs to more complex, real-world scenarios such as autonomous decision-making, problem-solving, and strategic thinking. However, recent research suggests that LLMs often struggle with reasoning tasks that most humans find relatively simple, which points to clear limitations in this area.

This repository is a curated list of research papers, code repositories, and benchmarks that focus on the intersection of LLMs with planning and reasoning tasks. Here, you'll find:

Techniques: Innovative methods that enable LLMs to reason and plan effectively, such as Chain-of-Thought prompting and Tree of Thoughts.
Reasoning Limitations: Critical investigations that explore the limitations and challenges LLMs face in planning and reasoning tasks.
Benchmarks: Standardized tests and evaluations designed to measure the performance of LLMs in these complex tasks.
Miscellaneous Papers: Papers related to the field of LLMs and reasoning, but not directly focused on planning tasks.
Additional Resources: Supplementary materials such as slides, dissertations, and other resources that provide further insights into LLM planning and reasoning.
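To make the Techniques entry more concrete, here is a minimal sketch of few-shot Chain-of-Thought prompting combined with self-consistency-style majority voting, two of the methods listed in this repository. It is an illustration under stated assumptions, not code from any of the listed papers: the exemplar, the `Answer:` marker convention, and the names `build_cot_prompt`, `extract_answer`, `self_consistent_answer`, and `make_fake_sampler` are all hypothetical, and the sampler is a stand-in for a real model call.

```python
import itertools
from collections import Counter
from typing import Callable, List, Tuple

# One worked exemplar: (question, reasoning steps, final answer). Illustrative only.
EXEMPLARS: List[Tuple[str, str, str]] = [
    (
        "Tom has 3 boxes with 4 pens each. How many pens does he have?",
        "Each box holds 4 pens and there are 3 boxes, so 3 * 4 = 12.",
        "12",
    ),
]


def build_cot_prompt(question: str) -> str:
    """Prefix the question with exemplars whose answers spell out their reasoning."""
    parts = [f"Q: {q}\nA: {steps} Answer: {ans}" for q, steps, ans in EXEMPLARS]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def extract_answer(completion: str) -> str:
    """Return the text after the last 'Answer:' marker (a convention assumed here)."""
    _, found, tail = completion.rpartition("Answer:")
    return tail.strip() if found else completion.strip()


def self_consistent_answer(
    question: str, sample: Callable[[str], str], n: int = 5
) -> str:
    """Sample n reasoning paths and return the most frequent final answer."""
    prompt = build_cot_prompt(question)
    answers = [extract_answer(sample(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def make_fake_sampler(completions: List[str]) -> Callable[[str], str]:
    """Stand-in for a model call: cycles through canned completions."""
    canned = itertools.cycle(completions)
    return lambda prompt: next(canned)


if __name__ == "__main__":
    fake = make_fake_sampler(
        [
            "2 + 3 = 5. Answer: 5",
            "3 + 2 = 5. Answer: 5",
            "2 * 3 = 6. Answer: 6",  # a faulty reasoning path that gets outvoted
        ]
    )
    print(self_consistent_answer("What is 2 + 3?", fake, n=3))
```

In practice, `sample` would wrap a model API called with a nonzero sampling temperature so that the reasoning paths differ. The vote is taken over final answers only, so differently worded reasoning paths that reach the same answer count together.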

Whether you're a researcher, developer, or enthusiast, this repository serves as a comprehensive resource for staying updated on the latest advancements and understanding the current challenges in the domain of LLMs' planning and reasoning abilities. Dive in and explore the fascinating world where language models meet high-level cognitive tasks!

Techniques

Paper	Link	Code	Venue	Date	Other
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models	arXiv	--	NeurIPS 22	28 Jan 2022	Video
Self-Consistency Improves Chain of Thought Reasoning in Language Models	arXiv	--	ICLR 23	7 Mar 2023	Video
ReAct: Synergizing Reasoning and Acting in Language Models	arXiv	GitHub	ICLR 23	10 Mar 2023	Project Video
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models	arXiv	GitHub	ICCV 23	30 Mar 2023	Project
Least-To-Most Prompting Enables Complex Reasoning In Large Language Models	arXiv	--	ICLR 23	16 Apr 2023
Chain-of-Symbol Prompting Elicits Planning in Large Language Models	arXiv	GitHub	ICLR 24	17 May 2023
PlaSma: Procedural Knowledge Models for Language based Planning and Re-Planning	arXiv	GitHub	ICLR 24	26 Jul 2023
Better Zero-Shot Reasoning with Role-Play Prompting	arXiv	GitHub	NAACL 24	15 Aug 2023
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency	arXiv	GitHub	arXiv	27 Sep 2023
Reasoning with Language Model is Planning with World Model	arXiv	GitHub	EMNLP 23	23 Oct 2023
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning	arXiv	GitHub	NeurIPS 23	30 Oct 2023	Project
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization	arXiv	GitHub	ICLR 24	7 Dec 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models	arXiv	GitHub	NeurIPS 23	3 Dec 2023	Video
Learning adaptive planning representations with natural language guidance	arXiv	--	arXiv	13 Dec 2023
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction	arXiv	GitHub	ICLR 24	21 Dec 2023
Large Language Models can Learn Rules	arXiv	--	arXiv	24 Apr 2024
What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models	arXiv	--	arXiv	22 May 2024
Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models	arXiv	GitHub	arXiv	6 Jun 2024
Large Language Models Can Learn Temporal Reasoning	arXiv	GitHub	ACL 24	11 Jun 2024
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking	arXiv	GitHub	arXiv	24 Jun 2024
Tree Search for Language Model Agents	arXiv	GitHub	arXiv	1 Jul 2024	Project
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models	arXiv	GitHub	ICLR 24	24 Jul 2024	Project
RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning	arXiv	--	arXiv	6 Aug 2024
Automating Thought of Search: A Journey Towards Soundness and Completeness	arXiv	--	arXiv	21 Aug 2024

Reasoning Limitations

Paper	Link	Code	Venue	Date	Other
Understanding the Capabilities of Large Language Models for Automated Planning	arXiv	--	arXiv	25 May 2023
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond	arXiv	GitHub	arXiv	8 Aug 2023
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval	arXiv	GitHub	NeurIPS 23	2 Nov 2023
On the Planning Abilities of Large Language Models: A Critical Investigation	arXiv	GitHub	NeurIPS 23	6 Nov 2023
Large Language Models Cannot Self-Correct Reasoning Yet	arXiv	--	ICLR 24	14 Mar 2024
Dissociating language and thought in large language models	arXiv	--	Trends in Cognitive Sciences	23 Mar 2024
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks	arXiv	GitHub	NAACL 24	28 Mar 2024
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?	arXiv	--	arXiv	13 May 2024	Video
On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models	arXiv	--	arXiv	22 May 2024
Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models	arXiv	GitHub	EACL 24	24 May 2023
Can Graph Learning Improve Task Planning?	arXiv	GitHub	arXiv	29 May 2024
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning	arXiv	GitHub	ICML 24	3 Jun 2024
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator	arXiv	GitHub	ACL 24	6 Jun 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning	arXiv	--	arXiv	6 Jun 2024
Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning	arXiv	GitHub	ACL 24	7 Jun 2024
Can Language Models Serve as Text-Based World Simulators?	arXiv	GitHub	ACL 24	10 Jun 2024
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks	arXiv	--	ICML 24	12 Jun 2024	Video
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models	arXiv	GitHub	arXiv	13 Jul 2024
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks	arXiv	--	arXiv	3 Aug 2024
Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models	arXiv	--	arXiv	15 Aug 2024

Benchmarks

Paper	Link	Code	Venue	Date	Other
Benchmarks for Automated Commonsense Reasoning: A Survey	arXiv	--	arXiv	22 Feb 2023
BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology	arXiv	GitHub	EMN
