shure-dev / Awesome-LLM-Papers-Comprehensive-Topics

Awesome LLM Papers and repos on very comprehensive topics.
https://shorturl.at/bmuwC
agent agi awesome-list chainofthought chatgpt instruction-tuning llm llm-agent multimodal papers prompt-engineering rag reasoning reinforcement-learning robot robotics survey vlm vqa zero-shot

Awesome-LLM-Related-Papers-Comprehensive-Topics


We provide papers and repos on a comprehensive range of topics, as follows.

CoT / VLM / Quantization / Grounding / Text2IMG&VID / Prompt Engineering / Prompt Tuning / Reasoning / Robot / Agent / Planning / Reinforcement-Learning / Feedback / In-Context-Learning / Few-Shot / Zero-Shot / Instruction Tuning / PEFT / RLHF / RAG / Embodied / VQA / Hallucination / Diffusion / Scaling / Context-Window / WorldModel / Memory / RoPE / Speech / Perception / Survey / Segmentation / Large Action Model / Foundation / LoRA / PPO / DPO

We strongly recommend checking our Notion table for an interactive experience.



Number of papers and repos in total: 516


Category Title Links Date
Zero-shot Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
World-model Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning ArXiv 2023/05/07
World-model Learning and Leveraging World Models in Visual Representation Learning
World-model Language Models Meet World Models ArXiv
World-model Learning to Model the World with Language ArXiv
World-model Diffusion World Model ArXiv
VisualPrompt Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models ArXiv
VisualPrompt Making Large Multimodal Models Understand Arbitrary Visual Prompts ArXiv
VisualPrompt What does CLIP know about a red circle? Visual prompt engineering for VLMs ArXiv
VisualPrompt MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting ArXiv
VisualPrompt SoM: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V ArXiv
VisualPrompt Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V ArXiv, GitHub
Video MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ArXiv
ViFM, Video InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding ArXiv, GitHub
VLM, World-model Large World Model ArXiv
VLM, VQA CogVLM: Visual Expert for Pretrained Language Models ArXiv 2023/11/06
VLM, VQA Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models ArXiv 2023/04/19
VLM, VQA DeepSeek-VL: Towards Real-World Vision-Language Understanding
VLM PaLM: Scaling Language Modeling with Pathways ArXiv 2022/04/05
VLM ScreenAI: A Vision-Language Model for UI and Infographics Understanding
VLM MoE-LLaVA: Mixture of Experts for Large Vision-Language Models ArXiv, GitHub
VLM LLaVA-NeXT: Improved reasoning, OCR, and world knowledge GitHub
VLM Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models ArXiv
Text-to-Image, World-model World Model on Million-Length Video And Language With RingAttention ArXiv
Text-to-Image Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation ArXiv
Temporal Explorative Inbetweening of Time and Space ArXiv
Survey, Video Video Understanding with Large Language Models: A Survey ArXiv
Survey, VLM MM-LLMs: Recent Advances in MultiModal Large Language Models
Survey, Training Understanding LLMs: A Comprehensive Overview from Training to Inference ArXiv
Survey, TimeSeries Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Survey Efficient Large Language Models: A Survey ArXiv, GitHub
Sora, Text-to-Video Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Sora, Text-to-Video Mora: Enabling Generalist Video Generation via A Multi-Agent Framework ArXiv
Segmentation LISA: Reasoning Segmentation via Large Language Model ArXiv
Segmentation GRES: Generalized Referring Expression Segmentation
Segmentation Generalized Decoding for Pixel, Image, and Language ArXiv
Segmentation SEEM: Segment Everything Everywhere All at Once ArXiv, GitHub
Segmentation SegGPT: Segmenting Everything In Context ArXiv
Segmentation Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks ArXiv
Scaling Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention ArXiv
SLM, Scaling Textbooks Are All You Need ArXiv
Robot, Zero-shot BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning ArXiv
Robot, Zero-shot Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots ArXiv
Robot, Zero-shot Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting ArXiv
Robot, Task-Decompose, Zero-shot Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents ArXiv 2022/01/18
Robot, Task-Decompose SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning ArXiv 2023/07/12
Robot, TAMP LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning ArXiv
Robot, TAMP Task and Motion Planning with Large Language Models for Object Rearrangement ArXiv
Robot, Survey Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis ArXiv 2023/12/14
Robot, Survey Language-conditioned Learning for Robotic Manipulation: A Survey ArXiv 2023/12/17
Robot, Survey Robot Learning in the Era of Foundation Models: A Survey ArXiv 2023/11/24
Robot, Survey Real-World Robot Applications of Foundation Models: A Review ArXiv
Robot OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics ArXiv
Robot RoCo: Dialectic Multi-Robot Collaboration with Large Language Models ArXiv
Robot Interactive Language: Talking to Robots in Real Time ArXiv
Robot Reflexion: Language Agents with Verbal Reinforcement Learning ArXiv 2023/03/20
Robot Generative Expressive Robot Behaviors using Large Language Models ArXiv
Robot RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Robot Introspective Tips: Large Language Model for In-Context Decision Making ArXiv
Robot PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs ArXiv
Robot OCI-Robotics: Object-Centric Instruction Augmentation for Robotic Manipulation ArXiv
Robot DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies ArXiv
Robot VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models ArXiv
Robot Creative Robot Tool Use with Large Language Models ArXiv
Robot AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers ArXiv
RoPE RoFormer: Enhanced Transformer with Rotary Position Embedding ArXiv
Resource [Resource] Paperswithcode ArXiv
Resource [Resource] huggingface ArXiv
Resource [Resource] dailyarxiv ArXiv
Resource [Resource] Connectedpapers ArXiv
Resource [Resource] Semanticscholar ArXiv
Resource [Resource] AlphaSignal ArXiv
Resource [Resource] arxiv-sanity ArXiv
Reinforcement-Learning, VIMA FoMo Rewards: Can we cast foundation models as reward functions? ArXiv
Reinforcement-Learning, Robot Towards A Unified Agent with Foundation Models ArXiv
Reinforcement-Learning Large Language Models Are Semi-Parametric Reinforcement Learning Agents ArXiv
Reinforcement-Learning RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents ArXiv
Reasoning, Zero-shot Large Language Models are Zero-Shot Reasoners ArXiv
Reasoning, VLM, VQA MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action ArXiv 2023/03/20
Reasoning, Table Large Language Models are few(1)-shot Table Reasoners ArXiv
Reasoning, Symbolic Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning ArXiv
Reasoning, Survey Reasoning with Language Model Prompting: A Survey ArXiv
Reasoning, Robot AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation ArXiv
Reasoning, Reward Let's Reward Step by Step: Step-Level Reward Model as the Navigators for Reasoning ArXiv
Reasoning, Reinforcement-Learning ReFT: Reasoning with Reinforced Fine-Tuning
Reasoning Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning ArXiv
Reasoning ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs ArXiv
Reasoning Self-Discover: Large Language Models Self-Compose Reasoning Structures ArXiv
Reasoning Chain-of-Thought Reasoning Without Prompting ArXiv
Reasoning Contrastive Chain-of-Thought Prompting ArXiv
Reasoning Rephrase and Respond (RaR)
Reasoning Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models ArXiv
Reasoning STaR: Bootstrapping Reasoning With Reasoning ArXiv 2022/05/28
Reasoning The Impact of Reasoning Step Length on Large Language Models ArXiv
Reasoning Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication ArXiv
Reasoning Large Language Models as General Pattern Machines ArXiv
RLHF, Reinforcement-Learning, Survey A Survey of Reinforcement Learning from Human Feedback
RLHF Secrets of RLHF in Large Language Models Part II: Reward Modeling ArXiv
RAG, Temporal Logics FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation ArXiv
RAG, Survey Large Language Models for Information Retrieval: A Survey
RAG, Survey Retrieval-Augmented Generation for Large Language Models: A Survey ArXiv
RAG Training Language Models with Memory Augmentation
RAG Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
RAG RAG-Fusion: a New Take on Retrieval-Augmented Generation ArXiv
RAG RAFT: Adapting Language Model to Domain Specific RAG ArXiv
RAG Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity ArXiv
RAG RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture ArXiv
RAG Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs ArXiv
Quantization, Scaling SliceGPT: Compress Large Language Models by Deleting Rows and Columns ArXiv
Prompting, Survey A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications ArXiv
Prompting, Robot, Zero-shot Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning ArXiv
Prompting Contrastive Chain-of-Thought Prompting ArXiv
PersonalCitation, Robot Text2Motion: From Natural Language Instructions to Feasible Plans ArXiv
Perception, Video, Vision CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval ArXiv
Perception, Task-Decompose DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment ArXiv 2023/07/01
Perception, Robot, Segmentation Language Segment-Anything
Perception, Robot LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding ArXiv 2023/12/21
Perception, Reasoning, Robot Reasoning Grasping via Multimodal Large Language Model ArXiv
Perception, Reasoning DetGPT: Detect What You Need via Reasoning ArXiv
Perception, Reasoning Lenna: Language Enhanced Reasoning Detection Assistant ArXiv
Perception Simple Open-Vocabulary Object Detection with Vision Transformers ArXiv 2022/05/12
Perception Grounded Language-Image Pre-training ArXiv 2021/12/07
Perception Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection ArXiv 2023/03/09
Perception PointCLIP: Point Cloud Understanding by CLIP ArXiv 2021/12/04
Perception DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection ArXiv
Perception Recognize Anything: A Strong Image Tagging Model ArXiv
Perception Simple Open-Vocabulary Object Detection with Vision Transformers ArXiv
Perception Sigmoid Loss for Language Image Pre-Training ArXiv
Package LlamaIndex GitHub
Package LangChain GitHub
Package h2oGPT GitHub
Package Dify GitHub
Package Alpaca-LoRA GitHub
Package Promptlayer GitHub
Package unsloth GitHub
Package Instructor: Structured LLM Outputs GitHub
PRM Let's Verify Step by Step ArXiv
PRM Let's reward step by step: Step-Level reward model as the Navigators for Reasoning ArXiv
PPO, RLHF, Reinforcement-Learning Secrets of RLHF in Large Language Models Part I: PPO ArXiv 2024/02/01
Open-source, VLM OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models ArXiv 2023/08/02
Open-source, SLM RecurrentGemma: Moving Past Transformers for Efficient Open Language Models ArXiv
Open-source, Perception Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection ArXiv
Open-source Gemma: Introducing new state-of-the-art open models ArXiv
Open-source Mistral 7B ArXiv
Open-source Qwen Technical Report ArXiv
Navigation, Reasoning, Vision NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models ArXiv
Natural-Language-as-Policies, Robot RT-H: Action Hierarchies Using Language ArXiv
Multimodal, Robot, VLM Open-World Object Manipulation using Pre-trained Vision-Language Models ArXiv 2023/03/02
Multimodal, Robot MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation ArXiv 2023/08/07
Multimodal, Robot Flamingo: a Visual Language Model for Few-Shot Learning ArXiv 2022/04/29
Multi-Images, VLM Mantis: Multi-Image Instruction Tuning ArXiv, GitHub
MoE Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity ArXiv
MoE Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers ArXiv
Mixtral, MoE Mixtral of Experts ArXiv
Memory, Robot LLM as A Robotic Brain: Unifying Egocentric Memory and Control ArXiv 2023/04/19
Memory, Reinforcement-Learning Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Math, Reasoning DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models ArXiv, GitHub
Math, PRM Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations ArXiv
Math WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct ArXiv
Math Llemma: An Open Language Model For Mathematics ArXiv
Low-level-action, Robot SayTap: Language to Quadrupedal Locomotion ArXiv 2023/06/13
Low-level-action, Robot Prompt a Robot to Walk with Large Language Models ArXiv 2023/09/18
LoRA, Scaling LoRA: Low-Rank Adaptation of Large Language Models ArXiv
LoRA, Scaling Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements ArXiv
LoRA LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report ArXiv
Lab Tencent AI Lab - AppAgent, WebVoyager
Lab DeepWisdom - MetaGPT
Lab Reworkd AI - AgentGPT
Lab OpenBMB - ChatDev, XAgent, AgentVerse
Lab XLANG NLP Lab - OpenAgents
Lab Rutgers University, AGI Research - OpenAGI
Lab Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University - CogVLM
Lab OpenGVLab GitHub
Lab Imperial College London - Zeroshot trajectory
Lab SenseTime
Lab Tsinghua
Lab Fudan NLP Group
Lab Penn State University
LLaVA, VLM TinyLLaVA: A Framework of Small-scale Large Multimodal Models ArXiv
LLaVA, MoE, VLM MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
LLaMA, Lightweight, Open-source MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
LLM, Zero-shot GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? ArXiv 2023/11/27
LLM, Temporal Logics NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models ArXiv 2023/05/12
LLM, Survey A Survey of Large Language Models ArXiv 2023/03/31
LLM, Spatial Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning ArXiv 2023/10/05
LLM, Scaling BitNet: Scaling 1-bit Transformers for Large Language Models ArXiv
LLM, Robot, Task-Decompose Do As I Can, Not As I Say: Grounding Language in Robotic Affordances ArXiv 2022/04/04
LLM, Robot, Survey Large Language Models for Robotics: A Survey ArXiv
LLM, Reasoning, Survey Towards Reasoning in Large Language Models: A Survey ArXiv 2022/12/20
LLM, Quantization The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits ArXiv
LLM, PersonalCitation, Robot, Zero-shot Language Models as Zero-Shot Trajectory Generators ArXiv
LLM, PersonalCitation, Robot Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
LLM, Open-source A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device. GitHub
LLM, Open-source OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models ArXiv 2023/08/02
LLM, Open-source InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning ArXiv 2023/05/11
LLM, Open-source ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst ArXiv 2023/05/25
LLM, Memory MemoryBank: Enhancing Large Language Models with Long-Term Memory ArXiv 2023/05/17
LLM, Leaderboard LMSYS Chatbot Arena Leaderboard
LLM Language Models are Few-Shot Learners ArXiv 2020/05/28
Interactive, OpenGVLab, VLM InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language ArXiv 2023/05/09
Instruction-Tuning, Survey Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning
Instruction-Tuning, Survey Vision-Language Instruction Tuning: A Review and Analysis ArXiv
Instruction-Tuning, Survey A Closer Look at the Limitations of Instruction Tuning ArXiv
Instruction-Tuning, Survey A Survey on Data Selection for LLM Instruction Tuning ArXiv
Instruction-Tuning, Self Self-Instruct: Aligning Language Models with Self-Generated Instructions ArXiv
Instruction-Tuning, LLM, Zero-shot Finetuned Language Models Are Zero-Shot Learners ArXiv 2021/09/03
Instruction-Tuning, LLM, Survey Instruction Tuning for Large Language Models: A Survey
Instruction-Tuning, LLM, PEFT Visual Instruction Tuning ArXiv 2023/04/17
Instruction-Tuning, LLM, PEFT LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention ArXiv 2023/03/28
Instruction-Tuning, LLM Training language models to follow instructions with human feedback ArXiv 2022/03/04
Instruction-Tuning, LLM MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models ArXiv 2023/04/20
Instruction-Tuning, LLM Self-Instruct: Aligning Language Models with Self-Generated Instructions ArXiv 2022/12/20
Instruction-Tuning A Closer Look at the Limitations of Instruction Tuning
Instruction-Tuning Exploring Format Consistency for Instruction Tuning
Instruction-Tuning Exploring the Benefits of Training Expert Language Models over Instruction Tuning ArXiv 2023/02/06
Instruction-Tuning Tuna: Instruction Tuning using Feedback from Large Language Models ArXiv 2023/03/06
In-Context-Learning, Vision What Makes Good Examples for Visual In-Context Learning?
In-Context-Learning, Vision Visual Prompting via Image Inpainting ArXiv
In-Context-Learning, Video Prompting Visual-Language Models for Efficient Video Understanding
In-Context-Learning, VQA VisualCOMET: Reasoning about the Dynamic Context of a Still Image ArXiv 2020/04/22
In-Context-Learning, VQA SINC: Self-Supervised In-Context Learning for Vision-Language Tasks ArXiv 2023/07/15
In-Context-Learning, Survey A Survey on In-context Learning ArXiv
In-Context-Learning, Scaling Structured Prompting: Scaling In-Context Learning to 1,000 Examples ArXiv 2020/03/06
In-Context-Learning, Scaling Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale ArXiv 2022/03/06
In-Context-Learning, Reinforcement-Learning AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents ArXiv
In-Context-Learning, Prompt-Tuning Visual Prompt Tuning ArXiv
In-Context-Learning, Perception, Vision Visual In-Context Prompting ArXiv
In-Context-Learning, Many-Shot, Reasoning Many-Shot In-Context Learning ArXiv
In-Context-Learning, Instruction-Tuning In-Context Instruction Learning
In-Context-Learning ReAct: Synergizing Reasoning and Acting in Language Models ArXiv 2023/03/20
In-Context-Learning Small Models are Valuable Plug-ins for Large Language Models ArXiv 2023/05/15
In-Context-Learning Generative Agents: Interactive Simulacra of Human Behavior ArXiv 2023/04/07
In-Context-Learning Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models ArXiv 2022/06/09
In-Context-Learning What does CLIP know about a red circle? Visual prompt engineering for VLMs ArXiv
In-Context-Learning Can large language models explore in-context? ArXiv
Image, LLaMA, Perception LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models ArXiv
Hallucination, Survey Combating Misinformation in the Age of LLMs: Opportunities and Challenges ArXiv
Gym, PPO, Reinforcement-Learning, Survey Can Language Agents Approach the Performance of RL? An Empirical Study On OpenAI Gym ArXiv
Grounding, Reinforcement-Learning Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning ArXiv
Grounding, Reasoning Visually Grounded Reasoning across Languages and Cultures ArXiv
Grounding V-IRL: Grounding Virtual Intelligence in Real Life
Google, Grounding GLaMM: Pixel Grounding Large Multimodal Model ArXiv
Generation, Survey Advances in 3D Generation: A Survey
Generation, Robot, Zero-shot Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models ArXiv
Generation, Robot, Zero-shot Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans
GPT4V, Robot, VLM Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V ArXiv
GPT4, LLM GPT-4 Technical Report ArXiv 2023/03/15
GPT4, Instruction-Tuning Instruction Tuning with GPT-4 ArXiv
GPT4, Gemini, LLM Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases ArXiv 2023/12/22
Foundation, Robot, Survey Foundation Models in Robotics: Applications, Challenges, and the Future ArXiv 2023/12/13
Foundation, LLaMA, Vision VisionLLaMA: A Unified LLaMA Interface for Vision Tasks ArXiv
Foundation, LLM, Open-source Code Llama: Open Foundation Models for Code ArXiv
Foundation, LLM, Open-source LLaMA: Open and Efficient Foundation Language Models ArXiv 2023/02/27
Feedback, Robot Correcting Robot Plans with Natural Language Feedback ArXiv
Feedback, Robot Learning to Learn Faster from Human Feedback with Language Model Predictive Control ArXiv
Feedback, Robot REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction ArXiv 2023/06/27
Feedback, In-Context-Learning, Robot InCoRo: In-Context Learning for Robotics Control with Feedback Loops ArXiv
Evaluation, LLM, Survey A Survey on Evaluation of Large Language Models ArXiv
Evaluation simple-evals GitHub
End2End, Multimodal, Robot VIMA: General Robot Manipulation with Multimodal Prompts ArXiv 2022/10/06
End2End, Multimodal, Robot PaLM-E: An Embodied Multimodal Language Model ArXiv 2023/03/06
End2End, Multimodal, Robot Physically Grounded Vision-Language Models for Robotic Manipulation ArXiv 2023/09/05
Embodied Embodied Question Answering ArXiv
Embodied, World-model Language Models Meet World Models: Embodied Experiences Enhance Language Models
Embodied, Robot, Task-Decompose Embodied Task Planning with Large Language Models ArXiv 2023/07/04
Embodied, Robot Large Language Models as Generalizable Policies for Embodied Tasks ArXiv
Embodied, Reasoning, Robot Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs ArXiv, GitHub 2024/03/20
Embodied, LLM, Robot, Survey The Development of LLMs for Embodied Navigation ArXiv 2023/11/01
Driving, Spatial GPT-Driver: Learning to Drive with GPT ArXiv 2023/10/02
Drive, Survey A Survey on Multimodal Large Language Models for Autonomous Driving ArXiv
Distilling, Survey A Survey on Knowledge Distillation of Large Language Models
Distilling Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes ArXiv
Diffusion, Text-to-Image Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs ArXiv
Diffusion, Survey On the Design Fundamentals of Diffusion Models: A Survey ArXiv
Diffusion, Robot 3D Diffusion Policy ArXiv
Diffusion A latent text-to-image diffusion model
Demonstration, GPT4, PersonalCitation, Robot, VLM GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
Dataset, LLM, Survey A Survey on Data Selection for Language Models ArXiv
Dataset, Instruction-Tuning REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets
Dataset, Instruction-Tuning Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Dataset PRM800K: A Process Supervision Dataset GitHub
Data-generation, Robot GenSim: Generating Robotic Simulation Tasks via Large Language Models ArXiv 2023/10/02
Data-generation, Robot RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation ArXiv 2023/11/02
DPO, PPO, RLHF A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More ArXiv
DPO Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study ArXiv
Context-Window, Scaling Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Context-Window, Scaling LONGNET: Scaling Transformers to 1,000,000,000 Tokens ArXiv 2023/07/01
Context-Window, Reasoning, RoPE, Scaling Resonance RoPE: Improving Context Length Generalization of Large Language Models ArXiv
Context-Window, LLM, RoPE, Scaling LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens ArXiv
Context-Window, Foundation, Gemini, LLM, Scaling Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Context-Window, Foundation Mamba: Linear-Time Sequence Modeling with Selective State Spaces ArXiv
Context-Window RoFormer: Enhanced Transformer with Rotary Position Embedding ArXiv
Context-Aware, Context-Window DynaCon: Dynamic Robot Planner with Contextual Awareness via LLMs ArXiv
Computer-Resource, Scaling FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness ArXiv
Compress, Scaling (Long)LLMLingua: Enhancing Large Language Model Inference via Prompt Compression ArXiv
Compress, Quantization, Survey A Survey on Model Compression for Large Language Models ArXiv
Compress, Prompting Learning to Compress Prompts with Gist Tokens ArXiv
Code-as-Policies, VLM, VQA Visual Programming: Compositional visual reasoning without training ArXiv 2022/11/18
Code-as-Policies, Robot SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models ArXiv 2023/09/18
Code-as-Policies, Robot RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation ArXiv
Code-as-Policies, Robot Creative Robot Tool Use with Large Language Models ArXiv
Code-as-Policies, Reinforcement-Learning, Reward Code as Reward: Empowering Reinforcement Learning with VLMs ArXiv
Code-as-Policies, Reasoning, VLM, VQA ViperGPT: Visual Inference via Python Execution for Reasoning ArXiv 2023/03/14
Code-as-Policies, Reasoning Chain of Code: Reasoning with a Language Model-Augmented Code Emulator ArXiv
Code-as-Policies, PersonalCitation, Robot, Zero-shot Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language ArXiv 2022/04/01
Code-as-Policies, PersonalCitation, Robot, State-Manage Statler: State-Maintaining Language Models for Embodied Reasoning ArXiv 2023/06/30
Code-as-Policies, PersonalCitation, Robot ProgPrompt: Generating Situated Robot Task Plans using Large Language Models ArXiv 2022/09/22
Code-as-Policies, PersonalCitation, Robot RoboCodeX: Multi-modal Code Generation for Robotic Behavior Synthesis ArXiv
Code-as-Policies, PersonalCitation, Robot RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Code-as-Policies, PersonalCitation, Robot ChatGPT for Robotics: Design Principles and Model Abilities
Code-as-Policies, Multimodal, OpenGVLab, PersonalCitation, Robot Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ArXiv 2023/05/18
Code-as-Policies, Embodied, PersonalCitation, Robot Code as Policies: Language Model Programs for Embodied Control ArXiv 2022/09/16
Code-as-Policies, Embodied, PersonalCitation, Reasoning, Robot, Task-Decompose Inner Monologue: Embodied Reasoning through Planning with Language Models ArXiv
Code-LLM, Front-End Design2Code: How Far Are We From Automating Front-End Engineering? ArXiv
Code-LLM StarCoder 2 and The Stack v2: The Next Generation
Chain-of-Thought, Reasoning, Table Chain-of-table: Evolving tables in the reasoning chain for table understanding ArXiv
Chain-of-Thought, Reasoning, Survey Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters ArXiv 2023/12/20
Chain-of-Thought, Reasoning, Survey A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future ArXiv 2023/09/27
Chain-of-Thought, Reasoning Chain-of-Thought Prompting Elicits Reasoning in Large Language Models ArXiv 2022/01/28
Chain-of-Thought, Reasoning Tree of Thoughts: Deliberate Problem Solving with Large Language Models ArXiv 2023/05/17
Chain-of-Thought, Reasoning Multimodal Chain-of-Thought Reasoning in Language Models ArXiv 2023/02/02
Chain-of-Thought, Reasoning Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework ArXiv 2023/05/05
Chain-of-Thought, Reasoning Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding ArXiv 2023/07/28
Chain-of-Thought, Reasoning Rethinking with Retrieval: Faithful Large Language Model Inference ArXiv 2022/12/31
Chain-of-Thought, Reasoning Self-Consistency Improves Chain of Thought Reasoning in Language Models ArXiv 2022/03/21
Chain-of-Thought, Reasoning Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance ArXiv 2023/05/26
Chain-of-Thought, Reasoning Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation ArXiv
Chain-of-Thought, Prompting Chain-of-Thought Reasoning Without Prompting ArXiv
Chain-of-Thought, Planning, Reasoning SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning ArXiv 2023/08/01
Chain-of-Thought, In-Context-Learning, Self Measuring and Narrowing the Compositionality Gap in Language Models ArXiv 2022/10/07
Chain-of-Thought, In-Context-Learning, Self Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement ArXiv 2023/05/23
Chain-of-Thought, In-Context-Learning Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding ArXiv
Chain-of-Thought, In-Context-Learning Self-Refine: Iterative Refinement with Self-Feedback ArXiv 2023/03/30
Chain-of-Thought, In-Context-Learning Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models ArXiv 2023/05/06
Chain-of-Thought, In-Context-Learning PAL: Program-aided Language Models ArXiv 2022/11/18
Chain-of-Thought, In-Context-Learning Reasoning with Language Model is Planning with World Model ArXiv 2023/05/24
Chain-of-Thought, In-Context-Learning Least-to-Most Prompting Enables Complex Reasoning in Large Language Models ArXiv 2022/05/21
Chain-of-Thought, In-Context-Learning Complexity-Based Prompting for Multi-Step Reasoning ArXiv 2022/10/03
Chain-of-Thought, In-Context-Learning Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations ArXiv 2022/05/24
Chain-of-Thought, In-Context-Learning Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models ArXiv 2023/08/20
Chain-of-Thought, GPT4, Reasoning, Robot Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning ArXiv 2023/11/29
Chain-of-Thought, Embodied, Robot EgoCOT: Embodied Chain-of-Thought Dataset for Vision Language Pre-training ArXiv
Chain-of-Thought, Embodied, PersonalCitation, Robot, Task-Decompose EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought ArXiv 2023/05/24
Chain-of-Thought, Code-as-Policies, PersonalCitation, Robot Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought ArXiv
Chain-of-Thought, Code-as-Policies Chain of Code: Reasoning with a Language Model-Augmented Code Emulator ArXiv
Caption, Video PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning ArXiv
Caption, VLM, VQA Caption Anything: Interactive Image Description with Diverse Multimodal Controls ArXiv 2023/05/04
CRAG, RAG Corrective Retrieval Augmented Generation ArXiv
Brain, Instruction-Tuning Instruction-tuning Aligns LLMs to the Human Brain ArXiv
Brain, Conscious Could a Large Language Model be Conscious?
Brain LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model ArXiv
Brain A Neuro-Mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization
Benchmark, Sora, Text-to-Video LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
Benchmark, In-Context-Learning ARB: Advanced Reasoning Benchmark for Large Language Models ArXiv 2023/07/25
Benchmark, In-Context-Learning PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change ArXiv 2022/06/21
Benchmark, GPT4 Sparks of Artificial General Intelligence: Early experiments with GPT-4
Awesome Repo, VLM awesome-vlm-architectures GitHub
Awesome Repo, Survey LLMSurvey GitHub
Awesome Repo, Robot Awesome-LLM-Robotics GitHub
Awesome Repo, Reasoning Awesome LLM Reasoning GitHub
Awesome Repo, Reasoning Awesome-Reasoning-Foundation-Models GitHub
Awesome Repo, RLHF, Reinforcement-Learning Awesome RLHF (RL with Human Feedback) GitHub
Awesome Repo, Perception, VLM Awesome Vision-Language Navigation GitHub
Awesome Repo, Package Awesome LLMOps GitHub
Awesome Repo, Multimodal Awesome-Multimodal-LLM GitHub
Awesome Repo, Multimodal Awesome-Multimodal-Large-Language-Models GitHub
Awesome Repo, Math, Science Awesome Scientific Language Models GitHub
Awesome Repo, LLM, Vision LLM-in-Vision GitHub
Awesome Repo, LLM, VLM Multimodal & Large Language Models GitHub
Awesome Repo, LLM, Survey Awesome-LLM-Survey GitHub
Awesome Repo, LLM, Robot Everything-LLMs-And-Robotics GitHub
Awesome Repo, LLM, Leaderboard LLM-Leaderboard GitHub
Awesome Repo, LLM Awesome-LLM GitHub
Awesome Repo, Korean awesome-korean-llm GitHub
Awesome Repo, Japanese, LLM 日本語LLMまとめ GitHub
Awesome Repo, In-Context-Learning Paper List for In-context Learning GitHub
Awesome Repo, IROS, Robot IROS2023PaperList GitHub
Awesome Repo, Hallucination, Survey A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions ArXiv, GitHub
Awesome Repo, Embodied Awesome Embodied Vision GitHub
Awesome Repo, Diffusion Awesome-Diffusion-Models GitHub
Awesome Repo, Compress Awesome LLM Compression GitHub
Awesome Repo, Chinese Awesome-Chinese-LLM GitHub
Awesome Repo, Chain-of-Thought Chain-of-ThoughtsPapers GitHub
Automate, Prompting Large Language Models Are Human-Level Prompt Engineers ArXiv 2022/11/03
Automate, Chain-of-Thought, Reasoning Automatic Chain of Thought Prompting in Large Language Models ArXiv 2022/10/07
Audio2Video, Diffusion, Generation, Video EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions ArXiv
Audio Robust Speech Recognition via Large-Scale Weak Supervision
Apple, VLM MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training ArXiv
Apple, VLM Guiding Instruction-based Image Editing via Multimodal Large Language Models ArXiv
Apple, Robot Large Language Models as Generalizable Policies for Embodied Tasks ArXiv
Apple, LLM, Open-source OpenELM: An Efficient Language Model Family with Open Training and Inference Framework ArXiv
Apple, LLM ReALM: Reference Resolution As Language Modeling ArXiv
Apple, LLM LLM in a flash: Efficient Large Language Model Inference with Limited Memory ArXiv
Apple, In-Context-Learning, Perception SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding ArXiv
Apple, Code-as-Policies, Robot Executable Code Actions Elicit Better LLM Agents ArXiv
Apple Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models ArXiv
Apple Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs ArXiv
Apple Ferret: Refer and Ground Anything Anywhere at Any Granularity ArXiv
Anything, LLM, Open-source, Perception, Segmentation Segment Anything ArXiv 2023/04/05
Anything, Depth Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data ArXiv
Anything, Caption, Perception, Segmentation Segment and Caption Anything ArXiv
Anything, CLIP, Perception SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding ArXiv
Agent-Project, Code-LLM open-interpreter GitHub
Agent, Web WebLINX: Real-World Website Navigation with Multi-Turn Dialogue ArXiv
Agent, Web WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models ArXiv
Agent, Web OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web ArXiv
Agent, Web OS-Copilot: Towards Generalist Computer Agents with Self-Improvement ArXiv
Agent, Video-for-Agent Video as the New Language for Real-World Decision Making
Agent, VLM AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn ArXiv
Agent, Tool Gorilla: Large Language Model Connected with Massive APIs ArXiv
Agent, Tool ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs ArXiv
Agent, Survey A Survey on Large Language Model based Autonomous Agents ArXiv 2023/08/22
Agent, Survey The Rise and Potential of Large Language Model Based Agents: A Survey ArXiv 2023/09/14
Agent, Survey Agent AI: Surveying the Horizons of Multimodal Interaction ArXiv
Agent, Survey Large Multimodal Agents: A Survey ArXiv
Agent, Soft-Dev Communicative Agents for Software Development GitHub
Agent, Soft-Dev MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework ArXiv
Agent, Robot, Survey A Survey on LLM-based Autonomous Agents GitHub
Agent, Reinforcement-Learning, Reward Reward Design with Language Models ArXiv 2023/02/27
Agent, Reinforcement-Learning, Reward EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL ArXiv 2022/06/20
Agent, Reinforcement-Learning, Reward Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning ArXiv 2023/09/20
Agent, Reinforcement-Learning Eureka: Human-Level Reward Design via Coding Large Language Models ArXiv 2023/10/19
Agent, Reinforcement-Learning Language to Rewards for Robotic Skill Synthesis ArXiv 2023/06/14
Agent, Reinforcement-Learning Language Instructed Reinforcement Learning for Human-AI Coordination ArXiv 2023/04/13
Agent, Reinforcement-Learning Guiding Pretraining in Reinforcement Learning with Large Language Models ArXiv 2023/02/13
Agent, Reinforcement-Learning STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
Agent, Reasoning, Zero-shot Agent Instructs Large Language Models to be General Zero-Shot Reasoners ArXiv 2023/10/05
Agent, Reasoning Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning ArXiv
Agent, Multimodal, Robot A Generalist Agent ArXiv 2022/05/12
Agent, Multi War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars ArXiv
Agent, MobileApp You Only Look at Screens: Multimodal Chain-of-Action Agents ArXiv, GitHub
Agent, Minecraft, Reinforcement-Learning RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds
Agent, Minecraft Voyager: An Open-Ended Embodied Agent with Large Language Models ArXiv 2023/05/25
Agent, Minecraft Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents ArXiv 2023/02/03
Agent, Minecraft LARP: Language-Agent Role Play for Open-World Games ArXiv
Agent, Minecraft Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds ArXiv
Agent, Minecraft S-Agents: Self-organizing Agents in Open-ended Environment ArXiv
Agent, Minecraft Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory ArXiv
Agent, Memory, RAG, Robot RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents ArXiv 2024/02/06
Agent, Memory, Minecraft JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models ArXiv 2023/11/10
Agent, LLM, Planning LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models ArXiv
Agent, Instruction-Tuning AgentTuning: Enabling Generalized Agent Abilities For LLMs ArXiv
Agent, Game Learning Embodied Vision-Language Programming from Instruction, Exploration, and Environmental Feedback ArXiv
Agent, GUI, Web "What’s important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces ArXiv
Agent, GUI, MobileApp Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Agent, GUI, MobileApp AppAgent: Multimodal Agents as Smartphone Users ArXiv
Agent, GUI, MobileApp You Only Look at Screens: Multimodal Chain-of-Action Agents
Agent, GUI CogAgent: A Visual Language Model for GUI Agents ArXiv
Agent, GUI ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model GitHub
Agent, GUI SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents ArXiv
Agent, GPT4, Web GPT-4V(ision) is a Generalist Web Agent, if Grounded ArXiv
Agent, Feedback, Reinforcement-Learning, Robot Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models ArXiv 2023/11/04
Agent, Feedback, Reinforcement-Learning AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback ArXiv 2023/09/29
Agent, End2End, Game, Robot An Interactive Agent Foundation Model ArXiv
Agent, Embodied, Survey Application of Pretrained Large Language Models in Embodied Artificial Intelligence ArXiv
Agent, Embodied, Robot AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents ArXiv
Agent, Embodied, Robot OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following ArXiv
Agent, Embodied OpenAgents: An Open Platform for Language Agents in the Wild ArXiv, GitHub
Agent, Embodied LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models ArXiv
Agent, Embodied Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld ArXiv
Agent, Embodied Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Agent, Embodied Embodied Task Planning with Large Language Models ArXiv
Agent, Diffusion, Speech NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models ArXiv
Agent, Code-as-Policies Executable Code Actions Elicit Better LLM Agents ArXiv 2024/01/24
Agent, Code-LLM, Code-as-Policies, Survey If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents ArXiv
Agent, Code-LLM TaskWeaver: A Code-First Agent Framework
Agent, Blog LLM Powered Autonomous Agents ArXiv
Agent, Awesome Repo, LLM Awesome-Embodied-Agent-with-LLMs GitHub
Agent, Awesome Repo, LLM CoALA: Awesome Language Agents ArXiv, GitHub
Agent, Awesome Repo, Embodied, Grounding XLang Paper Reading GitHub
Agent, Awesome Repo Awesome AI Agents GitHub
Agent, Awesome Repo Autonomous Agents GitHub
Agent, Awesome Repo Awesome-Papers-Autonomous-Agent GitHub
Agent, Awesome Repo Awesome Large Multimodal Agents GitHub
Agent, Awesome Repo LLM Agents Papers GitHub
Agent, Awesome Repo Awesome LLM-Powered Agent GitHub
Agent LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination ArXiv
Agent AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors ArXiv
Agent Agents: An Open-source Framework for Autonomous Language Agents ArXiv, GitHub
Agent AutoAgents: A Framework for Automatic Agent Generation GitHub
Agent DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines ArXiv
Agent AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation ArXiv
Agent CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society ArXiv
Agent XAgent: An Autonomous Agent for Complex Task Solving ArXiv
Agent Generative Agents: Interactive Simulacra of Human Behavior ArXiv
Agent LLM+P: Empowering Large Language Models with Optimal Planning Proficiency ArXiv 2023/04/22
Agent AgentSims: An Open-Source Sandbox for Large Language Model Evaluation ArXiv 2023/08/08
Agent MindAgent: Emergent Gaming Interaction ArXiv
Agent InfiAgent: A Multi-Tool Agent for AI Operating Systems
Agent Predictive Minds: LLMs As Atypical Active Inference Agents
Agent swarms GitHub
Agent ScreenAgent: A Vision Language Model-driven Computer Control Agent ArXiv
Agent AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn ArXiv
Agent PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization ArXiv
Agent Cognitive Architectures for Language Agents ArXiv
Agent AIOS: LLM Agent Operating System ArXiv
Agent LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem ArXiv
Agent Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study
Affordance, Segmentation ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models ArXiv
Action-Model, Agent, LAM LaVague GitHub
Action-Generation, Generation, Prompting Prompt a Robot to Walk with Large Language Models ArXiv
APIs, Agent, Tool Gorilla: Large Language Model Connected with Massive APIs ArXiv
AGI, Survey Levels of AGI: Operationalizing Progress on the Path to AGI ArXiv
AGI, Brain When Brain-inspired AI Meets AGI ArXiv
AGI, Brain Divergences between Language Models and Human Brains ArXiv
AGI, Awesome Repo, Survey Awesome-LLM-Papers-Toward-AGI GitHub
AGI, Agent OpenAGI: When LLM Meets Domain Experts
3D, Open-source, Perception, Robot 3D-LLM: Injecting the 3D World into Large Language Models ArXiv 2023/07/24
3D, GPT4, VLM GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation ArXiv
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate ArXiv 2023/08/14