We provide awesome papers and repos covering a comprehensive range of topics, listed below.
CoT / VLM / Quantization / Grounding / Text2IMG&VID / Prompt Engineering / Prompt Tuning / Reasoning / Robot / Agent / Planning / Reinforcement-Learning / Feedback / In-Context-Learning / Few-Shot / Zero-Shot / Instruction Tuning / PEFT / RLHF / RAG / Embodied / VQA / Hallucination / Diffusion / Scaling / Context-Window / WorldModel / Memory / RoPE / Speech / Perception / Survey / Segmentation / Large Action Model / Foundation / LoRA / PPO / DPO
We strongly recommend checking our Notion table for an interactive experience.
Category | Title | Links | Date |
---|---|---|---|
Zero-shot | Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation? | | |
World-model | Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning | ArXiv | 2023/05/07 |
World-model | Learning and Leveraging World Models in Visual Representation Learning | | |
World-model | Language Models Meet World Models | ArXiv | |
World-model | Learning to Model the World with Language | ArXiv | |
World-model | Diffusion World Model | ArXiv | |
VisualPrompt | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | ArXiv | |
VisualPrompt | Making Large Multimodal Models Understand Arbitrary Visual Prompts | ArXiv | |
VisualPrompt | What does CLIP know about a red circle? Visual prompt engineering for VLMs | ArXiv | |
VisualPrompt | MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting | ArXiv | |
VisualPrompt | Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | ArXiv, GitHub | |
Video | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | ArXiv | |
ViFM, Video | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | ArXiv, GitHub | |
VLM, World-model | Large World Model | ArXiv | |
VLM, VQA | CogVLM: Visual Expert for Pretrained Language Models | ArXiv | 2023/11/06 |
VLM, VQA | Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | ArXiv | 2023/04/19 |
VLM, VQA | DeepSeek-VL: Towards Real-World Vision-Language Understanding | | |
LLM | PaLM: Scaling Language Modeling with Pathways | ArXiv | 2022/04/05 |
VLM | ScreenAI: A Vision-Language Model for UI and Infographics Understanding | | |
VLM | MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | ArXiv, GitHub | |
VLM | LLaVA-NeXT: Improved reasoning, OCR, and world knowledge | GitHub | |
VLM | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | ArXiv | |
Text-to-Image, World-model | World Model on Million-Length Video And Language With RingAttention | ArXiv | |
Text-to-Image | Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation | ArXiv | |
Temporal | Explorative Inbetweening of Time and Space | ArXiv | |
Survey, Video | Video Understanding with Large Language Models: A Survey | ArXiv | |
Survey, VLM | MM-LLMs: Recent Advances in MultiModal Large Language Models | | |
Survey, Training | Understanding LLMs: A Comprehensive Overview from Training to Inference | ArXiv | |
Survey, TimeSeries | Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook | | |
Survey | Efficient Large Language Models: A Survey | ArXiv, GitHub | |
Sora, Text-to-Video | Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | | |
Sora, Text-to-Video | Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | ArXiv | |
Segmentation | LISA: Reasoning Segmentation via Large Language Model | ArXiv | |
Segmentation | GRES: Generalized Referring Expression Segmentation | | |
Segmentation | Generalized Decoding for Pixel, Image, and Language | ArXiv | |
Segmentation | SEEM: Segment Everything Everywhere All at Once | ArXiv, GitHub | |
Segmentation | SegGPT: Segmenting Everything In Context | ArXiv | |
Segmentation | Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks | ArXiv | |
Scaling | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | ArXiv | |
SLM, Scaling | Textbooks Are All You Need | ArXiv | |
Robot, Zero-shot | BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning | ArXiv | |
Robot, Zero-shot | Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots | ArXiv | |
Robot, Zero-shot | Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting | ArXiv | |
Robot, Task-Decompose, Zero-shot | Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | ArXiv | 2022/01/18 |
Robot, Task-Decompose | SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning | ArXiv | 2023/07/12 |
Robot, TAMP | LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | ArXiv | |
Robot, TAMP | Task and Motion Planning with Large Language Models for Object Rearrangement | ArXiv | |
Robot, Survey | Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis | ArXiv | 2023/12/14 |
Robot, Survey | Language-conditioned Learning for Robotic Manipulation: A Survey | ArXiv | 2023/12/17 |
Robot, Survey | Robot Learning in the Era of Foundation Models: A Survey | ArXiv | 2023/11/24 |
Robot, Survey | Real-World Robot Applications of Foundation Models: A Review | ArXiv | |
Robot | OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics | ArXiv | |
Robot | RoCo: Dialectic Multi-Robot Collaboration with Large Language Models | ArXiv | |
Robot | Interactive Language: Talking to Robots in Real Time | ArXiv | |
Robot | Reflexion: Language Agents with Verbal Reinforcement Learning | ArXiv | 2023/03/20 |
Robot | Generative Expressive Robot Behaviors using Large Language Models | ArXiv | |
Robot | RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation | | |
Robot | Introspective Tips: Large Language Model for In-Context Decision Making | ArXiv | |
Robot | PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | ArXiv | |
Robot | OCI-Robotics: Object-Centric Instruction Augmentation for Robotic Manipulation | ArXiv | |
Robot | DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies | ArXiv | |
Robot | VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | ArXiv | |
Robot | Creative Robot Tool Use with Large Language Models | ArXiv | |
Robot | AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers | ArXiv | |
RoPE | RoFormer: Enhanced Transformer with Rotary Position Embedding | ArXiv | |
Resource | Papers with Code | Website | |
Resource | Hugging Face | Website | |
Resource | dailyarxiv | Website | |
Resource | Connected Papers | Website | |
Resource | Semantic Scholar | Website | |
Resource | AlphaSignal | Website | |
Resource | arxiv-sanity | Website | |
Reinforcement-Learning, VIMA | FoMo Rewards: Can we cast foundation models as reward functions? | ArXiv | |
Reinforcement-Learning, Robot | Towards A Unified Agent with Foundation Models | ArXiv | |
Reinforcement-Learning | Large Language Models Are Semi-Parametric Reinforcement Learning Agents | ArXiv | |
Reinforcement-Learning | RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents | ArXiv | |
Reasoning, Zero-shot | Large Language Models are Zero-Shot Reasoners | ArXiv | |
Reasoning, VLM, VQA | MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | ArXiv | 2023/03/20 |
Reasoning, Table | Large Language Models are few(1)-shot Table Reasoners | ArXiv | |
Reasoning, Symbolic | Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning | ArXiv | |
Reasoning, Survey | Reasoning with Language Model Prompting: A Survey | ArXiv | |
Reasoning, Robot | AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation | ArXiv | |
Reasoning, Reward | Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | ArXiv | |
Reasoning, Reinforcement-Learning | ReFT: Reasoning with Reinforced Fine-Tuning | | |
Reasoning | Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning | ArXiv | |
Reasoning | ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs | ArXiv | |
Reasoning | Self-Discover: Large Language Models Self-Compose Reasoning Structures | ArXiv | |
Reasoning | Chain-of-Thought Reasoning Without Prompting | ArXiv | |
Reasoning | Contrastive Chain-of-Thought Prompting | ArXiv | |
Reasoning | Rephrase and Respond (RaR) | | |
Reasoning | Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models | ArXiv | |
Reasoning | STaR: Bootstrapping Reasoning With Reasoning | ArXiv | 2022/05/28 |
Reasoning | The Impact of Reasoning Step Length on Large Language Models | ArXiv | |
Reasoning | Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication | ArXiv | |
Reasoning | Large Language Models as General Pattern Machines | ArXiv | |
RLHF, Reinforcement-Learning, Survey | A Survey of Reinforcement Learning from Human Feedback | | |
RLHF | Secrets of RLHF in Large Language Models Part II: Reward Modeling | ArXiv | |
RAG, Temporal Logics | FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | ArXiv | |
RAG, Survey | Large Language Models for Information Retrieval: A Survey | | |
RAG, Survey | Retrieval-Augmented Generation for Large Language Models: A Survey | ArXiv | |
RAG | Training Language Models with Memory Augmentation | | |
RAG | Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | | |
RAG | RAG-Fusion: a New Take on Retrieval-Augmented Generation | ArXiv | |
RAG | RAFT: Adapting Language Model to Domain Specific RAG | ArXiv | |
RAG | Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity | ArXiv | |
RAG | RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture | ArXiv | |
RAG | Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs | ArXiv | |
Quantization, Scaling | SliceGPT: Compress Large Language Models by Deleting Rows and Columns | ArXiv | |
Prompting, Survey | A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications | ArXiv | |
Prompting, Robot, Zero-shot | Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning | ArXiv | |
Prompting | Contrastive Chain-of-Thought Prompting | ArXiv | |
PersonalCitation, Robot | Text2Motion: From Natural Language Instructions to Feasible Plans | ArXiv | |
Perception, Video, Vision | CLIP4Clip: An Empirical Study of CLIP for End-to-End Video Clip Retrieval | ArXiv | |
Perception, Task-Decompose | DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment | ArXiv | 2023/07/01 |
Perception, Robot, Segmentation | Language Segment-Anything | | |
Perception, Robot | LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding | ArXiv | 2023/12/21 |
Perception, Reasoning, Robot | Reasoning Grasping via Multimodal Large Language Model | ArXiv | |
Perception, Reasoning | DetGPT: Detect What You Need via Reasoning | ArXiv | |
Perception, Reasoning | Lenna: Language Enhanced Reasoning Detection Assistant | ArXiv | |
Perception | Simple Open-Vocabulary Object Detection with Vision Transformers | ArXiv | 2022/05/12 |
Perception | Grounded Language-Image Pre-training | ArXiv | 2021/12/07 |
Perception | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | ArXiv | 2023/03/09 |
Perception | PointCLIP: Point Cloud Understanding by CLIP | ArXiv | 2021/12/04 |
Perception | DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection | ArXiv | |
Perception | Recognize Anything: A Strong Image Tagging Model | ArXiv | |
Perception | Sigmoid Loss for Language Image Pre-Training | ArXiv | |
Package | LlamaIndex | GitHub | |
Package | LangChain | GitHub | |
Package | h2oGPT | GitHub | |
Package | Dify | GitHub | |
Package | Alpaca-LoRA | GitHub | |
Package | Promptlayer | GitHub | |
Package | unsloth | GitHub | |
Package | Instructor: Structured LLM Outputs | GitHub | |
PRM | Let's Verify Step by Step | ArXiv | |
PRM | Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | ArXiv | |
PPO, RLHF, Reinforcement-Learning | Secrets of RLHF in Large Language Models Part I: PPO | ArXiv | 2024/02/01 |
Open-source, VLM | OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models | ArXiv | 2023/08/02 |
Open-source, SLM | RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | ArXiv | |
Open-source, Perception | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | ArXiv | |
Open-source | Gemma: Introducing new state-of-the-art open models | ArXiv | |
Open-source | Mistral 7B | ArXiv | |
Open-source | Qwen Technical Report | ArXiv | |
Navigation, Reasoning, Vision | NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models | ArXiv | |
Natural-Language-as-Policies, Robot | RT-H: Action Hierarchies Using Language | ArXiv | |
Multimodal, Robot, VLM | Open-World Object Manipulation using Pre-trained Vision-Language Models | ArXiv | 2023/03/02 |
Multimodal, Robot | MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation | ArXiv | 2023/08/07 |
Multimodal, Robot | Flamingo: a Visual Language Model for Few-Shot Learning | ArXiv | 2022/04/29 |
Multi-Images, VLM | Mantis: Multi-Image Instruction Tuning | ArXiv, GitHub | |
MoE | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | ArXiv | |
MoE | Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers | ArXiv | |
Mixtral, MoE | Mixtral of Experts | ArXiv | |
Memory, Robot | LLM as A Robotic Brain: Unifying Egocentric Memory and Control | ArXiv | 2023/04/19 |
Memory, Reinforcement-Learning | Semantic HELM: A Human-Readable Memory for Reinforcement Learning | | |
Math, Reasoning | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | ArXiv, GitHub | |
Math, PRM | Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | ArXiv | |
Math | WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | ArXiv | |
Math | Llemma: An Open Language Model For Mathematics | ArXiv | |
Low-level-action, Robot | SayTap: Language to Quadrupedal Locomotion | ArXiv | 2023/06/13 |
Low-level-action, Robot | Prompt a Robot to Walk with Large Language Models | ArXiv | 2023/09/18 |
LoRA, Scaling | LoRA: Low-Rank Adaptation of Large Language Models | ArXiv | |
LoRA, Scaling | Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements | ArXiv | |
LoRA | LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report | ArXiv | |
Lab | Tencent AI Lab - AppAgent, WebVoyager | | |
Lab | DeepWisdom - MetaGPT | | |
Lab | Reworkd AI - AgentGPT | | |
Lab | OpenBMB - ChatDev, XAgent, AgentVerse | | |
Lab | XLANG NLP Lab - OpenAgents | | |
Lab | Rutgers University, AGI Research - OpenAGI | | |
Lab | Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University - CogVLM | | |
Lab | OpenGVLab | GitHub | |
Lab | Imperial College London - Zero-shot trajectory | | |
Lab | SenseTime | | |
Lab | Tsinghua University | | |
Lab | Fudan NLP Group | | |
Lab | Penn State University | | |
LLaVA, VLM | TinyLLaVA: A Framework of Small-scale Large Multimodal Models | ArXiv | |
LLaVA, MoE, VLM | MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | | |
LLaMA, Lightweight, Open-source | MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT | | |
LLM, Zero-shot | GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? | ArXiv | 2023/11/27 |
LLM, Temporal Logics | NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models | ArXiv | 2023/05/12 |
LLM, Survey | A Survey of Large Language Models | ArXiv | 2023/03/31 |
LLM, Spatial | Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | ArXiv | 2023/10/05 |
LLM, Scaling | BitNet: Scaling 1-bit Transformers for Large Language Models | ArXiv | |
LLM, Robot, Task-Decompose | Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | ArXiv | 2022/04/04 |
LLM, Robot, Survey | Large Language Models for Robotics: A Survey | ArXiv | |
LLM, Reasoning, Survey | Towards Reasoning in Large Language Models: A Survey | ArXiv | 2022/12/20 |
LLM, Quantization | The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | ArXiv | |
LLM, PersonalCitation, Robot, Zero-shot | Language Models as Zero-Shot Trajectory Generators | ArXiv | |
LLM, PersonalCitation, Robot | Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | | |
LLM, Open-source | A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device. | GitHub | |
LLM, Open-source | OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models | ArXiv | 2023/08/02 |
LLM, Open-source | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ArXiv | 2023/05/11 |
LLM, Open-source | ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst | ArXiv | 2023/05/25 |
LLM, Memory | MemoryBank: Enhancing Large Language Models with Long-Term Memory | ArXiv | 2023/05/17 |
LLM, Leaderboard | LMSYS Chatbot Arena Leaderboard | | |
LLM | Language Models are Few-Shot Learners | ArXiv | 2020/05/28 |
Interactive, OpenGVLab, VLM | InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language | ArXiv | 2023/05/09 |
Instruction-Tuning, Survey | Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning | | |
Instruction-Tuning, Survey | Vision-Language Instruction Tuning: A Review and Analysis | ArXiv | |
Instruction-Tuning, Survey | A Closer Look at the Limitations of Instruction Tuning | ArXiv | |
Instruction-Tuning, Survey | A Survey on Data Selection for LLM Instruction Tuning | ArXiv | |
Instruction-Tuning, Self | Self-Instruct: Aligning Language Models with Self-Generated Instructions | ArXiv | |
Instruction-Tuning, LLM, Zero-shot | Finetuned Language Models Are Zero-Shot Learners | ArXiv | 2021/09/03 |
Instruction-Tuning, LLM, Survey | Instruction Tuning for Large Language Models: A Survey | | |
Instruction-Tuning, LLM, PEFT | Visual Instruction Tuning | ArXiv | 2023/04/17 |
Instruction-Tuning, LLM, PEFT | LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention | ArXiv | 2023/03/28 |
Instruction-Tuning, LLM | Training language models to follow instructions with human feedback | ArXiv | 2022/03/04 |
Instruction-Tuning, LLM | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | ArXiv | 2023/04/20 |
Instruction-Tuning, LLM | Self-Instruct: Aligning Language Models with Self-Generated Instructions | ArXiv | 2022/12/20 |
Instruction-Tuning | Exploring Format Consistency for Instruction Tuning | | |
Instruction-Tuning | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ArXiv | 2023/02/06 |
Instruction-Tuning | Tuna: Instruction Tuning using Feedback from Large Language Models | ArXiv | 2023/03/06 |
In-Context-Learning, Vision | What Makes Good Examples for Visual In-Context Learning? | | |
In-Context-Learning, Vision | Visual Prompting via Image Inpainting | ArXiv | |
In-Context-Learning, Video | Prompting Visual-Language Models for Efficient Video Understanding | | |
In-Context-Learning, VQA | VisualCOMET: Reasoning about the Dynamic Context of a Still Image | ArXiv | 2020/04/22 |
In-Context-Learning, VQA | SINC: Self-Supervised In-Context Learning for Vision-Language Tasks | ArXiv | 2023/07/15 |
In-Context-Learning, Survey | A Survey on In-context Learning | ArXiv | |
In-Context-Learning, Scaling | Structured Prompting: Scaling In-Context Learning to 1,000 Examples | ArXiv | 2020/03/06 |
In-Context-Learning, Scaling | Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale | ArXiv | 2022/03/06 |
In-Context-Learning, Reinforcement-Learning | AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents | ArXiv | |
In-Context-Learning, Prompt-Tuning | Visual Prompt Tuning | ArXiv | |
In-Context-Learning, Perception, Vision | Visual In-Context Prompting | ArXiv | |
In-Context-Learning, Many-Shot, Reasoning | Many-Shot In-Context Learning | ArXiv | |
In-Context-Learning, Instruction-Tuning | In-Context Instruction Learning | | |
In-Context-Learning | ReAct: Synergizing Reasoning and Acting in Language Models | ArXiv | 2023/03/20 |
In-Context-Learning | Small Models are Valuable Plug-ins for Large Language Models | ArXiv | 2023/05/15 |
In-Context-Learning | Generative Agents: Interactive Simulacra of Human Behavior | ArXiv | 2023/04/07 |
In-Context-Learning | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | ArXiv | 2022/06/09 |
In-Context-Learning | What does CLIP know about a red circle? Visual prompt engineering for VLMs | ArXiv | |
In-Context-Learning | Can large language models explore in-context? | ArXiv | |
Image, LLaMA, Perception | LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | ArXiv | |
Hallucination, Survey | Combating Misinformation in the Age of LLMs: Opportunities and Challenges | ArXiv | |
Gym, PPO, Reinforcement-Learning, Survey | Can Language Agents Approach the Performance of RL? An Empirical Study On OpenAI Gym | ArXiv | |
Grounding, Reinforcement-Learning | Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning | ArXiv | |
Grounding, Reasoning | Visually Grounded Reasoning across Languages and Cultures | ArXiv | |
Grounding | V-IRL: Grounding Virtual Intelligence in Real Life | | |
Google, Grounding | GLaMM: Pixel Grounding Large Multimodal Model | ArXiv | |
Generation, Survey | Advances in 3D Generation: A Survey | | |
Generation, Robot, Zero-shot | Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models | ArXiv | |
Generation, Robot, Zero-shot | Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans | | |
GPT4V, Robot, VLM | Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V | ArXiv | |
GPT4, LLM | GPT-4 Technical Report | ArXiv | 2023/03/15 |
GPT4, Instruction-Tuning | Instruction Tuning with GPT-4 | ArXiv | |
GPT4, Gemini, LLM | Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases | ArXiv | 2023/12/22 |
Foundation, Robot, Survey | Foundation Models in Robotics: Applications, Challenges, and the Future | ArXiv | 2023/12/13 |
Foundation, LLaMA, Vision | VisionLLaMA: A Unified LLaMA Interface for Vision Tasks | ArXiv | |
Foundation, LLM, Open-source | Code Llama: Open Foundation Models for Code | ArXiv | |
Foundation, LLM, Open-source | LLaMA: Open and Efficient Foundation Language Models | ArXiv | 2023/02/27 |
Feedback, Robot | Correcting Robot Plans with Natural Language Feedback | ArXiv | |
Feedback, Robot | Learning to Learn Faster from Human Feedback with Language Model Predictive Control | ArXiv | |
Feedback, Robot | REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction | ArXiv | 2023/06/27 |
Feedback, In-Context-Learning, Robot | InCoRo: In-Context Learning for Robotics Control with Feedback Loops | ArXiv | |
Evaluation, LLM, Survey | A Survey on Evaluation of Large Language Models | ArXiv | |
Evaluation | simple-evals | GitHub | |
End2End, Multimodal, Robot | VIMA: General Robot Manipulation with Multimodal Prompts | ArXiv | 2022/10/06 |
End2End, Multimodal, Robot | PaLM-E: An Embodied Multimodal Language Model | ArXiv | 2023/03/06 |
End2End, Multimodal, Robot | Physically Grounded Vision-Language Models for Robotic Manipulation | ArXiv | 2023/09/05 |
Embodied | Embodied Question Answering | ArXiv | |
Embodied, World-model | Language Models Meet World Models: Embodied Experiences Enhance Language Models | | |
Embodied, Robot, Task-Decompose | Embodied Task Planning with Large Language Models | ArXiv | 2023/07/04 |
Embodied, Robot | Large Language Models as Generalizable Policies for Embodied Tasks | ArXiv | |
Embodied, Reasoning, Robot | Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs | ArXiv, GitHub | 2024/03/20 |
Embodied, LLM, Robot, Survey | The Development of LLMs for Embodied Navigation | ArXiv | 2023/11/01 |
Driving, Spatial | GPT-Driver: Learning to Drive with GPT | ArXiv | 2023/10/02 |
Drive, Survey | A Survey on Multimodal Large Language Models for Autonomous Driving | ArXiv | |
Distilling, Survey | A Survey on Knowledge Distillation of Large Language Models | | |
Distilling | Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes | ArXiv | |
Diffusion, Text-to-Image | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs | ArXiv | |
Diffusion, Survey | On the Design Fundamentals of Diffusion Models: A Survey | ArXiv | |
Diffusion, Robot | 3D Diffusion Policy | ArXiv | |
Diffusion | A latent text-to-image diffusion model | | |
Demonstration, GPT4, PersonalCitation, Robot, VLM | GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration | | |
Dataset, LLM, Survey | A Survey on Data Selection for Language Models | ArXiv | |
Dataset, Instruction-Tuning | REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets | | |
Dataset, Instruction-Tuning | Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | | |
Dataset | PRM800K: A Process Supervision Dataset | GitHub | |
Data-generation, Robot | GenSim: Generating Robotic Simulation Tasks via Large Language Models | ArXiv | 2023/10/02 |
Data-generation, Robot | RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation | ArXiv | 2023/11/02 |
DPO, PPO, RLHF | A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More | ArXiv | |
DPO | Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | ArXiv | |
Context-Window, Scaling | Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens | | |
Context-Window, Scaling | LONGNET: Scaling Transformers to 1,000,000,000 Tokens | ArXiv | 2023/07/01 |
Context-Window, Reasoning, RoPE, Scaling | Resonance RoPE: Improving Context Length Generalization of Large Language Models | ArXiv | |
Context-Window, LLM, RoPE, Scaling | LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens | ArXiv | |
Context-Window, Foundation, Gemini, LLM, Scaling | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | | |
Context-Window, Foundation | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ArXiv | |
Context-Window | RoFormer: Enhanced Transformer with Rotary Position Embedding | ArXiv | |
Context-Aware, Context-Window | DynaCon: Dynamic Robot Planner with Contextual Awareness via LLMs | ArXiv | |
Computer-Resource, Scaling | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | ArXiv | |
Compress, Scaling | (Long)LLMLingua: Enhancing Large Language Model Inference via Prompt Compression | ArXiv | |
Compress, Quantization, Survey | A Survey on Model Compression for Large Language Models | ArXiv | |
Compress, Prompting | Learning to Compress Prompts with Gist Tokens | ArXiv | |
Code-as-Policies, VLM, VQA | Visual Programming: Compositional visual reasoning without training | ArXiv | 2022/11/18 |
Code-as-Policies, Robot | SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models | ArXiv | 2023/09/18 |
Code-as-Policies, Robot | RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation | ArXiv | |
Code-as-Policies, Robot | Creative Robot Tool Use with Large Language Models | ArXiv | |
Code-as-Policies, Reinforcement-Learning, Reward | Code as Reward: Empowering Reinforcement Learning with VLMs | ArXiv | |
Code-as-Policies, Reasoning, VLM, VQA | ViperGPT: Visual Inference via Python Execution for Reasoning | ArXiv | 2023/03/14 |
Code-as-Policies, Reasoning | Chain of Code: Reasoning with a Language Model-Augmented Code Emulator | ArXiv | |
Code-as-Policies, PersonalCitation, Robot, Zero-shot | Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | ArXiv | 2022/04/01 |
Code-as-Policies, PersonalCitation, Robot, State-Manage | Statler: State-Maintaining Language Models for Embodied Reasoning | ArXiv | 2023/06/30 |
Code-as-Policies, PersonalCitation, Robot | ProgPrompt: Generating Situated Robot Task Plans using Large Language Models | ArXiv | 2022/09/22 |
Code-as-Policies, PersonalCitation, Robot | RoboCodeX: Multi-modal Code Generation for Robotic Behavior Synthesis | ArXiv | |
Code-as-Policies, PersonalCitation, Robot | RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks | | |
Code-as-Policies, PersonalCitation, Robot | ChatGPT for Robotics: Design Principles and Model Abilities | | |
Code-as-Policies, Multimodal, OpenGVLab, PersonalCitation, Robot | Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model | ArXiv | 2023/05/18 |
Code-as-Policies, Embodied, PersonalCitation, Robot | Code as Policies: Language Model Programs for Embodied Control | ArXiv | 2022/09/16 |
Code-as-Policies, Embodied, PersonalCitation, Reasoning, Robot, Task-Decompose | Inner Monologue: Embodied Reasoning through Planning with Language Models | ArXiv | |
Code-LLM, Front-End | Design2Code: How Far Are We From Automating Front-End Engineering? | ArXiv | |
Code-LLM | StarCoder 2 and The Stack v2: The Next Generation | | |
Chain-of-Thought, Reasoning, Table | Chain-of-table: Evolving tables in the reasoning chain for table understanding | ArXiv | |
Chain-of-Thought, Reasoning, Survey | Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters | ArXiv | 2023/12/20 |
Chain-of-Thought, Reasoning, Survey | A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future | ArXiv | 2023/09/27 |
Chain-of-Thought, Reasoning | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | ArXiv | 2022/01/28 |
Chain-of-Thought, Reasoning | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | ArXiv | 2023/05/17 |
Chain-of-Thought, Reasoning | Multimodal Chain-of-Thought Reasoning in Language Models | ArXiv | 2023/02/02 |
Chain-of-Thought, Reasoning | Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework | ArXiv | 2023/05/05 |
Chain-of-Thought, Reasoning | Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding | ArXiv | 2023/07/28 |
Chain-of-Thought, Reasoning | Rethinking with Retrieval: Faithful Large Language Model Inference | ArXiv | 2022/12/31 |
Chain-of-Thought, Reasoning | Self-Consistency Improves Chain of Thought Reasoning in Language Models | ArXiv | 2022/03/21 |
Chain-of-Thought, Reasoning | Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance | ArXiv | 2023/05/26 |
Chain-of-Thought, Prompting | Chain-of-Thought Reasoning Without Prompting | ArXiv | |
Chain-of-Thought, Planning, Reasoning | SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | ArXiv | 2023/08/01 |
Chain-of-Thought, In-Context-Learning, Self | Measuring and Narrowing the Compositionality Gap in Language Models | ArXiv | 2022/10/07 |
Chain-of-Thought, In-Context-Learning, Self | Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement | ArXiv | 2023/05/23 |
Chain-of-Thought, In-Context-Learning | Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding | ArXiv | |
Chain-of-Thought, In-Context-Learning | Self-Refine: Iterative Refinement with Self-Feedback | ArXiv | 2023/03/30 |
Chain-of-Thought, In-Context-Learning | Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | ArXiv | 2023/05/06 |
Chain-of-Thought, In-Context-Learning | PAL: Program-aided Language Models | ArXiv | 2022/11/18 |
Chain-of-Thought, In-Context-Learning | Reasoning with Language Model is Planning with World Model | ArXiv | 2023/05/24 |
Chain-of-Thought, In-Context-Learning | Least-to-Most Prompting Enables Complex Reasoning in Large Language Models | ArXiv | 2022/05/21 |
Chain-of-Thought, In-Context-Learning | Complexity-Based Prompting for Multi-Step Reasoning | ArXiv | 2022/10/03 |
Chain-of-Thought, In-Context-Learning | Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations | ArXiv | 2022/05/24 |
Chain-of-Thought, In-Context-Learning | Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models | ArXiv | 2023/08/20 |
Chain-of-Thought, GPT4, Reasoning, Robot | Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning | ArXiv | 2023/11/29 |
Chain-of-Thought, Embodied, Robot | EgoCOT: Embodied Chain-of-Thought Dataset for Vision Language Pre-training | ArXiv | |
Chain-of-Thought, Embodied, PersonalCitation, Robot, Task-Decompose | EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | ArXiv | 2023/05/24 |
Chain-of-Thought, Code-as-Policies, PersonalCitation, Robot | Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought | ArXiv | |
Chain-of-Thought, Code-as-Policies | Chain of Code: Reasoning with a Language Model-Augmented Code Emulator | ArXiv | |
Caption, Video | PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | ArXiv | |
Caption, VLM, VQA | Caption Anything: Interactive Image Description with Diverse Multimodal Controls | ArXiv | 2023/05/04 |
CRAG, RAG | Corrective Retrieval Augmented Generation | ArXiv | |
Brain, Instruction-Tuning | Instruction-tuning Aligns LLMs to the Human Brain | ArXiv | |
Brain, Conscious | Could a Large Language Model be Conscious? | | |
Brain | LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model | ArXiv | |
Brain | A Neuro-Mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization | | |
Benchmark, Sora, Text-to-Video | LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models | | |
Benchmark, In-Context-Learning | ARB: Advanced Reasoning Benchmark for Large Language Models | ArXiv | 2023/07/25 |
Benchmark, In-Context-Learning | PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change | ArXiv | 2022/06/21 |
Benchmark, GPT4 | Sparks of Artificial General Intelligence: Early experiments with GPT-4 | | |
Awesome Repo, VLM | awesome-vlm-architectures | GitHub | |
Awesome Repo, Survey | LLMSurvey | GitHub | |
Awesome Repo, Robot | Awesome-LLM-Robotics | GitHub | |
Awesome Repo, Reasoning | Awesome LLM Reasoning | GitHub | |
Awesome Repo, Reasoning | Awesome-Reasoning-Foundation-Models | GitHub | |
Awesome Repo, RLHF, Reinforcement-Learning | Awesome RLHF (RL with Human Feedback) | GitHub | |
Awesome Repo, Perception, VLM | Awesome Vision-Language Navigation | GitHub | |
Awesome Repo, Package | Awesome LLMOps | GitHub | |
Awesome Repo, Multimodal | Awesome-Multimodal-LLM | GitHub | |
Awesome Repo, Multimodal | Awesome-Multimodal-Large-Language-Models | GitHub | |
Awesome Repo, Math, Science | Awesome Scientific Language Models | GitHub | |
Awesome Repo, LLM, Vision | LLM-in-Vision | GitHub | |
Awesome Repo, LLM, VLM | Multimodal & Large Language Models | GitHub | |
Awesome Repo, LLM, Survey | Awesome-LLM-Survey | GitHub | |
Awesome Repo, LLM, Robot | Everything-LLMs-And-Robotics | GitHub | |
Awesome Repo, LLM, Leaderboard | LLM-Leaderboard | GitHub | |
Awesome Repo, LLM | Awesome-LLM | GitHub | |
Awesome Repo, Korean | awesome-korean-llm | GitHub | |
Awesome Repo, Japanese, LLM | 日本語LLMまとめ (Overview of Japanese LLMs) | GitHub | |
Awesome Repo, In-Context-Learning | Paper List for In-context Learning | GitHub | |
Awesome Repo, IROS, Robot | IROS2023PaperList | GitHub | |
Awesome Repo, Hallucination, Survey | A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | ArXiv, GitHub | |
Awesome Repo, Embodied | Awesome Embodied Vision | GitHub | |
Awesome Repo, Diffusion | Awesome-Diffusion-Models | GitHub | |
Awesome Repo, Compress | Awesome LLM Compression | GitHub | |
Awesome Repo, Chinese | Awesome-Chinese-LLM | GitHub | |
Awesome Repo, Chain-of-Thought | Chain-of-ThoughtsPapers | GitHub | |
Automate, Prompting | Large Language Models Are Human-Level Prompt Engineers | ArXiv | 2022/11/03 |
Automate, Chain-of-Thought, Reasoning | Automatic Chain of Thought Prompting in Large Language Models | ArXiv | 2022/10/07 |
Audio2Video, Diffusion, Generation, Video | EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | ArXiv | |
Audio | Robust Speech Recognition via Large-Scale Weak Supervision | | |
Apple, VLM | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | ArXiv | |
Apple, VLM | Guiding Instruction-based Image Editing via Multimodal Large Language Models | ArXiv | |
Apple, Robot | Large Language Models as Generalizable Policies for Embodied Tasks | ArXiv | |
Apple, LLM, Open-source | OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | ArXiv | |
Apple, LLM | ReALM: Reference Resolution As Language Modeling | ArXiv | |
Apple, LLM | LLM in a flash: Efficient Large Language Model Inference with Limited Memory | ArXiv | |
Apple, In-Context-Learning, Perception | SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding | ArXiv | |
Apple, Code-as-Policies, Robot | Executable Code Actions Elicit Better LLM Agents | ArXiv | |
Apple | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | ArXiv | |
Apple | Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | ArXiv | |
Apple | Ferret: Refer and Ground Anything Anywhere at Any Granularity | ArXiv | |
Anything, LLM, Open-source, Perception, Segmentation | Segment Anything | ArXiv | 2023/04/05 |
Anything, Depth | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | ArXiv | |
Anything, Caption, Perception, Segmentation | Segment and Caption Anything | ArXiv | |
Anything, CLIP, Perception | SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding | ArXiv | |
Agent-Project, Code-LLM | open-interpreter | GitHub | |
Agent, Web | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | ArXiv | |
Agent, Web | WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models | ArXiv | |
Agent, Web | OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | ArXiv | |
Agent, Web | OS-Copilot: Towards Generalist Computer Agents with Self-Improvement | ArXiv | |
Agent, Video-for-Agent | Video as the New Language for Real-World Decision Making | | |
Agent, VLM | AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn | ArXiv | |
Agent, Tool | Gorilla: Large Language Model Connected with Massive APIs | ArXiv | |
Agent, Tool | ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | ArXiv | |
Agent, Survey | A Survey on Large Language Model based Autonomous Agents | ArXiv | 2023/08/22 |
Agent, Survey | The Rise and Potential of Large Language Model Based Agents: A Survey | ArXiv | 2023/09/14 |
Agent, Survey | Agent AI: Surveying the Horizons of Multimodal Interaction | ArXiv | |
Agent, Survey | Large Multimodal Agents: A Survey | ArXiv | |
Agent, Soft-Dev | Communicative Agents for Software Development | GitHub | |
Agent, Soft-Dev | MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | ArXiv | |
Agent, Robot, Survey | A Survey on LLM-based Autonomous Agents | GitHub | |
Agent, Reinforcement-Learning, Reward | Reward Design with Language Models | ArXiv | 2023/02/27 |
Agent, Reinforcement-Learning, Reward | EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL | ArXiv | 2022/06/20 |
Agent, Reinforcement-Learning, Reward | Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning | ArXiv | 2023/09/20 |
Agent, Reinforcement-Learning | Eureka: Human-Level Reward Design via Coding Large Language Models | ArXiv | 2023/10/19 |
Agent, Reinforcement-Learning | Language to Rewards for Robotic Skill Synthesis | ArXiv | 2023/06/14 |
Agent, Reinforcement-Learning | Language Instructed Reinforcement Learning for Human-AI Coordination | ArXiv | 2023/04/13 |
Agent, Reinforcement-Learning | Guiding Pretraining in Reinforcement Learning with Large Language Models | ArXiv | 2023/02/13 |
Agent, Reinforcement-Learning | STARLING: Self-Supervised Training of Text-Based Reinforcement Learning Agent with Large Language Models | | |
Agent, Reasoning, Zero-shot | Agent Instructs Large Language Models to be General Zero-Shot Reasoners | ArXiv | 2023/10/05 |
Agent, Reasoning | Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning | ArXiv | |
Agent, Multimodal, Robot | A Generalist Agent | ArXiv | 2022/05/12 |
Agent, Multi | War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | ArXiv | |
Agent, MobileApp | You Only Look at Screens: Multimodal Chain-of-Action Agents | ArXiv, GitHub | |
Agent, Minecraft, Reinforcement-Learning | RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds | | |
Agent, Minecraft | Voyager: An Open-Ended Embodied Agent with Large Language Models | ArXiv | 2023/05/25 |
Agent, Minecraft | Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | ArXiv | 2023/02/03 |
Agent, Minecraft | LARP: Language-Agent Role Play for Open-World Games | ArXiv | |
Agent, Minecraft | Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds | ArXiv | |
Agent, Minecraft | S-Agents: Self-organizing Agents in Open-ended Environment | ArXiv | |
Agent, Minecraft | Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory | ArXiv | |
Agent, Memory, RAG, Robot | RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | ArXiv | 2024/02/06 |
Agent, Memory, Minecraft | JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | ArXiv | 2023/11/10 |
Agent, LLM, Planning | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ArXiv | |
Agent, Instruction-Tuning | AgentTuning: Enabling Generalized Agent Abilities For LLMs | ArXiv | |
Agent, Game | Learning Embodied Vision-Language Programming From Instruction, Exploration, and Environmental Feedback | ArXiv | |
Agent, GUI, Web | "What’s important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces | ArXiv | |
Agent, GUI, MobileApp | Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception | | |
Agent, GUI, MobileApp | AppAgent: Multimodal Agents as Smartphone Users | ArXiv | |
Agent, GUI | CogAgent: A Visual Language Model for GUI Agents | ArXiv | |
Agent, GUI | ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model | GitHub | |
Agent, GUI | SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | ArXiv | |
Agent, GPT4, Web | GPT-4V(ision) is a Generalist Web Agent, if Grounded | ArXiv | |
Agent, Feedback, Reinforcement-Learning, Robot | Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models | ArXiv | 2023/11/04 |
Agent, Feedback, Reinforcement-Learning | AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback | ArXiv | 2023/09/29 |
Agent, End2End, Game, Robot | An Interactive Agent Foundation Model | ArXiv | |
Agent, Embodied, Survey | Application of Pretrained Large Language Models in Embodied Artificial Intelligence | ArXiv | |
Agent, Embodied, Robot | AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | ArXiv | |
Agent, Embodied, Robot | OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following | ArXiv | |
Agent, Embodied | OpenAgents: An Open Platform for Language Agents in the Wild | ArXiv, GitHub | |
Agent, Embodied | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ArXiv | |
Agent, Embodied | Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld | ArXiv | |
Agent, Embodied | Octopus: Embodied Vision-Language Programmer from Environmental Feedback | | |
Agent, Embodied | Embodied Task Planning with Large Language Models | ArXiv | |
Agent, Diffusion, Speech | NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | ArXiv | |
Agent, Code-as-Policies | Executable Code Actions Elicit Better LLM Agents | ArXiv | 2024/01/24 |
Agent, Code-LLM, Code-as-Policies, Survey | If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | ArXiv | |
Agent, Code-LLM | TaskWeaver: A Code-First Agent Framework | | |
Agent, Blog | LLM Powered Autonomous Agents | Website | |
Agent, Awesome Repo, LLM | Awesome-Embodied-Agent-with-LLMs | GitHub | |
Agent, Awesome Repo, LLM | CoALA: Awesome Language Agents | ArXiv, GitHub | |
Agent, Awesome Repo, Embodied, Grounding | XLang Paper Reading | GitHub | |
Agent, Awesome Repo | Awesome AI Agents | GitHub | |
Agent, Awesome Repo | Autonomous Agents | GitHub | |
Agent, Awesome Repo | Awesome-Papers-Autonomous-Agent | GitHub | |
Agent, Awesome Repo | Awesome Large Multimodal Agents | GitHub | |
Agent, Awesome Repo | LLM Agents Papers | GitHub | |
Agent, Awesome Repo | Awesome LLM-Powered Agent | GitHub | |
Agent | LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination | ArXiv | |
Agent | AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors | ArXiv | |
Agent | Agents: An Open-source Framework for Autonomous Language Agents | ArXiv, GitHub | |
Agent | AutoAgents: A Framework for Automatic Agent Generation | GitHub | |
Agent | DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | ArXiv | |
Agent | AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation | ArXiv | |
Agent | CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society | ArXiv | |
Agent | XAgent: An Autonomous Agent for Complex Task Solving | ArXiv | |
Agent | Generative Agents: Interactive Simulacra of Human Behavior | ArXiv | |
Agent | LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | ArXiv | 2023/04/22 |
Agent | AgentSims: An Open-Source Sandbox for Large Language Model Evaluation | ArXiv | 2023/08/08 |
Agent | MindAgent: Emergent Gaming Interaction | ArXiv | |
Agent | InfiAgent: A Multi-Tool Agent for AI Operating Systems | | |
Agent | Predictive Minds: LLMs As Atypical Active Inference Agents | | |
Agent | swarms | GitHub | |
Agent | ScreenAgent: A Vision Language Model-driven Computer Control Agent | ArXiv | |
Agent | AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn | ArXiv | |
Agent | PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization | ArXiv | |
Agent | Cognitive Architectures for Language Agents | ArXiv | |
Agent | AIOS: LLM Agent Operating System | ArXiv | |
Agent | LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem | ArXiv | |
Agent | Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study | | |
Affordance, Segmentation | ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models | ArXiv | |
Action-Model, Agent, LAM | LaVague | GitHub | |
Action-Generation, Generation, Prompting | Prompt a Robot to Walk with Large Language Models | ArXiv | |
APIs, Agent, Tool | Gorilla: Large Language Model Connected with Massive APIs | ArXiv | |
AGI, Survey | Levels of AGI: Operationalizing Progress on the Path to AGI | ArXiv | |
AGI, Brain | When Brain-inspired AI Meets AGI | ArXiv | |
AGI, Brain | Divergences between Language Models and Human Brains | ArXiv | |
AGI, Awesome Repo, Survey | Awesome-LLM-Papers-Toward-AGI | GitHub | |
AGI, Agent | OpenAGI: When LLM Meets Domain Experts | | |
3D, Open-source, Perception, Robot | 3D-LLM: Injecting the 3D World into Large Language Models | ArXiv | 2023/07/24 |
3D, GPT4, VLM | GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation | ArXiv | |
| ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | ArXiv | 2023/08/14 |