- [7/19] We release a major upgrade, including support for LLaMA-2, LoRA training, 4-/8-bit inference, higher resolution (336x336), and a lot more. We release LLaVA Bench for benchmarking open-ended visual chat, with results from Bard and Bing-Chat. We also support and verify training with RTX 3090 and RTX A6000. Check out LLaVA-from-LLaMA-2 and our model zoo!
- [6/26] CVPR 2023 Tutorial on Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4! Please check out the Slides, Notes, YouTube, and Bilibili links.
- [6/11] We released a preview of the most requested feature: DeepSpeed and LoRA support! Please see the documentation here.
- [6/1] We released LLaVA-Med: Large Language and Vision Assistant for Biomedicine, a step towards building biomedical-domain large language and vision models with GPT-4-level capabilities. Check out the paper and page.
- [5/6] We are releasing LLaVA-Lightning-MPT-7B-preview, based on MPT-7B-Chat! See here for more details.
- [5/2] We are releasing LLaVA-Lightning! Train a lite multimodal GPT-4 with just $40 in 3 hours! See here for more details.
- [4/27] Thanks to community effort, LLaVA-13B with 4-bit quantization can run on a GPU with as little as 12GB of VRAM! Try it out here.
- [4/17] We released LLaVA: Large Language and Vision Assistant. We propose visual instruction tuning, towards building large language and vision models with GPT-4-level capabilities. Check out the paper and demo.
RichardAragon/MultiAgentLLM
DESCRIPTION: "Multi Agent Language Learning Machine (Multi Agent LLM)
(Update) 1/20/2024: Global Fine Tuned Phi Model Located Here: PhiGlobalFineTunedAgent
I can run the second round of fine-tunes for the individual agent actors (Planner, Caller, Observer), but I cannot finish the fine-tuning process and upload the completed models to HuggingFace due to compute limits. I need more GPUs :(
(Update) 1/19/2024: All datasets available at HuggingFace Repo: TuringsSolutions
Introduction
Welcome to the official repository for the Multi Agent LLM, a faithful recreation of the framework from the Small LLMs Are Weak Tool Learners: A Multi-LLM Agent research paper, released under the MIT Open Source License. Our aim is to fine-tune distinct Planner, Caller, and Summarizer agents capable of completing complex tasks efficiently. This will be done using three TinyLlama models. Datasets from the research paper and from the Gorilla release will be used to create further synthetic datasets for training the three distinct models.
Getting Started
Prerequisites
requirements.txt
Installation
Multi Agent LLM Methodology Overview
Our project introduces the Multi Agent LLM framework, built around Large Language Models (LLMs) to handle complex tasks involving tool usage and decision-making processes. This framework draws inspiration from the ReAct framework (Yao et al., 2022) and is aimed at addressing the challenges faced by single-LLM solutions on tool-learning tasks.
Three main modules constitute the Multi Agent LLM architecture: Planner (M_plan), Caller (M_call), and Summarizer (M_sum). By dividing labor amongst the agents and dedicating a specific LLM for each sub-task, we enhance the overall effectiveness of tackling complex problems involving task planning, tool selection, and result summarization.
The α-UMi framework workflow commences when the user inputs a query (q). The Planner module creates a rationale (r_t) guiding the upcoming step. According to r_t, the procedure advances: the Caller is triggered to interact with the tools and collect observations. After gathering enough information, the Planner shifts control to the Summarizer module, which forms the final response for the user. If, instead, the instruction cannot be resolved, the system gives up.
Each module plays a distinctive role:
Planner: Acting as the brain, the Planner receives the user instruction (q), prior execution trajectory (τ_{t-1}), and system prompt (P_plan), and outputs a rationale (r_t): $$ r_t = M_{plan}(P_{plan},\ \tau_{t-1},\ q) $$
Caller: Trained to concentrate solely on producing proper tool-interaction commands, the Caller accepts the user instruction (q) and preceding execution trajectory (τ_{t-1}). With guidance from the rationale (r_t), the Caller delivers the requested action (a_t): $$ a_t = M_{call}(P_{call},\ \tau_{t-1},\ q,\ r_t) $$
Summarizer: Dedicated to crafting the final meaningful response, the Summarizer obtains the user instruction (q), former execution trajectory (τ_{t-1}), system prompt (P_sum), and final rationale (r_t), delivering the final answer (a): $$ a = M_{sum}(P_{sum},\ \tau_{t-1},\ q,\ r_t) $$
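The control flow above can be sketched as a small driver loop. This is a hypothetical illustration: the `plan`, `call`, `summarize`, and `execute_tool` callables stand in for the three fine-tuned models and the tool executor, and the `"caller"`/`"summarizer"`/`"give_up"` routing labels are assumptions, not the repository's actual API.

```python
def run_alpha_umi(query, plan, call, summarize, execute_tool, max_steps=8):
    """Drive one query through the Planner/Caller/Summarizer modules until
    the Planner hands off to the Summarizer or the step budget runs out."""
    trajectory = []  # tau_{t-1}: accumulated (rationale, action, observation) steps
    for _ in range(max_steps):
        rationale, next_module = plan(trajectory, query)    # r_t = M_plan(P_plan, tau_{t-1}, q)
        if next_module == "summarizer":
            return summarize(trajectory, query, rationale)  # final answer a
        if next_module == "give_up":
            return None                                     # instruction unresolved
        action = call(trajectory, query, rationale)         # a_t = M_call(P_call, tau_{t-1}, q, r_t)
        observation = execute_tool(action)                  # tool feedback
        trajectory.append((rationale, action, observation))
    return None
```

Each iteration extends the trajectory that the next Planner call conditions on, mirroring the τ_{t-1} term in the formulas above.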
Global-to-Local Progressive Fine-Tuning Strategy
We propose a novel two-stage fine-tuning technique – Global-to-Local Progressive Fine-Tuning (GLPFT) – applied to the α-UMi framework modules. GLPFT ensures effective fine-tuning adapted to specific roles. Initially, a shared base LLM undergoes global fine-tuning on generic large datasets. Specialization then occurs during local fine-tuning on subsets aligned with the roles and duties of the dedicated modules. Additional details concerning data organization and prompt adaptation appear in Appendix A.
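The two-stage staging can be expressed as a short sketch. Here `fine_tune` is a stand-in for whatever training routine is used (the paper and repo do not fix one here), and the role names are taken from the modules described above.

```python
def glpft(base_model, global_dataset, role_datasets, fine_tune):
    """Global-to-Local Progressive Fine-Tuning, staged as described in the
    text: one global pass on a shared backbone, then one local pass per role."""
    # Stage 1: global fine-tuning of the shared base LLM on the generic dataset.
    backbone = fine_tune(base_model, global_dataset)
    # Stage 2: local fine-tuning -- each role specializes its own copy of the
    # backbone on the subset matching its duties.
    return {role: fine_tune(backbone, subset)
            for role, subset in role_datasets.items()}
```

The key design point is that stage 2 starts from the globally tuned backbone, not from the raw base model, so each specialist inherits the shared trajectory knowledge before narrowing to its role.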
To create the global fine-tuning dataset for training the backbone LLM: keep full trajectories together as one long target text, include multiple variants per user instruction, and do not differentiate between sub-tasks at this stage.
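A minimal sketch of building one such row, assuming a trajectory stored as (rationale, action, observation) triples; the function name, the dict layout, and the "Thought/Action/Observation/Answer" tags are illustrative assumptions, not the dataset's actual schema.

```python
def build_global_row(user_query, steps, final_answer):
    """Serialize one full trajectory into a single target text, keeping
    rationales, actions, and observations interleaved and undifferentiated,
    as the global fine-tuning stage requires."""
    parts = []
    for rationale, action, observation in steps:
        parts += [f"Thought: {rationale}",
                  f"Action: {action}",
                  f"Observation: {observation}"]
    parts.append(f"Answer: {final_answer}")  # trajectory ends with the final answer
    return {"prompt": user_query, "target": "\n".join(parts)}
```

Multiple variants per instruction would simply mean calling this once per alternative trajectory for the same `user_query`.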
Here is a methodology you can follow to create the training data needed for the Planner agent, based on the approach outlined in the research paper:
In an example row from the global fine-tuning dataset, the full trajectory from the initial user query to the final flight booking is kept intact as one sequence, which the model is trained to generate end-to-end. A training-set row for fine-tuning the Planner specifically looks different:
In this example, only the rationales from the Planner are kept as the target text. The actions, observations, and answers are removed. The format is updated to append "Next: Caller/Summarizer" to each rationale, and the prompt provides some context about the user's flight-booking request.
This structures the data specifically for fine-tuning the Planner's ability to generate helpful rationales and decide the next high-level step.
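The Planner-row extraction described above can be sketched as follows; as before, the function name and dict layout are hypothetical, while the "Next: Caller" / "Next: Summarizer" tags come from the text.

```python
def build_planner_row(user_query, steps, final_rationale):
    """Keep only the Planner's rationales as the target text, dropping
    actions and observations, and tag each rationale with the module
    that should act next."""
    targets = [f"{rationale} Next: Caller"
               for rationale, _action, _observation in steps]
    # The final rationale hands control to the Summarizer.
    targets.append(f"{final_rationale} Next: Summarizer")
    return {"prompt": user_query, "target": "\n".join(targets)}
```

Run against the same trajectories as the global dataset, this yields the role-specific subset used in the local fine-tuning stage.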
References
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
This project is distributed under the terms of the MIT Open Source License. Contributions are welcome! Please feel free to submit Pull Requests and report issues. Refer to CONTRIBUTING for guidelines."