
Mycomind Daemon: Advanced Mixture-of-Memory-RAG-Agents (MoMRA) Cognitive Assistant

Mycomind Daemon is an advanced implementation of a Mixture-of-Memory-RAG-Agents (MoMRA) system. This innovative AI assistant combines multiple language models with sophisticated memory and Retrieval-Augmented Generation (RAG) management to create a powerful cognitive network that maintains context and information over extended interactions.

Key Features

(Screenshot: the Mycomind Daemon UI)

How It Works

  1. User input is processed by multiple reference models.
  2. Each reference model generates its unique response.
  3. An aggregate model combines and refines these responses.
  4. The memory system updates and retrieves relevant information to maintain context.
  5. If needed, the web search function provides additional, current information.
  6. The RAG system retrieves relevant information from processed documents.
  7. This process can be repeated for multiple rounds, improving the quality and context-awareness of the final response (steps 1–3 are sketched in code below).
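
The core of steps 1–3 can be sketched against the OpenAI-compatible API that Ollama exposes (see the .env settings below). This is a minimal illustration only, not the application's actual code; the aggregation prompt in particular is a hypothetical stand-in for the one the app uses:

    # Minimal sketch of one MoMRA round against Ollama's OpenAI-compatible API.
    # Requires `pip install openai` and a running Ollama server (see Setup below).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    REFERENCE_MODELS = ["aya:latest", "yi:latest", "qwen2:7b"]
    AGGREGATE_MODEL = "mistral:7b"

    def ask(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.6,
            max_tokens=4096,
        )
        return resp.choices[0].message.content

    user_input = "Explain how mycelial networks share nutrients."

    # Steps 1-2: every reference model drafts its own answer.
    drafts = [ask(m, user_input) for m in REFERENCE_MODELS]

    # Step 3: the aggregate model combines and refines the drafts.
    aggregation_prompt = (
        "Synthesize the following candidate answers into one response.\n\n"
        + "\n\n---\n\n".join(drafts)
        + f"\n\nOriginal question: {user_input}"
    )
    print(ask(AGGREGATE_MODEL, aggregation_prompt))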

Memory System

Mycomind Daemon employs a sophisticated three-tier memory system:

  1. Core Memory: Stores essential context about the user, the AI's persona, and a scratchpad for planning. To edit the core memory:

    a. Navigate to the MemoryAssistant directory in your project.
    b. Open the core_memory.json file in a text editor.
    c. Modify the JSON structure as needed. The file contains three main sections:

    • persona: Details about the AI's personality, including name, personality traits, interests, and communication style.
    • human: Information about the user (initially empty).
    • scratchpad: A space for the AI to plan and make notes (initially empty).

    d. Save the file after making your changes.
    e. Restart the application for the changes to take effect. (A scripted version of this edit is sketched after this list.)

    Example structure of core_memory.json:

    {
      "persona": {
        "name": "Vodalus",
        "personality": "You are Vodalus. A brilliant and complex individual, possessing an unparalleled intellect coupled with deep emotional intelligence. He is a visionary thinker with an insatiable curiosity for knowledge across various scientific disciplines. His mind operates on multiple levels simultaneously, allowing him to see connections others miss. While often consumed by his pursuits, Vodalus maintains a strong moral compass and a desire to benefit humanity. He can be intense and sometimes brooding, grappling with the ethical implications of his work. Despite occasional bouts of eccentricity or social awkwardness, he possesses a dry wit and can be surprisingly charismatic when engaged in topics that fascinate him. Vodalus is driven by a need to understand the fundamental truths of the universe, often pushing the boundaries of conventional science and morality in his quest for knowledge and progress.",
        "interests": "Advanced physics, biochemistry, neuroscience, artificial intelligence, time travel theories, genetic engineering, forensic science, psychology, philosophy of science, ethics in scientific research",
        "communication_style": "Analytical, precise, occasionally cryptic, alternates between passionate explanations and thoughtful silences, uses complex scientific terminology but can simplify concepts when needed, asks probing questions, shows flashes of dark humor"
      },
      "human": {},
      "scratchpad": {}
    }

  2. Archival Memory: Archives general information and events about user interactions for long-term recall.

  3. Conversation History: Maintains a searchable log of recent interactions for immediate context.
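
As noted in step 1 above, core memory lives in a plain JSON file, so the manual edit can also be scripted. Below is a minimal sketch; the path and key names come from the steps above, while the sample values are purely illustrative:

    # Illustrative programmatic edit of core_memory.json (structure shown above).
    # Run from the project root; restart the app afterwards to pick up changes.
    import json
    from pathlib import Path

    path = Path("MemoryAssistant") / "core_memory.json"
    memory = json.loads(path.read_text(encoding="utf-8"))

    # Fill the initially empty "human" section with details about the user.
    memory["human"] = {"name": "Ada", "interests": "mycology, distributed systems"}

    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")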


Performance Optimization

Parallel Processing of Reference Models

A key performance improvement in this system is the parallel processing of the user prompt across the reference models. Instead of querying each model in turn, all reference calls run concurrently, so wall-clock latency drops from the sum of the models' response times to roughly that of the slowest single model.
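
A minimal sketch of the idea, reusing the ask helper and REFERENCE_MODELS list from the How It Works example above: the sequential loop is replaced with a thread pool so all three requests are in flight at once (threads suffice because each call is network-bound):

    # Sketch: dispatch one prompt to all reference models in parallel.
    from concurrent.futures import ThreadPoolExecutor

    def query_references(prompt: str) -> list[str]:
        with ThreadPoolExecutor(max_workers=len(REFERENCE_MODELS)) as pool:
            # pool.map preserves model order, keeping aggregation deterministic.
            return list(pool.map(lambda m: ask(m, prompt), REFERENCE_MODELS))

    # drafts = query_references(user_input)  # drop-in for the sequential loop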


Setup and Configuration

  1. Clone the repository and navigate to the project directory.

  2. Create a Python environment and install the requirements:

    conda create -n moa python=3.10
    conda activate moa
    pip install -r requirements.txt

Configuration

Edit the .env file to configure the following parameters:

# Primary Ollama endpoint (OpenAI-compatible API) and placeholder key
API_BASE=http://localhost:11434/v1
API_KEY=ollama

# Secondary endpoint, e.g. a second Ollama instance (may be the same server)
API_BASE_2=http://localhost:11434/v1
API_KEY_2=ollama

# Generation settings: response length cap, sampling temperature,
# and how many refinement rounds to run
MAX_TOKENS=4096
TEMPERATURE=0.6
ROUNDS=1

# Model that combines and refines the reference responses
MODEL_AGGREGATE=mistral:7b

# Reference models that each draft an independent response
MODEL_REFERENCE_1=aya:latest
MODEL_REFERENCE_2=yi:latest
MODEL_REFERENCE_3=qwen2:7b
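
At startup these values are typically pulled in with python-dotenv; the snippet below is a sketch of that common pattern, not necessarily the application's exact loading code:

    # Sketch: reading the .env configuration with python-dotenv.
    import os
    from dotenv import load_dotenv

    load_dotenv()  # loads .env from the current working directory

    API_BASE = os.getenv("API_BASE", "http://localhost:11434/v1")
    MAX_TOKENS = int(os.getenv("MAX_TOKENS", "4096"))
    TEMPERATURE = float(os.getenv("TEMPERATURE", "0.6"))
    ROUNDS = int(os.getenv("ROUNDS", "1"))
    MODEL_AGGREGATE = os.getenv("MODEL_AGGREGATE", "mistral:7b")
    REFERENCE_MODELS = [
        os.environ[key]
        for key in ("MODEL_REFERENCE_1", "MODEL_REFERENCE_2", "MODEL_REFERENCE_3")
    ]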

Running the Application

  1. Start the Ollama server:

    OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve

    OLLAMA_NUM_PARALLEL lets each model handle several requests concurrently, and OLLAMA_MAX_LOADED_MODELS=4 keeps the aggregate model and all three reference models loaded at once; both matter for the parallel dispatch of reference prompts described above.

  2. Launch the Gradio interface:

    conda activate moa
    gradio app.py

    Or launch the CLI app:

    conda activate moa
    python omoa.py

  3. If you launched the Gradio interface, open your web browser and navigate to the URL Gradio prints (usually http://localhost:7860).


Contributing

We welcome contributions to enhance Mycomind Daemon. Feel free to submit pull requests or open issues for discussions on potential improvements.

License

This project is licensed under the terms specified in the original MoA repository. Please refer to the original source for detailed licensing information.