Method to load memories into the vector DB *enhancement* #20

Open IllogicalDesigns opened 1 year ago

IllogicalDesigns commented 1 year ago

I wrote a quick script to load memories into the vector DB.

import re
import json
import pathlib
from extensions.long_term_memory.core.memory_database import LtmDatabase

# === Internal constants (don't change these without good reason) ===
_CONFIG_PATH = "ltm_config.json"
_LTM_STATS_TEMPLATE = """{num_memories_seen_by_bot} memories are loaded in the bot
{num_memories_in_ram} memories are loaded in RAM
{num_memories_on_disk} memories are saved to disk"""
with open(_CONFIG_PATH, "rt") as handle:
    _CONFIG = json.load(handle)

memory_database = LtmDatabase(

with open('Text.txt', 'r', encoding='utf-8') as file:
    paragraphs = file.read().split('\n\n')  # Split by double newline to separate paragraphs

for paragraph in paragraphs:
    paragraph = paragraph.strip() # stripping leading/trailing whitespace

    pattern = r'[^\w\s;-]' # Remove problematic characters using regular expression  # .,!?
    paragraph = re.sub(pattern, '', paragraph)

    if not paragraph:     # Skip empty paragraphs
        continue

    memory_database.add("Assistant", paragraph)

print("num_memories_on_disk:", memory_database.disk_embeddings.shape[0])

BarfingLemurs commented 1 year ago

Thanks, I tried this and it works. I modified it to split on triple newlines instead.

I named this script convert2vectorDB.py and placed it in the text-generation-webui/ folder, along with a file named Text.txt containing some example text:

(It will split the segment right before "I just compiled")

import re
import json
import pathlib
from extensions.long_term_memory.core.memory_database import LtmDatabase

# === Internal constants (don't change these without good reason) ===
_CONFIG_PATH = "./extensions/long_term_memory/ltm_config.json"
_LTM_STATS_TEMPLATE = """{num_memories_seen_by_bot} memories are loaded in the bot
{num_memories_in_ram} memories are loaded in RAM
{num_memories_on_disk} memories are saved to disk"""
with open(_CONFIG_PATH, "rt") as handle:
    _CONFIG = json.load(handle)

memory_database = LtmDatabase(

with open('Text.txt', 'r', encoding='utf-8') as file:
    paragraphs = file.read().split('\n\n\n')  # Split by triple newline

for paragraph in paragraphs:
    paragraph = paragraph.strip() # stripping leading/trailing whitespace

    pattern = r'[^\w\s;-]' # Remove problematic characters using regular expression  # .,!?
    paragraph = re.sub(pattern, '', paragraph)

    if not paragraph:     # Skip empty paragraphs
        continue

    memory_database.add("Assistant", paragraph)

print("num_memories_on_disk:", memory_database.disk_embeddings.shape[0])
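
If you want to check how your text will be split before touching the database, here is a minimal sketch of just the split-and-clean step, with the separator as a parameter. The function name split_into_memories is made up for illustration and is not part of the extension. One small difference from the script above: it strips whitespace again after removing characters, so a chunk that was only punctuation is skipped.

```python
import re

# Same character filter as the script above: drop everything except
# word characters, whitespace, ';' and '-'.
_PROBLEM_CHARS = re.compile(r"[^\w\s;-]")


def split_into_memories(text, separator="\n\n\n"):
    """Split text on `separator`, clean each chunk, and drop empty ones.

    Hypothetical helper for illustration; not part of long_term_memory.
    """
    memories = []
    for chunk in text.split(separator):
        # Remove the filtered characters, then trim leftover whitespace.
        cleaned = _PROBLEM_CHARS.sub("", chunk).strip()
        if cleaned:
            memories.append(cleaned)
    return memories


if __name__ == "__main__":
    sample = "Hello, world!\n\n\nSecond chunk; still here.\n\n\n\n\n\n!!!"
    for memory in split_into_memories(sample):
        print(repr(memory))
```

Each returned string is what would be passed to memory_database.add("Assistant", ...). Pass separator="\n\n" to get the double-newline behaviour of the first script.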

I used the one-click installer, so I entered the environment with:

  1. cd oobabooga_linux/
  2. source "./installer_files/conda/etc/profile.d/conda.sh" && conda activate ./installer_files/env
  3. cd text-generation-webui
  4. python convert2vectorDB.py

Hope this is helpful.

You can then verify that the 0.0 binary embedding file has increased in size and that your .db file has the new raw text added. An online viewer like https://sqliteviewer.app/ lets you check whether your text was split properly.
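
If you would rather not upload the file, a rough local check is possible with Python's built-in sqlite3 module. This sketch only assumes that the .db file is a SQLite database (which is what the viewer above expects). It does not assume any table names, it just lists whatever tables exist and counts their rows. The function name table_row_counts is made up for illustration, and you pass the path to your own .db file on the command line.

```python
import pathlib
import sqlite3
import sys


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        counts = {}
        for name in names:
            # Names come from the database itself; quote them defensively.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = connection.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        connection.close()


if __name__ == "__main__":
    # Usage: python check_db.py path/to/your/file.db
    # Checking that the file exists first avoids sqlite3 creating an empty one.
    if len(sys.argv) == 2 and pathlib.Path(sys.argv[1]).is_file():
        for table, count in table_row_counts(sys.argv[1]).items():
            print(f"{table}: {count} rows")
```

Running it before and after convert2vectorDB.py should show the row count going up by the number of paragraphs that were added.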

Belarrius1 commented 1 year ago

I just launch "cmd_linux.sh" from a terminal, then cd text-generation-webui and run python convert2vectorDB.py.

It works perfectly!

rasehum commented 1 year ago

I am wondering, would this be possible on Windows, and how would you go about doing it? My apologies, this is my first time with this!