irthomasthomas / undecidability


Pattern to extract conversation thread for embeddings: #26

Open irthomasthomas opened 10 months ago

irthomasthomas commented 10 months ago

Algorithm to extract conversation thread and send to OpenAI API for embeddings:

import sqlite3
from openai import OpenAI

def fetch_conversation(conversation_id):

    # Create a DB connection
    conn = sqlite3.connect('chatgpt_conversation_db.db')
    cursor = conn.cursor()

    # SQL query to retrieve a conversation's prompts and responses by id
    query = """SELECT responses.prompt, responses.response
               FROM responses
               WHERE responses.conversation_id = ?"""

    # Execute the query with the conversation id bound as a parameter
    cursor.execute(query, (conversation_id,))

    # Fetch all (prompt, response) pairs
    conversation = cursor.fetchall()

    conn.close()

    return conversation

def send_to_openai_api(conversation):

    # Flatten the (prompt, response) pairs into one block of text
    convo_text = "\n".join([f"User: {c[0]}\nChatGPT: {c[1]}" for c in conversation])

    client = OpenAI(api_key="your-api-key")

    # Request an embedding vector for the conversation text
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=convo_text
    )

    return response.data[0].embedding
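
A minimal usage sketch (the conversation id 42 is a hypothetical value, and a valid API key is assumed):

# Fetch a stored conversation and embed it
conversation = fetch_conversation(42)
embedding = send_to_openai_api(conversation)
print(len(embedding))  # dimensionality of the returned embedding vector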

Potential uses for embeddings and the chat DB:

  1. Conversation Classification: We can use the embeddings to train machine learning models that classify the conversations by their content or sentiment.

  2. Topic Modeling: The embeddings can be used to conduct topic modeling to understand the main topics discussed during the conversation.

  3. Information Retrieval: The chat database could be used to build a retrieval-based chatbot that fetches relevant information based on context (see the sketch after this list).

  4. User Behavior Understanding: Analyzing chat logs can help in understanding user behavior, preferences, and interaction patterns.
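
As an illustration of point 3, a minimal retrieval sketch: embed the user's query with the same send_to_openai_api call, then rank stored conversation embeddings by cosine similarity. The stored list of (conversation_id, embedding) pairs is an assumption made for this example, not something produced by the schema above.

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query_embedding, stored, k=5):
    # stored: list of (conversation_id, embedding) pairs, e.g. built with send_to_openai_api
    ranked = sorted(stored, key=lambda item: cosine_similarity(query_embedding, item[1]), reverse=True)
    return ranked[:k]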

Approaches to enrich the database:

  1. Adding metadata: Information like user demographics, time of conversation, etc., can add value to the analyses (a schema sketch follows this list).

  2. Adding conversation context: Data about the context of a conversation can help in retrieving and understanding it later.
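
One possible way to add metadata via sqlite3 (a sketch only; the started_at and user_locale columns are hypothetical, not part of the existing schema):

import sqlite3

conn = sqlite3.connect('chatgpt_conversation_db.db')
cursor = conn.cursor()

# Hypothetical metadata columns; adjust names and types to the real schema
cursor.execute("ALTER TABLE conversations ADD COLUMN started_at TEXT")
cursor.execute("ALTER TABLE conversations ADD COLUMN user_locale TEXT")

conn.commit()
conn.close()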

Creating a topic system:

We can introduce a "topics" table with an integer id and a topic name, and add a "topic_id" column to the "conversations" table. We then run a topic modeling algorithm such as LDA (Latent Dirichlet Allocation) over the conversation text to find the main topics and link each conversation to one of them.

CREATE TABLE [topics] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT
);
ALTER TABLE [conversations] 
ADD COLUMN [topic_id] INTEGER REFERENCES [topics]([id]);
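
A minimal sketch of the LDA step using scikit-learn (the library choice and the one-topic-per-conversation assignment are assumptions for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def assign_topics(conversation_texts, n_topics=10):
    # conversation_texts: one concatenated text string per conversation
    vectorizer = CountVectorizer(stop_words="english")
    doc_term = vectorizer.fit_transform(conversation_texts)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(doc_term)

    # Assign each conversation its single most probable topic index
    return doc_topics.argmax(axis=1)

The resulting topic indices can then be written into conversations.topic_id, with human-readable names filled into topics.name after inspecting each topic's top words.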

Then to retrieve chats based on their topic, we can query:

SELECT *
FROM conversations
JOIN responses ON conversations.id = responses.conversation_id
JOIN topics ON conversations.topic_id = topics.id
WHERE topics.name = ?;
irthomasthomas commented 10 months ago

Since this was written, simonw has added an embeddings feature to the llm CLI. It supports local models and the OpenAI ada-002 embeddings API.

https://simonwillison.net/2023/Sep/4/llm-embeddings/