mlx-chat / mlx-chat-app

Chat with MLX is a high-performance macOS application that connects your local documents to a personalized large language model (LLM).

[MLC-14] server: step toward Retrieval Augmented Generation (RAG) w/ indexing (load, split, store) retriever #8

Closed · stockeh closed this 8 months ago

stockeh commented 8 months ago

Overview of RAG: https://python.langchain.com/docs/use_cases/question_answering/

Implementation Details

Risks

stockeh commented 8 months ago

Example usage of retriever:

from server.utils import load

from server.retriever.loader import directory_loader
from server.retriever.splitter import RecursiveCharacterTextSplitter
from server.retriever.vectorstore import Chroma, Embeddings

def main():
    # Load the quantized Gemma model and tokenizer with MLX.
    model, tokenizer = load('mlx-community/quantized-gemma-7b-it')

    # 1. Load: read raw documents from a local directory (here, an Obsidian vault).
    raw_docs = directory_loader(
        '/Users/stock/Library/Mobile Documents/iCloud~md~obsidian/Documents/main')
    print(len(raw_docs), len(raw_docs[0].page_content))

    # 2. Split: chunk documents into overlapping pieces, recording each
    #    chunk's start index in its metadata.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1024, chunk_overlap=32, add_start_index=True
    )
    splits = text_splitter.split_documents(raw_docs)
    print(len(splits), len(splits[0].page_content), splits[0].metadata)

    # 3. Store: embed each chunk with the model and index it in Chroma.
    db = Chroma.from_documents(
        documents=splits, embedding=Embeddings(model.model, tokenizer))
    print('-------------------')

    # Retrieve: maximal marginal relevance (MMR) balances similarity to
    # the query against diversity among the returned chunks.
    query = "What is a cascade neural network?"
    # docs = db.similarity_search(query)
    docs = db.max_marginal_relevance_search(query)
    print('>', query)
    for doc in docs:
        print(doc.page_content, doc.metadata, sep='\n')
        print('-------------------')

if __name__ == '__main__':
    main()
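
The example passes model.model, not model, to the Embeddings wrapper, presumably so the embedder sees the transformer's hidden states rather than vocabulary logits. A minimal sketch of what such a wrapper might look like, assuming the embed_documents / embed_query interface that Chroma expects and a mean-pooling strategy (both assumptions, not the repository's actual implementation):

import mlx.core as mx

class Embeddings:
    """Hypothetical MLX-backed embedder exposing the interface Chroma calls."""

    def __init__(self, model, tokenizer):
        self.model = model          # inner transformer; assumed to return hidden states
        self.tokenizer = tokenizer

    def embed_query(self, text: str) -> list[float]:
        # Tokenize, run the transformer, and mean-pool the hidden states
        # into a single fixed-size vector (pooling strategy assumed).
        tokens = mx.array(self.tokenizer.encode(text))[None]  # (1, seq_len)
        hidden = self.model(tokens)                           # (1, seq_len, dim)
        return mx.mean(hidden, axis=1)[0].tolist()

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]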
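
On the retrieval side, the commented-out similarity_search returns the plain top-k nearest chunks, which on a personal notes corpus can be near-duplicates; max_marginal_relevance_search counteracts that by greedily trading relevance against redundancy. A self-contained sketch of the standard MMR selection rule it is based on (cosine similarity and lam=0.5 are conventional defaults, not necessarily this vectorstore's):

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmr(query_vec, doc_vecs, k=4, lam=0.5):
    """Greedily pick k docs maximizing lam*relevance - (1-lam)*redundancy."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected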
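
The linked overview completes RAG with a generation step that this PR does not yet implement: retrieved chunks get folded into the model's prompt before decoding. A hypothetical sketch of that step (the build_prompt helper and its template are illustrative, not part of the codebase):

def build_prompt(query: str, docs) -> str:
    # Join retrieved chunks as context ahead of the user's question.
    context = '\n\n'.join(doc.page_content for doc in docs)
    return ('Use the following context to answer the question.\n\n'
            f'Context:\n{context}\n\nQuestion: {query}\nAnswer:')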