raghavan / PdfGptIndexer

RAG based tool for indexing and searching PDF text data using OpenAI API and FAISS (Facebook AI Similarity Search) index, designed for rapid information retrieval and superior search accuracy.
MIT License
655 stars, 29 forks

Large number of books and personalization. #5

Closed fishuke closed 11 months ago

fishuke commented 12 months ago

Hi, I'm quite impressed with what you did with swamisivananda.ai. I want to build something very similar, so I wanted to ask what you did differently from this project, other than server deployment, the frontend, and other basics. I'm mostly interested in whether you used anything else in your training and whether you used specific prompts in the chat. I have 100 books and I need to chat with them. Any feedback is golden. Also, if there is a better way to contact you, please let me know so we can talk there.

raghavan commented 11 months ago

I haven't made the code public yet, but here's the basic concept to help you integrate Pinecone, OpenAI, and Next.js. You're welcome to experiment with the prompt. To populate the Pinecone vector database, you can use the PdfGptIndexer project. Just remember to change the target from FAISS to Pinecone. Good luck! 👍


```typescript
import type { NextApiRequest, NextApiResponse } from "next";
import { Configuration, OpenAIApi } from "openai";
import pineconeStore from "@/utils/pineconeStore";

// Shape of a chat message as sent by the client.
type Message = {
  role: "system" | "user" | "assistant";
  content: string;
};

const configuration = new Configuration({
  apiKey: process.env.OPENAI_KEY,
});

const openai = new OpenAIApi(configuration);

export default async function translate(
  req: NextApiRequest,
  res: NextApiResponse
) {
  const { messages, userName } = req.body;

  const translatedText = await askOpenAI({ messages, userName });

  // res.json() sets the JSON Content-Type header and serializes the body.
  res.status(200).json({ translatedText });
}

async function askOpenAI({
  messages,
  userName,
}: {
  messages: Message[];
  userName: string;
}) {
  const pinecone = await pineconeStore();

  // Append the top-matching context snippets to the latest user message.
  if (messages?.length > 0) {
    const lastMsgContent = messages[messages.length - 1].content;

    const data = await pinecone.similaritySearch(lastMsgContent, 3);

    // Number only the snippets that came back, so a short result list
    // never puts the literal text "undefined" into the prompt.
    const snippets = (data ?? [])
      .map(
        (doc: { pageContent: string }, i: number) =>
          `${i + 1}) ${doc.pageContent}`
      )
      .join("\n    ---\n    ");

    const updatedMsgContent = `
    user question/statement: ${lastMsgContent}
    context snippets:
    ---
    ${snippets}
    `;

    messages[messages.length - 1].content = updatedMsgContent;
  }

  try {
    const response = await openai.createChatCompletion({
      model: "gpt-3.5-turbo-0301",
      messages: [
        {
          role: "system",
          content: `
        Imagine you are Swami Sivananda and give advice to the user you're interacting with that may ask you questions or advice. The user's name is ${userName}. Introduce yourself to ${userName}.`,
        },
        // An empty array is truthy, so check the length before falling back
        // to a default greeting.
        ...(messages?.length
          ? messages
          : [{ role: "user" as const, content: "Hi There!" }]),
      ],
    });

    return response?.data?.choices?.[0]?.message?.content;
  } catch (e: any) {
    console.error("error in response: ", e.message);
    return "There was an error in processing the ai response.";
  }
}
```
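
If you want to experiment with the prompt, it may help to pull the context-injection step out into a small pure function that you can test without calling Pinecone or OpenAI. The sketch below is only an illustration: `buildContextPrompt`, the `Snippet` interface, and the `maxChars` cutoff are hypothetical names and choices, not part of the handler above. It keeps the same "question, then numbered snippets" layout, copes with any number of search results, and trims each snippet to a length limit.

```typescript
// Hypothetical helper (not from the original project): builds the augmented
// user message from a question and however many snippets the search returned.
interface Snippet {
  pageContent: string;
}

function buildContextPrompt(
  question: string,
  snippets: Snippet[],
  maxChars = 1500 // assumed per-snippet cutoff; tune for your model's context size
): string {
  // With no search results, send the question unchanged.
  if (snippets.length === 0) return question;

  // Trim whitespace, cap the length, and number each snippet from 1.
  const numbered = snippets.map(
    (s, i) => `${i + 1}) ${s.pageContent.trim().slice(0, maxChars)}`
  );

  return [
    `user question/statement: ${question}`,
    "context snippets:",
    ...numbered.map((line) => `---\n${line}`),
  ].join("\n");
}

// Example usage with placeholder snippet text.
const demoMessage = buildContextPrompt("example question", [
  { pageContent: "first snippet text" },
  { pageContent: "  second snippet text  " },
]);
console.log(demoMessage);
```

In the handler, the result of `pinecone.similaritySearch(...)` could be passed in as `snippets`, since each returned document exposes `pageContent`.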
fishuke commented 11 months ago

@raghavan thanks a lot 🙏