langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License
Assistants + tools=[{"type": "retrieval"}]: What is the retrieval principle? Is the PDF given directly to the AI? How are tokens counted? #3358

Closed lishaozheng closed 7 months ago

lishaozheng commented 11 months ago

import openai

client = openai.OpenAI()

file = client.files.create(file=open("dlof.pdf", "rb"), purpose="assistants")

assistant = client.beta.assistants.create(
    name="Draft Letter Of Offer Assistant",
    instructions="You are a merger and takeover specialist chatbot. Use your knowledge base to best respond to queries related to mergers and takeovers. Pls be precise",
    model="gpt-3.5-turbo-1106",
    tools=[{"type": "retrieval"}],
    file_ids=[file.id],
)

What is the principle behind retrieval? Is the PDF given directly to the AI? How are tokens counted? If this is retrieval, only part of the document should be passed to GPT. What does that prompt look like, and is there a way to output it?

dosubot[bot] commented 11 months ago

🤖

Hello @lishaozheng,

I'm an AI bot here to help you while we wait for a human maintainer to join us. I can assist with solving bugs, answering questions, and guiding you on how to contribute to this project. Now, let's take a closer look at your issue. I'll provide you with a full answer shortly.

dosubot[bot] commented 11 months ago

🤖

Hello @lishaozheng,

Nice to meet you! I'm a bot here to help you with any questions, bugs, or contributions related to the langchainjs repository while we wait for a human maintainer. I'm equipped to assist you in various ways, so don't hesitate to reach out if you need help. Looking forward to assisting you!

Note that the snippet in your question uses OpenAI's Python Assistants API, where retrieval is performed on OpenAI's side, so the following describes the comparable mechanism in LangChain. Retrieval means finding the passages most relevant to a query and passing only those passages, rather than the whole file, to the language model as context. In LangChain this is implemented by the RetrievalQAChain class, which uses a retriever (typically backed by a vector store) to fetch the most relevant documents and then hands them to a language model together with the question. You can find more details in the langchain/src/chains/retrieval_qa.ts file in the repository.
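To make the principle concrete, here is a minimal, self-contained sketch of the retrieve-then-prompt flow described above. It is not LangChain's or OpenAI's implementation: the names Chunk, embed, retrieve, and buildPrompt are hypothetical, and the bag-of-words "embedding" is a toy stand-in for a real embedding model and vector store.

```typescript
// Toy retrieval sketch. All names here are hypothetical, not LangChain APIs.
type Chunk = { id: number; text: string };

// Stand-in for a real embedding model: a bag-of-words term-frequency vector.
function embed(text: string): Map<string, number> {
  const vector = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    vector.set(word, (vector.get(word) ?? 0) + 1);
  }
  return vector;
}

// Cosine similarity between two sparse vectors.
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, count] of a) {
    dot += count * (b.get(word) ?? 0);
    normA += count * count;
  }
  for (const count of b.values()) normB += count * count;
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the k chunks most similar to the query.
function retrieve(query: string, chunks: Chunk[], k: number): Chunk[] {
  const queryVector = embed(query);
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, embed(chunk.text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Only the retrieved chunks, not the whole file, are placed in the prompt.
function buildPrompt(question: string, context: Chunk[]): string {
  const contextText = context.map((chunk) => chunk.text).join("\n---\n");
  return `Answer using only the context below.\n\nContext:\n${contextText}\n\nQuestion: ${question}`;
}

const chunks: Chunk[] = [
  { id: 0, text: "The offer price is set by the acquirer in the letter of offer." },
  { id: 1, text: "Bitcoin is a peer-to-peer electronic cash system." },
  { id: 2, text: "Shareholders may tender shares during the offer period." },
];
const question = "What is the offer price?";
const topChunks = retrieve(question, chunks, 1);
const prompt = buildPrompt(question, topChunks);
console.log(prompt);
```

Logging the prompt, as in the last line, is one way to see exactly which part of the document would reach the model. In a real setup the embedding and similarity search would be handled by an embedding model and a vector store.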

To provide a PDF to a language model through LangChain, you first need to convert it into Document objects, which hold the extracted text in pageContent along with metadata. The PDFLoader class from the langchain/document_loaders/fs/pdf module extracts the text and creates the Document objects for you, by default one per page. It relies on the pdf-parse package, which needs to be installed separately. Here's an example:

import { PDFLoader } from "langchain/document_loaders/fs/pdf";

export const run = async () => {
  // Create a new PDFLoader instance
  const loader = new PDFLoader("src/document_loaders/example_data/bitcoin.pdf");

  // Load the PDF and convert it to Document objects
  const docs = await loader.load();

  console.log({ docs });
};

In this example, the run function converts the PDF file at src/document_loaders/example_data/bitcoin.pdf to Document objects and logs them to the console. You can find more details about this in the examples/src/document_loaders/pdf.ts file in the repository.

As for counting tokens, the Document class in LangChain does not appear to provide a countTokens method. A more reliable approach is to tokenize each document's pageContent yourself, for example with the js-tiktoken package. Here's a sketch that loads a PDF and counts its tokens, assuming the cl100k_base encoding used by the gpt-3.5-turbo and gpt-4 model families:

import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { getEncoding } from "js-tiktoken";

// Load the PDF into Document objects (one per page by default)
const loader = new PDFLoader("path/to/your/pdf");
const docs = await loader.load();

// Tokenize each page and sum the token counts
const encoding = getEncoding("cl100k_base");
const tokenCount = docs.reduce(
  (total, doc) => total + encoding.encode(doc.pageContent).length,
  0
);
console.log(`The document contains ${tokenCount} tokens.`);

Please note that this is a hypothetical example and may not work as-is. The actual implementation would depend on the specifics of your use case and the libraries you are using.
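When a tokenizer is not available, a rough estimate can still help decide how much retrieved text fits in a prompt. The sketch below assumes the common rule of thumb of about 4 characters per token for English text, which is an approximation and not an exact count. The names estimateTokens and fitToBudget are hypothetical helpers, not part of LangChain.

```typescript
// Rough heuristic only: about 4 characters per token for English text.
// Use a real tokenizer when exact counts matter.
const CHARS_PER_TOKEN = 4; // assumption, not an exact figure

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keep texts in order until adding the next one would exceed the budget.
function fitToBudget(texts: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const text of texts) {
    const cost = estimateTokens(text);
    if (used + cost > maxTokens) break;
    kept.push(text);
    used += cost;
  }
  return kept;
}

// Three placeholder "pages" of 400 characters each, roughly 100 tokens apiece.
const pages = ["a".repeat(400), "b".repeat(400), "c".repeat(400)];
const selected = fitToBudget(pages, 250);
console.log(`${selected.length} of ${pages.length} pages fit in the budget`);
```

Stopping at the first text that does not fit keeps the selection in its original order, which matters when the texts are already ranked by relevance.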

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] commented 8 months ago

Hi, @lishaozheng

I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, you were inquiring about the retrieval principle in OpenAI's assistant tools, the possibility of providing a PDF directly to the AI, and how to count tokens. I provided a detailed response explaining the retrieval principle in LangChain, how to provide a PDF to the AI, and how to count tokens using hypothetical examples and references to specific files in the repository.

Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!