langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Are there any ways to increase response speed? #1702

Closed. vforv closed this issue 4 months ago

vforv commented 1 year ago

Currently it takes 10-15 s to get a response from OpenAI. I am using an example similar to this: https://langchain.readthedocs.io/en/latest/modules/agents/examples/agent_vectorstore.html

miguelemosreverte commented 1 year ago

Bump. For me it takes two minutes to get a paragraph.

nikolaspapastavrou commented 1 year ago

bump

vishnumg commented 1 year ago

Bump

RemcoGoy commented 1 year ago

bump

jamescash commented 1 year ago

bump

elizabethsiegle commented 1 year ago

bump

tybalex commented 1 year ago

bump

jamescash commented 1 year ago

bump

vishnuss33 commented 1 year ago

bump

navohu commented 1 year ago

bump

nikolaspapastavrou commented 1 year ago

Hi all, I have found a way to increase speed when using Agents. The idea is to use the OpenAI LLM for complex tasks such as text generation and small LLMs for Chain of Thought completion. This is inspired by this paper: https://arxiv.org/pdf/2305.17390.pdf

I can create an Agent class for this and make a PR. Would this be of interest?
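
Roughly, the split would look like this in Python (just a sketch using the existing initialize_agent API, not the proposed Agent class; the "writer" tool name and prompt are placeholders for illustration):

from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# Small, fast model drives the agent's reasoning (Chain of Thought) loop.
small_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Large model is only invoked for the expensive text-generation step.
big_llm = ChatOpenAI(model_name="gpt-4", temperature=0.7)

# Hypothetical "writer" tool that wraps the large model.
writer_chain = LLMChain(
    llm=big_llm,
    prompt=PromptTemplate.from_template("Write a detailed answer to: {question}"),
)
tools = [
    Tool(
        name="writer",
        func=lambda q: writer_chain.run(question=q),
        description="Use this tool to produce the final, detailed answer.",
    )
]

# The cheap model plans and routes; the expensive model only runs inside the tool.
agent = initialize_agent(
    tools, small_llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("Explain how a vector store speeds up document QA.")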

Slumber-techlord commented 1 year ago

Hey everyone,

I got a slightly faster turnaround time (~5 seconds) using a custom QA chain like this:

// Import paths assume the langchain.js package layout from around the time of this
// thread; adjust them for your version.
import { createClient } from "@supabase/supabase-js";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { SupabaseVectorStore } from "langchain/vectorstores/supabase";
import { loadQAMapReduceChain, RetrievalQAChain } from "langchain/chains";

// `loader` is any document loader and `llm` any LLM instance you already have.
const docs = await loader.load();

// Split the documents into overlapping chunks before embedding them.
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 });
const docOutput = await textSplitter.splitDocuments(docs);

const privateKey = process.env.SUPABASE_PRIVATE_KEY;
if (!privateKey) throw new Error(`Expected env var SUPABASE_PRIVATE_KEY`);

const url = process.env.SUPABASE_URL;
if (!url) throw new Error(`Expected env var SUPABASE_URL`);

const client = createClient(url, privateKey);

// Embed the chunks and store them in the Supabase "documents" table.
const vectorStore = await SupabaseVectorStore.fromDocuments(docOutput, new OpenAIEmbeddings(), {
  client,
  tableName: "documents",
});

// The map-reduce chain answers over each retrieved chunk and then combines the results.
const mapreduceChain = loadQAMapReduceChain(llm);

const documentChain = new RetrievalQAChain({
  combineDocumentsChain: mapreduceChain,
  retriever: vectorStore.asRetriever(),
});

Here, I'm using a VectorStore as the Retriever and a MapReduceDocumentsChain as the QA chain. The VectorStore I've used here is Supabase.

quynguyen2303 commented 1 year ago

bump

adesokanayo commented 1 year ago

bump

tmin97 commented 1 year ago

bump

kwehmeyer commented 1 year ago

bump

adesokanayo commented 1 year ago

What I have done is to use streaming instead of waiting for the complete OpenAI response. You can get the response via a stream, and this has improved the perceived speed of my AI.

You will need to:

  1. import LangChainStream and create a stream:

const { stream, handlers } = LangChainStream();

  2. set your streaming option to true as shown below:

const model = new ChatOpenAI({ modelName: "gpt-4", temperature: 0.7, streaming: true });

  3. return a streaming text response: return new StreamingTextResponse(stream);

Remember not to await the call once you are streaming.
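
On the Python side, a minimal sketch of the same idea (assuming ChatOpenAI with the stdout streaming callback from langchain.callbacks; swap the handler for whatever your web framework needs) would be:

from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Stream tokens as they are generated instead of waiting for the full completion.
chat = ChatOpenAI(
    model_name="gpt-4",
    temperature=0.7,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

# The call still blocks until the model finishes, but tokens are emitted as they
# arrive, so the user starts seeing output almost immediately.
chat.predict("Summarize the retrieved documents in one paragraph.")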

thierrymoudiki commented 11 months ago

Hi @vforv, did you find a workaround, please?

gauravingalkar commented 11 months ago

Facing the same issue with the Llama 2 model. The QA chain is super slow: almost 30 seconds to return an answer.

TechGuyVN commented 8 months ago

Is there any config to solve this? I need it to build a chatbot, but it is too slow.

yishairasowsky commented 8 months ago

@nikolaspapastavrou, regarding your suggestion above to use the OpenAI LLM for complex tasks, use small LLMs for Chain of Thought completion, and create an Agent class for it:

Please provide that if you can; it would be greatly appreciated - thank you!

yishairasowsky commented 8 months ago

@Slumber-techlord, regarding your example above that uses a Supabase VectorStore as the Retriever with a MapReduceDocumentsChain as the QA chain:

I use Python, so if there's a way you can show us how to do the same thing that would be great, because at the moment I see you are using JavaScript.
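
My rough, untested guess at a Python equivalent (assuming the supabase-py client, the classic LangChain import paths, and a TextLoader on a placeholder file) would be something like:

import os

from supabase import create_client
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import SupabaseVectorStore

# Placeholder loader; use whatever loader matches your documents.
docs = TextLoader("my_docs.txt").load()

# Split into overlapping chunks, mirroring the JavaScript snippet.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = splitter.split_documents(docs)

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_PRIVATE_KEY"])

# Embed the chunks and store them in the Supabase "documents" table.
vector_store = SupabaseVectorStore.from_documents(
    split_docs,
    OpenAIEmbeddings(),
    client=supabase,
    table_name="documents",
)

# chain_type="map_reduce" plays the role of loadQAMapReduceChain above.
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    chain_type="map_reduce",
    retriever=vector_store.as_retriever(),
)

print(qa_chain.run("What does the document say about response times?"))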