langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com

Issue: [Azure] Summary chain with chat 3.5 turbo - Not being parallelized #7964

Closed jaimeescano closed 1 year ago

jaimeescano commented 1 year ago

Issue you'd like to raise.

Hi,

Not sure if someone else is facing this "issue" or if it's something I'm doing wrong. From what I've read, GPT-3.5 Turbo and later should be used with "chat_models" instead of "models" (completion LLMs). While testing the summarize chain (map_reduce), I noticed that with a completion llm it does indeed run in parallel, but with a chat_model it runs in sequence.

In the LangChain source (`langchain/chains/combine_documents/map_reduce.py`) I saw:

```python
map_results = await self.llm_chain.aapply(
    # FYI - this is parallelized and so it is fast.
    [{**{self.document_variable_name: d.page_content}, **kwargs} for d in docs],
    callbacks=callbacks,
)
```
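For context, `aapply` fans these inputs out concurrently; a paraphrased sketch of what it does (not the exact library source) looks roughly like:

```python
import asyncio

# Rough paraphrase of LLMChain.aapply, not the exact library source:
# one acall per input dict, gathered so the LLM requests run concurrently.
async def aapply_sketch(chain, input_list):
    return await asyncio.gather(
        *(chain.acall(inputs) for inputs in input_list)
    )
```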

Tracing the execution down to the AzureOpenAI chat_model: it executes a for loop and waits for each response, making multiple API calls to the endpoint:

```python
results.append(
    self._generate_with_cache(
        m,
        stop=stop,
        run_manager=run_managers[i] if run_managers else None,
        **kwargs,
    )
)
```

The AzureOpenAI model (aka completion) instead generates a single call containing all the prompts:

```python
response = completion_with_retry(self, prompt=_prompts, **params)
```
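In other words, the sequential loop above is the synchronous code path; the async path (`agenerate`) gathers the chat calls concurrently. As a standalone illustration (the model name and prompt here are assumptions, not taken from the chain):

```python
import asyncio

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

async def summarize_chunks(chunks):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    # One agenerate task per chunk; gather issues the API calls concurrently.
    tasks = [
        llm.agenerate([[HumanMessage(content=f"Summarize:\n\n{chunk}")]])
        for chunk in chunks
    ]
    return await asyncio.gather(*tasks)

results = asyncio.run(summarize_chunks(["first chunk ...", "second chunk ..."]))
```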


So my questions/concerns are:

  1. Is the summarization chain expected to run in parallel with chat model LLMs? If so, can anyone provide a sample? I can't make it work in parallel (see the sketch after this list for what I'd expect).
  2. Are "completion llms" (aka normal llm models) only good for generating content but not for summaries, when using GPT-3.5 Turbo?
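For reference, here is the async pattern I would expect to run the map step in parallel (a sketch against the 0.0.x-era API, not a verified fix):

```python
import asyncio

from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI

async def run_summary(docs):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    # arun drives the map step through aapply, so the per-document
    # calls should go out concurrently.
    return await chain.arun(docs)

# summary_text = asyncio.run(run_summary(docs))  # docs: a list of Documents
```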

Thanks in advance for your help.

Suggestion:

No response

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

I found related issues that might be helpful. I did my best to summarize the solutions, but I recommend looking at the related issues yourself.

Closed Issues

Summary chain with chat 3.5 turbo

The suggested solution is:

First, to create a summary of a long document using the chat version of GPT-3.5 Turbo, you can use the following code:

```python
from typing import List

from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import Document


def summary(docs: List[Document]) -> str:
    llm = ChatOpenAI(max_tokens=1500, temperature=0.3, model_name="gpt-3.5-turbo")
    prompt_template = """Write a summary of the following:

    {text}

    SUMMARY IN ENGLISH:"""
    PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
    chain = load_summarize_chain(
        llm, chain_type="map_reduce", map_prompt=PROMPT, combine_prompt=PROMPT
    )
    return chain.run(docs)
```
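A usage sketch for the function above (the input file and splitter settings are assumptions for illustration, not part of the original answer):

```python
from langchain.schema import Document
from langchain.text_splitter import CharacterTextSplitter

# Hypothetical input: split one long text into Document chunks.
text = open("report.txt").read()
splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
docs = [Document(page_content=chunk) for chunk in splitter.split_text(text)]
print(summary(docs))
```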

Second, to add a SystemMessage at the beginning, you can use the following code:

template = "You are a financial analyst that make summary of presentation and meetings."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template="{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, human_message_prompt],
)
chain = load_summarize_chain(llm, chain_type="map_reduce", return_intermediate_steps=True, map_prompt=prompt, combine_prompt=prompt)
chain({"input_documents": docs}, return_only_outputs=True)

ChatGPT's API model, gpt-3.5-turbo, doesn't appear to work for summarization tasks

The suggested solution is:

Switch from OpenAI to ChatOpenAI. Instead of:

```python
from langchain import OpenAI

llm = OpenAI(model_name=model_name, temperature=temperature)
```

use:

```python
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name=model_name, temperature=temperature)
```

The following PR and issue helped: https://github.com/hwchase17/langchain/pull/1463 and https://github.com/hwchase17/langchain/issues/1652.


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

dosubot[bot] commented 1 year ago

Hi, @jaimeescano! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you raised is about parallelization of the summary chain with chat 3.5 turbo on Azure. You mentioned that the chat_model runs in sequence, while the LLM model runs in parallel but generates incorrect summaries. In the comments, there are suggestions to use the ChatOpenAI model instead of OpenAI and to add a SystemMessage at the beginning.

I wanted to check with you if this issue is still relevant to the latest version of the LangChain repository. If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository! Let me know if there's anything else I can assist you with.