langchain-ai / langchain

šŸ¦œšŸ”— Build context-aware reasoning applications
https://python.langchain.com
MIT License

LLM (chain) output parsing in MultiRetrievalQAChain: OutputParserException Got invalid JSON object #12276

Closed · vecorro closed this issue 6 months ago

vecorro commented 9 months ago

System Info

System: LangChain 0.0.321, Python 3.10

I'm trying to build a MultiRetrievalQAChain using only Llama 2 chat models served by vLLM (no OpenAI). To that end I have created a ConversationChain that acts as the default chain for the MultiRetrievalQAChain. I have customized the prompts for both chains to meet the Llama 2 Chat format requirements. It looks like the routing chain works properly, but I'm getting the following exception:

[chain/error] [1:chain:MultiRetrievalQAChain > 2:chain:LLMRouterChain] [7.00s] Chain run errored with error:
"OutputParserException(\"Parsing text ... raised following error:\\nGot invalid JSON object. Error: Expecting value: line 1 column 1 (char 0)\")"

Here is the routing and generation trace:

[chain/start] [1:chain:MultiRetrievalQAChain] Entering Chain run with input:
{
  "input": "What is prompt injection?"
}
[chain/start] [1:chain:MultiRetrievalQAChain > 2:chain:LLMRouterChain] Entering Chain run with input:
{
  "input": "What is prompt injection?"
}
[chain/start] [1:chain:MultiRetrievalQAChain > 2:chain:LLMRouterChain > 3:chain:LLMChain] Entering Chain run with input:
{
  "input": "What is prompt injection?"
}
[llm/start] [1:chain:MultiRetrievalQAChain > 2:chain:LLMRouterChain > 3:chain:LLMChain > 4:llm:VLLMOpenAI] Entering LLM run with input:
{
  "prompts": [
    "Given a query to a question answering system select the system best suited for the input. You will be given the names of the available systems and a description of what questions the system is best suited for. You may also revise the original input if you think that revising it will ultimately lead to a better response.\n\n<< FORMATTING >>\nReturn a markdown code snippet with a JSON object formatted to look like:\n```json\n{\n    \"destination\": string \\ name of the question answering system to use or \"DEFAULT\"\n    \"next_inputs\": string \\ a potentially modified version of the original input\n}\n```\n\nREMEMBER: \"destination\" MUST be one of the candidate prompt names specified below OR it can be \"DEFAULT\" if the input is not well suited for any of the candidate prompts.\nREMEMBER: \"next_inputs\" can just be the original input if you don't think any modifications are needed.\n\n<< CANDIDATE PROMPTS >>\nNIST AI Risk Management Framework: Guidelines provided by the NIST for organizations and people to manage risks associated with the use of AI. \n        The NIST risk management framework consists of four cyclical tasks: Govern, Map, Measure and Manage.\nOWASP Top 10 for LLM Applications: Provides practical security guidance to navigate the complex and evolving terrain of LLM security focusing on the top 10 vulnerabilities of LLM applications. These are 1) Prompt Injection, 2) Insecure Output Handling, 3) Training Data Poisoning, 4) Model Denial of Service, 5) Supply Chain Vulnerabilities, 6) Sensitive Information Disclosure, 7) Insecure Plugin Design\n        8) Excessive Agency, 9) Overreliance, and 10) Model Theft\n        \nThreat Modeling LLM Applications: A high-level example from Gavin Klondike on how to build a threat model for LLM applications utilizing the STRIDE modeling framework based on trust boundaries.\n\n<< INPUT >>\nWhat is prompt injection?\n\n<< OUTPUT >>"
  ]
}

/home/vmuser/miniconda3/envs/llm-env2/lib/python3.10/site-packages/langchain/chains/llm.py:280: UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.
  warnings.warn(

[llm/end] [1:chain:MultiRetrievalQAChain > 2:chain:LLMRouterChain > 3:chain:LLMChain > 4:llm:VLLMOpenAI] [7.00s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "Prompt injection is a security vulnerability in LLM applications where an attacker can manipulate the input prompts to an LLM model to elicit a specific response from the model. This can be done by exploiting the lack of proper input validation and sanitization in the model's architecture. \n",
        "generation_info": {
          "finish_reason": "length",
          "logprobs": null
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "prompt_tokens": 488,
      "completion_tokens": 512,
      "total_tokens": 1000
    },
    "model_name": "meta-llama/Llama-2-7b-chat-hf"
  },
  "run": null
}
[chain/end] [1:chain:MultiRetrievalQAChain > 2:chain:LLMRouterChain > 3:chain:LLMChain] [7.00s] Exiting Chain run with output:
{
  "text": "Prompt injection is a security vulnerability in LLM applications where an attacker can manipulate the input prompts to an LLM model to elicit a specific response from the model. This can be done by exploiting the lack of proper input validation and sanitization in the model's architecture.\n"
}

[chain/error] [1:chain:MultiRetrievalQAChain > 2:chain:LLMRouterChain] [7.00s] Chain run errored with error:
"OutputParserException(\"Parsing text\\nPrompt injection is a security vulnerability in LLM applications where an attacker can manipulate the input prompts to an LLM model to elicit a specific response from the model. This can be done by exploiting the lack of proper input validation and sanitization in the model's architecture.\\\n\\n raised following error:\\nGot invalid JSON object. Error: Expecting value: line 1 column 1 (char 0)\")"

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File ~/miniconda3/envs/llm-env2/lib/python3.10/site-packages/langchain/output_parsers/json.py:163, in parse_and_check_json_markdown(text, expected_keys)
    162 try:
--> 163     json_obj = parse_json_markdown(text)
    164 except json.JSONDecodeError as e:

 raised following error:
Got invalid JSON object. Error: Expecting value: line 1 column 1 (char 0)
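For reference, parse_and_check_json_markdown (per the traceback above) looks for a ```json fenced block, or bare JSON, in the text; the raw prose answer the model returned is neither, which is why parsing fails at line 1, column 1. A minimal illustration of the contract:

from langchain.output_parsers.json import parse_and_check_json_markdown

good = '```json\n{"destination": "DEFAULT", "next_inputs": "What is prompt injection?"}\n```'
parse_and_check_json_markdown(good, ["destination", "next_inputs"])  # returns the dict

bad = "Prompt injection is a security vulnerability in LLM applications..."
# parse_and_check_json_markdown(bad, ["destination", "next_inputs"])
# raises OutputParserException: Got invalid JSON object. Error: Expecting value: line 1 column 1 (char 0)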

The issue seems to be related to a warning that I'm also getting: llm.py:280: UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.

Unfortunately it is unclear how one is supposed to implement an output parser for the LLM (ConversationChain) chain that meets the expectations of the MultiRetrievalQAChain. The documentation for these chains relies a lot on OpenAI models to do the formatting, but there's not much guidance on how to do it with other LLMs.

Any guidance on how to move forward would be appreciated.
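In case it helps, the direction I've been exploring (an untested sketch, assuming the LLMRouterChain and RouterOutputParser APIs as of 0.0.321; destinations_str is a placeholder name for the "name: description" lines joined with newlines) is to build the router chain explicitly, so the parser is attached to the prompt instead of being invoked through the deprecated predict_and_parse path:

from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.prompts import PromptTemplate

# Untested sketch: attach RouterOutputParser to the routing prompt itself.
# routing_prompt_template is my Llama 2 template from the script below;
# destinations_str (hypothetical name) holds the formatted candidate list.
router_prompt = PromptTemplate(
    template=routing_prompt_template,
    input_variables=["input"],
    partial_variables={"destinations": destinations_str},
    output_parser=RouterOutputParser(next_inputs_inner_key="query"),
)
router_chain = LLMRouterChain.from_llm(llm, router_prompt)
# The resulting router_chain could then be passed to MultiRetrievalQAChain
# directly (router_chain=..., destination_chains=..., default_chain=...)
# instead of going through from_retrievers.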

Here's my code:

import torch
import langchain
langchain.debug = True

from langchain.llms import VLLMOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.prompts import PromptTemplate   

# Import for retrieval-augmented generation RAG
from langchain import hub
from langchain.chains import ConversationChain, MultiRetrievalQAChain
from langchain.vectorstores import Chroma
from langchain.text_splitter import SentenceTransformersTokenTextSplitter
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
#%% 
# URL for the vLLM service
INFERENCE_SRV_URL = "http://localhost:8000/v1"

def setup_chat_llm(vllm_url, max_tokens=512, temperature=0):
    """ 
    Initializes the vLLM service object.

    :param vllm_url: vLLM service URL 
    :param max_tokens: Max number of tokens to get generated by the LLM 
    :param temperature: Temperature of the generation process 
    :return: The vLLM service object 
    """
    chat = VLLMOpenAI(
        model_name="meta-llama/Llama-2-7b-chat-hf",
        openai_api_key="EMPTY",
        openai_api_base=vllm_url,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return chat
#%% 
# Initialize LLM service
llm = setup_chat_llm(vllm_url=INFERENCE_SRV_URL)
#%% 
%%time
# Set up the embedding encoder (Sentence Transformers) and vector store 
model_name = "all-mpnet-base-v2"
model_kwargs = {'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
embeddings = SentenceTransformerEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
# Set up the document splitter 
text_splitter = SentenceTransformersTokenTextSplitter(chunk_size=500, chunk_overlap=0)

# Load PDF documents
loader = PyPDFLoader(file_path="../data/AI_RMF_Playbook.pdf")
rmf_splits = loader.load_and_split()
rmf_retriever = Chroma.from_documents(documents=rmf_splits, embedding=embeddings)

loader = PyPDFLoader(file_path="../data/OWASP-Top-10-for-LLM-Applications-v101.pdf")
owasp_splits = loader.load_and_split()
owasp_retriever = Chroma.from_documents(documents=owasp_splits, embedding=embeddings)

loader = PyPDFLoader(file_path="../data/Threat Modeling LLM Applications - AI Village.pdf")
ai_village_splits = loader.load_and_split()
ai_village_retriever = Chroma.from_documents(documents=ai_village_splits, embedding=embeddings)
#%% 
retrievers_info = [
    {
        "name": "NIST AI Risk Management Framework",
        "description": """Guidelines provided by the NIST for organizations and people to manage risks associated with the use of AI.  
        The NIST risk management framework consists of four cyclical tasks: Govern, Map, Measure and Manage.""",
        "retriever": rmf_retriever.as_retriever()
    },
    {
        "name": "OWASP Top 10 for LLM Applications",
        "description": """Provides practical security guidance to navigate the complex and evolving terrain of LLM security focusing on the top 10 vulnerabilities of LLM applications. These are 1) Prompt Injection, 2) Insecure Output Handling, 3) Training Data Poisoning, 4) Model Denial of Service, 5) Supply Chain Vulnerabilities, 6) Sensitive Information Disclosure, 7) Insecure Plugin Design 
        8) Excessive Agency, 9) Overreliance, and 10) Model Theft 
        """,
        "retriever": owasp_retriever.as_retriever()
    },
    {
        "name": "Threat Modeling LLM Applications",
        "description": "A high-level example from Gavin Klondike on how to build a threat model for LLM applications utilizing the STRIDE modeling framework based on trust boundaries.",
        "retriever": ai_village_retriever.as_retriever()
    }
]
#%% 
prompt_template = (
""" [INST]<<SYS>> You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.<</SYS>>  
Question: {query}  
Context: {history}  
Answer: [/INST] 
""")
prompt = PromptTemplate(template=prompt_template, input_variables=['history', 'query'])

routing_prompt_template = (
""" 
[INST]<<SYS>> Given a query to a question answering system select the system best suited for the input. You will be given the names of the available systems and a description of what questions the system is best suited for. You could also revise the original input if you think that revising it will ultimately lead to a better response. 

Return a markdown code snippet with a JSON object formatted as follows: 

```json
{{
    "destination": "destination_key_value",
    "next_inputs": "revised_or_original_input"
}}
```

WHERE: 
- The "destination" key value MUST be a text string matching one of the candidate prompt names specified below OR it can be "DEFAULT" if the input is not well suited for any of the candidate prompts.  
- The "next_inputs" key value can just be the original input string if you don't think any modifications are needed. 
<</SYS>> 

<< CANDIDATE PROMPTS >> 
{destinations} 

<< INPUT >> 
{input} 

<< OUTPUT >> 
[/INST] 
""")
routing_prompt = PromptTemplate(template=routing_prompt_template, input_variables=['destinations', 'input'])

#%% 
default_chain = ConversationChain(
    llm=llm,  # Your own LLM
    prompt=prompt,  # Your own prompt
    input_key="query",
    output_key="result",
    verbose=True,
)

multi_retriever_chain = MultiRetrievalQAChain.from_retrievers(
    llm=llm,
    retriever_infos=retrievers_info,
    default_chain=default_chain, 
    default_prompt=routing_prompt,
    default_retriever=rmf_retriever.as_retriever(),
    verbose=True
)
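# NB: from reading the from_retrievers source (my interpretation, so take it
# with a grain of salt): default_prompt and default_retriever only feed a
# default RetrievalQA chain and appear to be ignored once default_chain is
# supplied, while the router always uses the built-in
# MULTI_RETRIEVAL_ROUTER_TEMPLATE, which matches the trace above.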
#%% 
question = "What is prompt injection?"
result = multi_retriever_chain.run(question)
#%% 
result
#%% 

#%% 
from langchain.chains.router.multi_retrieval_prompt import (
    MULTI_RETRIEVAL_ROUTER_TEMPLATE,
)
MULTI_RETRIEVAL_ROUTER_TEMPLATE
#%% 
print(MULTI_RETRIEVAL_ROUTER_TEMPLATE)
#%% 

Who can help?

No response

Information

Related Components

Reproduction

I included the entire script I'm using

Expected behavior

Proper query routing to the retriever best suited to provide the content the LLM needs to answer the question.

vecorro commented 9 months ago

I just tried to use langchain.output_parsers.json.parse_partial_json as the output parser for the ConversationChain, hoping it could fix the malformed JSON output. However, it is not possible to initialize ConversationChain with that parser, as I get this error:

ValidationError: 1 validation error for ConversationChain
output_parser
  instance of BaseLLMOutputParser expected (type=type_error.arbitrary_type; expected_arbitrary_type=BaseLLMOutputParser)
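A possible workaround (untested sketch) might be to wrap parse_partial_json in a BaseOutputParser subclass, since the validation only requires a BaseLLMOutputParser instance:

from langchain.output_parsers.json import parse_partial_json
from langchain.schema import BaseOutputParser

class PartialJsonOutputParser(BaseOutputParser):
    """Untested sketch: wraps parse_partial_json so ConversationChain's
    BaseLLMOutputParser validation accepts it."""

    def parse(self, text: str):
        # parse_partial_json attempts to repair/complete malformed JSON
        return parse_partial_json(text)

default_chain = ConversationChain(
    llm=llm,
    prompt=prompt,
    input_key="query",
    output_key="result",
    output_parser=PartialJsonOutputParser(),
    verbose=True,
)

I haven't verified whether this actually fixes the downstream routing, though.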
dosubot[bot] commented 6 months ago

Hi, @vecorro,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. It seems that you encountered an error when attempting to build a MultiRetrievalQAChain using Llama2 chat models served by vLLM. The error "Got invalid JSON object" was related to a deprecated method and the lack of guidance on implementing an output parser for the LLM (ConversationChain) chain. In a recent comment, you attempted to use a specific output parser but encountered a validation error. Further guidance is being sought on how to resolve this issue.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and cooperation.