langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

KeyError: 'tail_type' when using LLMGraphTransformer #22061

Open aditya-kf opened 2 months ago

aditya-kf commented 2 months ago

Example Code

from langchain.chat_models import AzureChatOpenAI
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer

llm = AzureChatOpenAI(
    deployment_name=deployment_name,
    model_name='gpt-35-turbo',
    temperature=0,
    openai_api_base=api_base,
    openai_api_type=api_type,
    openai_api_key=api_key,
    openai_api_version=api_version,
)

docs = []  # list of LangChain documents

# page_contents -> list of strings
for document in page_contents:
    docs.append(Document(page_content=document))

llm_transformer = LLMGraphTransformer(llm=llm)

graph_documents = llm_transformer.convert_to_graph_documents(docs)

Error Message and Stack Trace (if applicable)

KeyError                                  Traceback (most recent call last)
File :3
      1 llm_transformer = LLMGraphTransformer(llm=llm)
----> 3 graph_documents = llm_transformer.convert_to_graph_documents(docs)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-5273191c-fbe4-4f45-837a-b17c967f70ce/lib/python3.10/site-packages/langchain_experimental/graph_transformers/llm.py:646, in LLMGraphTransformer.convert_to_graph_documents(self, documents)
    634 def convert_to_graph_documents(
    635     self, documents: Sequence[Document]
    636 ) -> List[GraphDocument]:
    637     """Convert a sequence of documents into graph documents.
    638
    639     Args:
   (...)
    644         Sequence[GraphDocument]: The transformed documents as graphs.
    645     """
--> 646     return [self.process_response(document) for document in documents]

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-5273191c-fbe4-4f45-837a-b17c967f70ce/lib/python3.10/site-packages/langchain_experimental/graph_transformers/llm.py:646, in (.0)
    634 def convert_to_graph_documents(
    635     self, documents: Sequence[Document]
    636 ) -> List[GraphDocument]:
    637     """Convert a sequence of documents into graph documents.
    638
    639     Args:
   (...)
    644         Sequence[GraphDocument]: The transformed documents as graphs.
    645     """
--> 646     return [self.process_response(document) for document in documents]

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-5273191c-fbe4-4f45-837a-b17c967f70ce/lib/python3.10/site-packages/langchain_experimental/graph_transformers/llm.py:599, in LLMGraphTransformer.process_response(self, document)
    596 for rel in parsed_json:
    597     # Nodes need to be deduplicated using a set
    598     nodes_set.add((rel["head"], rel["head_type"]))
--> 599     nodes_set.add((rel["tail"], rel["tail_type"]))
    601     source_node = Node(id=rel["head"], type=rel["head_type"])
    602     target_node = Node(id=rel["tail"], type=rel["tail_type"])

KeyError: 'tail_type'

Description

I am trying to convert LangChain documents to Graph Documents using the 'convert_to_graph_documents' function from 'LLMGraphTransformer'. I am using the 'gpt-35-turbo' model from AzureChatOpenAI.
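
Until the parsing is made more robust, one way to keep a single malformed LLM response from aborting the whole batch is to convert the documents one at a time. This is only a sketch of a workaround, not a fix in the library: `convert_skipping_failures` is a hypothetical helper name, and the only thing it assumes about the transformer is the `convert_to_graph_documents` method that accepts a sequence of documents, as shown in the traceback above.

```python
from typing import Any, List, Sequence, Tuple


def convert_skipping_failures(
    transformer: Any, documents: Sequence[Any]
) -> Tuple[List[Any], List[Tuple[int, Exception]]]:
    """Convert documents one at a time, collecting failures instead of aborting.

    Hypothetical helper: `transformer` is anything exposing
    convert_to_graph_documents(documents), e.g. an LLMGraphTransformer.
    """
    converted: List[Any] = []
    failures: List[Tuple[int, Exception]] = []
    for index, document in enumerate(documents):
        try:
            # Pass a one-element list so a bad response only loses this document.
            converted.extend(transformer.convert_to_graph_documents([document]))
        except (KeyError, TypeError) as exc:
            # Remember which input failed so it can be inspected or retried.
            failures.append((index, exc))
    return converted, failures
```

Calling `graph_documents, failures = convert_skipping_failures(llm_transformer, docs)` would then return the documents that parsed cleanly plus the indices of the ones that raised, at the cost of losing whatever batching the transformer might otherwise do.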

System Info

System Information
OS: Linux
OS Version: #70~20.04.1-Ubuntu SMP Mon Apr 8 15:38:58 UTC 2024
Python Version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

Package Information
langchain_core: 0.2.0
langchain: 0.2.0
langchain_community: 0.2.0
langsmith: 0.1.60
langchain_experimental: 0.0.59
langchain_groq: 0.1.4
langchain_openai: 0.1.7
langchain_text_splitters: 0.2.0

Packages not installed (Not Necessarily a Problem)
The following packages were not found:
langgraph
langserve

m-revetria commented 4 weeks ago

Hi, I'm getting an error which looks similar and might be related:

File "graph.py", line 21, in save_documents
  graph_documents = llm_transformer.convert_to_graph_documents(chunks)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.11/site-packages/langchain_experimental/graph_transformers/llm.py", line 762, in convert_to_graph_documents
  return [self.process_response(document) for document in documents]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.11/site-packages/langchain_experimental/graph_transformers/llm.py", line 762, in <listcomp>
  return [self.process_response(document) for document in documents]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.11/site-packages/langchain_experimental/graph_transformers/llm.py", line 714, in process_response
  nodes_set.add((rel["head"], rel["head_type"]))
                   ~~~^^^^^^^^
TypeError: list indices must be integers or slices, not str

When the error is raised, rel is a list instead of a dict. Its value looks similar to this:

[
  {'head': 'The entity', 'head_type': 'Event', 'relation': 'RECEIVED_BY', 'tail': 'Person 1', 'tail_type': 'Person'}, 
  {'head': 'The entity', 'head_type': 'Event', 'relation': 'RECEIVED_BY', 'tail': 'Person 2', 'tail_type': 'Person'},
  {'head': 'The entity', 'head_type': 'Event', 'relation': 'RECEIVED_BY', 'tail': 'Person 3', 'tail_type': 'Person'},
  {'head': 'The entity', 'head_type': 'Event', 'relation': 'RECEIVED_BY', 'tail': 'Person 4', 'tail_type': 'Person'},
  {'head': 'The entity', 'head_type': 'Event', 'relation': 'SUBMITTED_THROUGH', 'tail': 'signed document', 'tail_type': 'Method'}
]

I'm running this with Ollama. The code is as follows:

from langchain_community.graphs import Neo4jGraph
from langchain_community.llms.ollama import Ollama
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer

def save_documents(chunks: list[Document]):
    llm = Ollama(model="phi3:mini")
    llm_transformer = LLMGraphTransformer(llm=llm)

    graph_documents = llm_transformer.convert_to_graph_documents(chunks)

    graph = Neo4jGraph()
    graph.add_graph_documents(
        graph_documents,
        baseEntityLabel=True,
        include_source=True,
    )
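
Since rel arrives as a list of dicts in this case, flattening the parsed response before iterating over it is one possible workaround. The sketch below is an assumption about how that could be done outside the library; `iter_relationships` is a hypothetical helper, not part of langchain_experimental.

```python
from typing import Any, Iterator


def iter_relationships(parsed: Any) -> Iterator[dict]:
    """Yield relationship dicts from arbitrarily nested lists.

    Hypothetical helper: anything that is neither a dict nor a list
    (stray strings, numbers, None) is silently skipped.
    """
    if isinstance(parsed, dict):
        yield parsed
    elif isinstance(parsed, list):
        for item in parsed:
            # Recurse so that [[{...}, {...}], {...}] flattens to three dicts.
            yield from iter_relationships(item)
```

Iterating with `for rel in iter_relationships(parsed_json)` would then always see a dict, so `rel["head"]` no longer hits a list.
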
Elobo68 commented 3 weeks ago

Hi, I am getting the same error. I put an ugly try/except around the failing part.

It gave me this dict, which contains a malformed key: {'head': 'TEXT', 'head_ type': 'Document', 'relation': 'DEFINES', 'tail': 'Interface', 'tail_type': 'Concept'}

There is a space in "head_ type" that should not be there.
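
A less ugly alternative to the try/except might be to clean the keys before they are read. The following is a minimal sketch under two assumptions: that the stray whitespace is the only damage to the key, and that the five keys in `REQUIRED_KEYS` (taken from the dicts shown in this thread) are the ones the transformer needs. Both function names are hypothetical.

```python
# Keys seen in the relationship dicts in this thread (assumed to be the full set).
REQUIRED_KEYS = ("head", "head_type", "relation", "tail", "tail_type")


def normalize_keys(rel: dict) -> dict:
    """Return a copy of rel with all whitespace removed from its keys."""
    return {"".join(str(key).split()): value for key, value in rel.items()}


def missing_keys(rel: dict) -> list:
    """List the required keys that rel does not contain."""
    return [key for key in REQUIRED_KEYS if key not in rel]


broken = {'head': 'TEXT', 'head_ type': 'Document', 'relation': 'DEFINES',
          'tail': 'Interface', 'tail_type': 'Concept'}
fixed = normalize_keys(broken)
```

Here `missing_keys(broken)` reports `head_type` as absent, while `missing_keys(fixed)` is empty. Relationships that still have missing keys after normalization could be skipped or logged instead of raising KeyError.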

MOB83 commented 3 weeks ago

I'm seeing the same thing with tail_type: a 0 is added to the key in the final element of the list. Interestingly, instead of 'tail_type' as in all the other entries, the failing one is output as "tail_type0:. It has an opening double quote instead of a single quote, and a zero at the end instead of a closing double quote.

zdxpan commented 1 week ago

Summary of the parsing errors. The failing cases are as follows:

1. A key is missing from one of the results: [{'head': 'Phoenix 7', 'head_type': 'Product', 'relation': 'HAS_SIZE', 'tail': 'regular Phoenix 7', 'tail_type': 'Product'}, ...]

2. One of the keys in a result has superfluous spaces added, e.g. 'tailtype' --> 'tail type'. This is rare; I have only seen it reported once online.

3. All the results are wrapped in a single dict, e.g. {'entities': [{'head': ...}, {...} ] }
LLMGraphTransformer parsed_json error, rel is entities parsed_json is {'entities': [{'head': 'Phoenix 7', 'head_type': 'Product', 'relation': 'HAS_SIZE', 'tail': 'regular Phoenix 7', 'tail_type': 'Product'}, {'head': 'Enduro 2', 'head_type': 'Product', 'relation': 'HAS_FEATURE', 'tail': 'turn-by-turn navigation', 'tail_type': 'Feature'}]}

4. Sometimes only a single result is returned, of the form {'head': ...}. Iterating over it then yields the dict's keys rather than relationships, so the data is lost.
LLMGraphTransformer parsed_json error, rel is tail_type parsed_json is {'head': 'Apple App Store', 'head_type': 'Service', 'relation': 'PROVIDES', 'tail': 'Apps', 'tail_type': 'Product'}

Handling these cases specifically would go a long way toward reducing the problem!
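
Cases 3 and 4 could be handled by coercing the parsed response into a plain list of dicts before the loop. This is a rough sketch of that idea, not the library's implementation; `as_relationship_list` is a hypothetical name, and treating any dict that has a list value as a wrapper is an assumption based on the {'entities': [...]} example.

```python
from typing import Any, List


def as_relationship_list(parsed_json: Any) -> List[dict]:
    """Coerce the response shapes from cases 3 and 4 into a flat list of dicts."""
    if isinstance(parsed_json, dict):
        # Case 3: a wrapper dict whose values include a list, e.g. {'entities': [...]}.
        wrapped = [v for v in parsed_json.values() if isinstance(v, list)]
        if wrapped:
            parsed_json = [item for values in wrapped for item in values]
        else:
            # Case 4: a single relationship returned as a bare dict.
            parsed_json = [parsed_json]
    if not isinstance(parsed_json, list):
        # Anything else (string, None, number) carries no relationships.
        return []
    # Drop non-dict entries so that rel["head"] is always a dict lookup.
    return [rel for rel in parsed_json if isinstance(rel, dict)]
```

Looping over `as_relationship_list(parsed_json)` instead of `parsed_json` would keep the single-result case from iterating over key names. Cases 1 and 2 still need a per-relationship key check.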