run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Knowledge Graph is not working #7470

Closed: vishnu9000 closed this issue 1 year ago

vishnu9000 commented 1 year ago

Bug Description

I was testing the knowledge graph (KG) example from the LlamaIndex documentation. I tried both the default approach, where the LLM extracts the triplets itself, and the second approach, where the triplets are added manually. In both scenarios I ran into the following issues:

  1. Most of the time I get the following error: RecursionError: maximum recursion depth exceeded while calling a Python object

  2. The graph is not displayed correctly. This is the graph I get:

[Screenshot of the rendered graph, not preserved]

I added the triplets manually like this:

# add keyword mappings and nodes manually
# add triplets (subject, relationship, object)
# (assumes `index`, a KnowledgeGraphIndex, and `nodes` were created earlier; that setup is not shown here)

# for node 0
node_0_tups = [
    ("Test case", "have id", "TCOTT001"),
    ("Test case", "have name", "Profile Image Upload Verification"),
    (
        "Test case",
        "following are the steps for testing the test case",
        "1.Precondition:User is logged in to the OTT platform with a valid account.2.Navigate to the user's profile section.3.Click on the \"Edit Profile\" or \"Change Profile Picture\" option.4.Choose an image file from the local system.5.Click on the \"Upload\" or \"Save Changes\" button.",
    ),
    (
        "Test case",
        "following are the expected results after testing the test case",
        "1.The selected image should be successfully uploaded.2.The user's profile picture should be updated to the newly uploaded image.3.The platform should display a success message confirming the profile image update.",
    ),
]
for tup in node_0_tups:
    index.upsert_triplet_and_node(tup, nodes[0])
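To reason about the manual triplets, here is a minimal sketch, assuming a plain subject-keyed dict (roughly the shape of a simple graph store, but not LlamaIndex's actual SimpleGraphStore code). It suggests that when every triplet uses the generic subject "Test case", all test cases collapse into one hub node, while keying triplets by the test case ID keeps them apart. The second test case (TCOTT002 and its name) is hypothetical.

```python
from collections import defaultdict


def build_adjacency(triplets):
    """Group (subject, relation, object) triplets by subject, like a simple graph store."""
    graph = defaultdict(list)
    for subj, rel, obj in triplets:
        graph[subj].append((rel, obj))
    return dict(graph)


# Generic subject: TCOTT001 comes from the issue, TCOTT002 is hypothetical.
shared = build_adjacency([
    ("Test case", "have id", "TCOTT001"),
    ("Test case", "have name", "Profile Image Upload Verification"),
    ("Test case", "have id", "TCOTT002"),
    ("Test case", "have name", "Password Reset Verification"),
])
print(len(shared))  # 1: a single "Test case" node holds all four edges

# Subject = test case ID: each test case becomes its own node.
by_id = build_adjacency([
    ("TCOTT001", "have name", "Profile Image Upload Verification"),
    ("TCOTT002", "have name", "Password Reset Verification"),
])
print(sorted(by_id))  # ['TCOTT001', 'TCOTT002']
```

With the shared subject, any question about "Test case" pulls in edges from every test case at once, which may also make the rendered graph hard to read.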

The same issue occurs when the LLM extracts the triplets and creates the graph.

Am I doing something wrong, or is this a bug? For almost any question I ask, I get either "I don't know" or RecursionError: maximum recursion depth exceeded while calling a Python object.

I am working on a document question-answering use case and I am unsure which path to take. I have a lot of Excel files containing test case details such as component, test case ID, test case name, test case steps, and expected results. I want to build a QA system that can answer questions like the following:
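One way to prepare such spreadsheets, sketched here under assumptions, is to turn each row into one chunk of text plus a metadata dict, so fields like component stay queryable exactly instead of only living inside embedded text. The sketch assumes the Excel sheets are exported to CSV and uses hypothetical column names and rows; only TCOTT001 and its name come from the issue.

```python
import csv
import io

# Hypothetical CSV export of one Excel sheet; column names are assumptions.
SAMPLE = """component,test_case_id,test_case_name,steps,expected_results
Profile,TCOTT001,Profile Image Upload Verification,Upload an image,Profile picture is updated
Playback,TCOTT002,Resume Playback Verification,Stop and reopen a video,Playback resumes
"""


def rows_to_records(csv_text):
    """Turn each row into (chunk_text, metadata): one chunk per test case."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = (
            f"Test case {row['test_case_id']}: {row['test_case_name']}\n"
            f"Steps: {row['steps']}\n"
            f"Expected results: {row['expected_results']}"
        )
        # Structured fields kept separately for exact filtering later.
        metadata = {"component": row["component"], "test_case_id": row["test_case_id"]}
        records.append((text, metadata))
    return records


records = rows_to_records(SAMPLE)
print(records[0][1])  # {'component': 'Profile', 'test_case_id': 'TCOTT001'}
```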

  1. Is this bug covered by any test case?
  2. Is there any similar test case related to this?
  3. List all test cases that need to be tested for a component.
  4. Give me the minimum list of test cases that need to be tested when change X happens.

The first two questions can be handled with a simple vector store. The last two cannot, because vector-store RAG only retrieves a fixed number of stored chunks. Say I ask for the test cases for component Y: the pipeline retrieves the 5 best-matching chunks, but there may be 20 such test cases. For the last question, the model may need to combine information from many chunks to give an accurate result. In my case, each test case (component, test case ID, test case name, test case steps, expected results) is one chunk, so each chunk is already fairly large.
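The top-k limitation described above can be illustrated with a toy example. This is a minimal sketch with made-up 2-D vectors standing in for real embeddings and a hypothetical corpus of 20 test cases for component Y plus 10 for component Z: similarity search returns only k results, while an exact metadata filter returns every match.

```python
import math

# Hypothetical corpus; the 2-D "embeddings" are toy vectors, not model output.
corpus = [{"id": f"Y-{i}", "component": "Y", "vec": (1.0, 0.1 * i)} for i in range(20)]
corpus += [{"id": f"Z-{i}", "component": "Z", "vec": (0.1 * i, 1.0)} for i in range(10)]


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec, k=5):
    """Vector-store style retrieval: only the k most similar chunks come back."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:k]]


def filter_by_component(component):
    """Exact metadata filter: every matching test case comes back."""
    return [d["id"] for d in corpus if d["component"] == component]


print(len(top_k((1.0, 0.0))))         # 5
print(len(filter_by_component("Y")))  # 20
```

For "list all test cases for a component", an exact filter over structured fields (or a graph lookup) avoids the cut-off, while similarity search remains a fit for "is there a similar test case".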

That is when I thought of using a knowledge graph and looked into the LlamaIndex framework. For data security reasons, I am using a local quantized Llama 2 model.

I don't know if I am going in the right direction. Any help would be appreciated.

Version

Latest Version

Steps to Reproduce

from llama_index import SimpleDirectoryReader, KnowledgeGraphIndex
from llama_index import ServiceContext
from llama_index.llms import OpenAI

documents = SimpleDirectoryReader(input_files=["/content/test.txt"]).load_data()
llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)

from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import SimpleGraphStore

graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=50,
    storage_context=storage_context,
    service_context=service_context,
)

Following is the graph I am getting: [screenshot not preserved]

The input text is the following: Donald Cooper observes a pattern formed by an apparent gravitational anomaly in Murph's bedroom. He decodes it into GPS coordinates and arrives at a secret NASA facility headed by Professor Brand. Brand explains that NASA is trying to find an exoplanet capable of supporting life, and he is working on solving a gravity equation to provide a way of transporting large numbers of people off the dying Earth ("Plan A"). He enlists Cooper to pilot an exploratory spacecraft called the Endurance, holding the supplies and embryos for a new colony potentially without the population of Earth ("Plan B"), with a crew of three scientists: Romilly, Doyle, and Brand’s daughter Amelia, accompanied by robot assistants TARS and CASE. With few resources to mount more expeditions, this is humanity's last chance to travel through a wormhole that mysteriously appeared near Saturn to a series of potentially habitable planets.

If I ask any question about this paragraph, it does not give any answer.
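One possible cause of the empty answers, offered as an assumption rather than a confirmed diagnosis, is that retrieval only finds graph nodes whose names match the question's keywords exactly. The sketch below uses a plain dict of hypothetical triplets an LLM might extract from the paragraph above and a hypothetical capitalized-word keyword extractor; it is not LlamaIndex's retriever.

```python
import re

# Hypothetical triplets an LLM *might* extract from the Interstellar paragraph.
graph = {
    "Donald Cooper": [("arrives at", "secret NASA facility")],
    "Professor Brand": [("heads", "secret NASA facility")],
}


def naive_keywords(question):
    """Hypothetical keyword extractor: capitalized words only."""
    return re.findall(r"\b[A-Z][a-z]+\b", question)


def retrieve_exact(question):
    """Return edges only for keywords that exactly equal a subject node name."""
    return {kw: graph[kw] for kw in naive_keywords(question) if kw in graph}


def retrieve_partial(question):
    """Looser lookup: a keyword may match any word of a subject name, case-insensitively."""
    hits = {}
    for kw in naive_keywords(question):
        for subject, edges in graph.items():
            if kw.lower() in subject.lower().split():
                hits[subject] = edges
    return hits


question = "Where does Cooper arrive?"
print(retrieve_exact(question))    # {}: no context, so the LLM may answer "I don't know"
print(retrieve_partial(question))  # {'Donald Cooper': [('arrives at', 'secret NASA facility')]}
```

If this is the cause, printing the subjects actually stored in the graph and asking with those exact names would be a quick way to check.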

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 year ago

🤖

Hi there,

I'm sorry to hear that you're experiencing issues with the Knowledge Graph feature of LlamaIndex. Let's try to troubleshoot this together.

Firstly, regarding the RecursionError, it's possible that the recursion limit of Python is being exceeded due to the complexity of the graph. This could be due to a large number of nodes or edges, or a complex structure of the graph. However, without more specific information, it's hard to pinpoint the exact cause. Could you provide more details about the size and structure of your graph?
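As a generic Python illustration of how a graph can trigger this error (an assumption, since no traceback was posted), a recursive traversal without a visited set never terminates on a cycle and eventually exceeds the interpreter's recursion limit. Raising the limit with sys.setrecursionlimit only postpones the error in that case; an iterative traversal with a visited set avoids it. The graphs below are hypothetical, and this is not LlamaIndex's internal graph code.

```python
from collections import deque

# Hypothetical two-node cycle, e.g. A -[related to]-> B -[related to]-> A.
cyclic = {"A": ["B"], "B": ["A"]}


def depth_recursive(graph, node):
    """Naive recursive traversal with no visited set: never terminates on a cycle."""
    children = graph.get(node, [])
    if not children:
        return 0
    return 1 + max(depth_recursive(graph, c) for c in children)


def reachable_iterative(graph, start, max_depth=None):
    """Breadth-first traversal with a visited set: no recursion, safe on cycles."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


try:
    depth_recursive(cyclic, "A")
    hit_limit = False
except RecursionError:
    hit_limit = True

print(hit_limit)                                 # True
print(sorted(reachable_iterative(cyclic, "A")))  # ['A', 'B']
```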

Secondly, regarding the graph not displaying correctly, I'm unable to view the images you've posted as I can't process images. Could you describe the issue in more detail? Are there any error messages being displayed?

As for the questions you're trying to answer with your QA system, it seems like you're on the right track with using a Knowledge Graph. However, the complexity of the questions might require more than a simple vector store. You might need to combine techniques, for example semantic (vector) search for the "similar test case" questions and structured lookups over test case fields such as component for the listing questions.

Lastly, could you provide the exact version of LlamaIndex you're using? You mentioned you're using the "latest version", but it would be helpful to know the exact version number.

Looking forward to your response.

