run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

Can GPTKnowledgeGraphIndex insert triples directly? #743

Closed pxlsdz closed 1 year ago

pxlsdz commented 1 year ago

Can GPTKnowledgeGraphIndex insert triples directly?

jerryjliu commented 1 year ago

not at the moment, but this is a good idea!

pxlsdz commented 1 year ago

Thank you for your reply. I hope this feature can be added soon; otherwise we need to call OpenAI extraction every time. Thank you!

jerryjliu commented 1 year ago

makes sense! shouldn't be too hard to add an insert_triplets method on the index
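For illustration, a minimal sketch of what direct insertion could look like. It assumes the subject-keyed layout described later in this thread (A -> [predicate, B]); `TripletTable` and its `insert_triplets` are hypothetical names on a stand-in class, not the library's actual API.

```python
from collections import defaultdict


class TripletTable:
    """Hypothetical stand-in for a subject -> [(predicate, object)] table."""

    def __init__(self):
        self.rel_map = defaultdict(list)

    def insert_triplets(self, triplets):
        """Store already-extracted triplets directly, with no LLM call."""
        for subj, pred, obj in triplets:
            if (pred, obj) not in self.rel_map[subj]:  # skip exact duplicates
                self.rel_map[subj].append((pred, obj))

    def get(self, subj):
        """Return the (predicate, object) pairs stored for a subject."""
        return list(self.rel_map.get(subj, []))


table = TripletTable()
table.insert_triplets([
    ("Alice", "works_at", "Acme"),
    ("Alice", "lives_in", "Paris"),
    ("Alice", "works_at", "Acme"),  # duplicate, ignored
])
print(table.get("Alice"))
```

Because the triplets are supplied by the caller, no extraction request is made at insert time.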

victorconan commented 1 year ago

I tried modifying the current graph index class to insert triplets directly, but I noticed a few problems that probably call for a more carefully considered design for graph index queries.

  1. When there are too many triplets, we need an option to limit how many are used as context. Sending over 1000 triplets as context to GPT would be a disaster. It's tricky, though, to pinpoint the "important" triplets. Maybe allowing edge weights and using them as the ranking metric would work?
  2. The KG is constructed as (A, predicate, B) -> A: [predicate, B]. This only allows searching by entity A, not by entity B. It behaves like a directed graph where you can only ask about A to get the answer about B. To make it bidirectional, I noticed I have to add a counterpart triplet (B, inverse predicate, A). Maybe another data structure could solve this problem?
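Both points can be sketched together, as one possible design rather than how the index actually works: keep a second lookup keyed by object so that B can be searched without storing an inverse triplet, and attach a weight to each edge so that only the top-ranked triplets are used as context. The class name, the `weight` parameter, and the `limit` default are assumptions made for this example.

```python
from collections import defaultdict
import heapq


class WeightedTripletStore:
    """Hypothetical store indexed by both subject and object."""

    def __init__(self):
        self.by_subject = defaultdict(list)
        self.by_object = defaultdict(list)

    def add(self, subj, pred, obj, weight=1.0):
        edge = (weight, subj, pred, obj)
        self.by_subject[subj].append(edge)
        if obj != subj:  # avoid listing a self-loop twice
            self.by_object[obj].append(edge)

    def top_triplets(self, entity, limit=10):
        """Return up to `limit` triplets touching `entity`, heaviest first."""
        edges = self.by_subject.get(entity, []) + self.by_object.get(entity, [])
        best = heapq.nlargest(limit, edges, key=lambda e: e[0])
        return [(s, p, o) for _, s, p, o in best]


store = WeightedTripletStore()
store.add("Alice", "works_at", "Acme", weight=0.9)
store.add("Bob", "works_at", "Acme", weight=0.4)
store.add("Acme", "based_in", "Paris", weight=0.7)
print(store.top_triplets("Acme", limit=2))
```

Here "Acme" is found both as a subject and as an object, and the lowest-weight triplet is dropped by the limit. Where the weights would come from (frequency, recency, a relevance score) is left open.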

I'm also curious about how to make multi-hop graph search work under the LlamaIndex framework. Maybe hooking up with Neo4j is a better choice?
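To make "multi-hop" concrete, here is a rough in-memory sketch: a breadth-first search that collects every triplet reachable from a starting entity within a fixed number of hops. It is not tied to LlamaIndex or Neo4j; the function name `multi_hop` and the `max_hops` parameter are made up for this example, and it follows outgoing edges only.

```python
from collections import defaultdict, deque


def multi_hop(triplets, start, max_hops=2):
    """Collect triplets reachable from `start` within `max_hops` edges (BFS)."""
    outgoing = defaultdict(list)
    for subj, pred, obj in triplets:
        outgoing[subj].append((pred, obj))

    found = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for pred, obj in outgoing.get(entity, []):
            found.append((entity, pred, obj))
            if obj not in seen:  # expand each entity at most once
                seen.add(obj)
                queue.append((obj, depth + 1))
    return found


kg = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "based_in", "Paris"),
    ("Paris", "capital_of", "France"),
]
print(multi_hop(kg, "Alice", max_hops=2))
```

With `max_hops=2` the search reaches Paris through Acme but stops before France. A graph database would run this kind of traversal in its own query engine instead of in Python.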

jerryjliu commented 1 year ago

@victorconan thanks for the analysis!

  1. Yeah, we need to limit the # of triplets; currently they all fit within one Node.
  2. You're right that edges are unidirectional. This is something we'll investigate a bit more (not sure if we want explicit bidirectional edges and the complexity that comes with them).

hopefully will have a basic PR to insert triplets soon

jerryjliu commented 1 year ago

closed with #996