[Question]: Is it possible to search for equations stored in the NebularGraph DB?

run-llama / llama_index

LlamaIndex is a data framework for your LLM applications

https://docs.llamaindex.ai

MIT License

[Question]: Is it possible to search for equations stored in the NebularGraph DB? #9429

Closed JinSeoung-Oh closed 11 months ago

JinSeoung-Oh commented 11 months ago

Question Validation

[X] I have searched both the documentation and discord for an answer.

Question

Hi, I have a question about searching for equations stored in NebulaGraph. When I run print(graph_store.get('운행정보 확인 장착 이전 기간의 주행거리')) (the key means "mileage for the period before the driving-information verification device was installed"), it shows:

'운행정보 확인 장착 이전 기간의 주행거리{name: 운행정보 확인 장착 이전 기간의 주행거리} -[relationship:{relationship: =}]-> (운행정보 확인장치 장착 시 고지된 주행거리 - 보험거리 시 고지된 주행거리) x (운행정보 확인장치 미장착 기간/주행거리 계산기간){name: (운행정보 확인장치 장착 시 고지된 주행거리 - 보험거리 시 고지된 주행거리) x (운행정보 확인장치 미장착 기간/주행거리 계산기간)}'

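This triplet comes from the equation 운행정보 확인 장착 이전 기간의 주행거리 = (운행정보 확인장치 장착 시 고지된 주행거리 - 보험거리 시 고지된 주행거리) x (운행정보 확인장치 미장착 기간/주행거리 계산기간). In English, roughly: mileage before device installation = (mileage reported at device installation - mileage reported at the time of 보험거리, literally "insurance distance") x (period without the device / mileage calculation period).

So, if I ask '운행정보 확인 장착 이전 기간의 주행거리는 어떻게 계산 되나요?' ("How is the mileage for the period before the device was installed calculated?"), I would like the response to be '운행정보 확인 장착 이전 기간의 주행거리 = (운행정보 확인장치 장착 시 고지된 주행거리 - 보험거리 시 고지된 주행거리) x (운행정보 확인장치 미장착 기간/주행거리 계산기간)'.

But when I send the query to RetrieverQueryEngine with KnowledgeGraphRAGRetriever, it seems it cannot search for mathematical symbols like '=', '-', '+', '/', and 'x'.

So I want to check whether KnowledgeGraphRAGRetriever with NebulaGraphStore can search for mathematical symbols or not.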

Thanks!

dosubot[bot] commented 11 months ago

🤖

I understand your concern about the inability to search for mathematical symbols using the KnowledgeGraphRAGRetriever with NebulaGraphStore. This could be due to the way the search engine interprets these symbols.

To better assist you, could you provide more details about the specific error message you're encountering when you try to search for these symbols? Additionally, could you share the version of the KnowledgeGraphRAGRetriever and NebulaGraphStore you're currently using? This information will help me understand the situation better and provide a more accurate solution.

Also, if you have made any modifications to the default settings or configurations of these tools, please share those details as well.

Thank you for your cooperation.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

JinSeoung-Oh commented 11 months ago

@dosu-bot I use llama-index 0.8.56 and nebula3-python 3.4.0. Actually, it did not show any error message. When I run display(Markdown(f"{response}")), I cannot find relationship:{relationship: =} in the output.

Graph RAG context:
The following are knowledge sequence in max depth 2 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...` extracted based on key entities as subject:
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 보험회사가 정하는 방법에 의해 알려야 함}]-> 배송지 주소{name: 배송지 주소}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 비용이 청구될 수 있습니다.}]-> 잔존가치{name: 잔존가치}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 운행정보를 측정하고 전송하는 기능을 갖춘}]-> 피보험자동차{name: 피보험자동차}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 하여야 합니다}]-> 반납{name: 반납}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 운행정보 확인장치의 교체 등으로 인하여}]-> 장착된 이후{name: 장착된 이후}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: emits}]-> LED창{name: LED창}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 고장인 것을 확인하다}]-> 보험회사{name: 보험회사}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 반납하지 않을 경우에는}]-> 운행정보 확인장치의 잔존가치에 대한 비용 청구될 수 있다{name: 운행정보 확인장치의 잔존가치에 대한 비용 청구될 수 있다}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 요청한}]-> 날로부터 30일 이내에 반납하여야{name: 날로부터 30일 이내에 반납하여야}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 확인}]-> 운행정보{name: 운행정보}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 반납하여야}]-> 해지일로부터 30일 이내에{name: 해지일로부터 30일 이내에}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 장착하다}]-> 피보험자동차{name: 피보험자동차}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 장착하다}]-> 교체(대체)된 차량{name: 교체(대체)된 차량}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 아닌 사유}]-> 고장{name: 고장}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 반납해야 한다}]-> 보험회사{name: 보험회사}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 발생한}]-> 파손{name: 파손}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 발생한}]-> 도난{name: 도난}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 장착되지 않은 상태에서 운행되었음을 확인}]-> 고장이 아닌 사유{name: 고장이 아닌 사유}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 하다}]-> 교체{name: 교체}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 그 사실을 알리는 경우}]-> 보험계약자 또는 피보험자{name: 보험계약자 또는 피보험자}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 운행정보가 미집계된 기간 일수를 곱하여}]-> 계산하며{name: 계산하며}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 정한 절차에 따라}]-> 보험회사에서{name: 보험회사에서}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 아닌 사유로}]-> 고장{name: 고장}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 교체(대체) 이후 차량}]-> 보험회사{name: 보험회사}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 해당하다}]-> 상태{name: 상태}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 장치 장착에 관련}]-> 피보험자동차{name: 피보험자동차}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 정한 절차에 따라 반납}]-> 보험회사{name: 보험회사}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 에 대한 비용이 청구될 수 있습니다}]-> 잔존가치{name: 잔존가치}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: can be}]-> 파손{name: 파손}
장착{name: 장착} <-[relationship:{relationship: 이후}]- 운행정보 확인장치{name: 운행정보 확인장치} -[relationship:{relationship: 알려야}]-> 상태{name: 상태}

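Most importantly, the response is '운행정보 확인장치가 장착되기 이전 기간의 주행거리 계산 방법에 대한 정보는 제공되지 않았습니다' ("No information was provided on how to calculate the mileage for the period before the driving-information verification device was installed"). So the retriever found nothing about it, even though I have already checked that this information exists in NebulaGraph. So I guess KnowledgeGraphRAGRetriever cannot find relationship:{relationship: =}.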
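The context printed above is anchored on the entities 장착 and 운행정보 확인장치, not on the full entity name that was passed to graph_store.get earlier. One possible reading, not confirmed in this thread, is that the keywords extracted from the query are shorter fragments that never match the stored entity name. The sketch below illustrates that idea under loud assumptions: the dict is a toy stand-in for the graph (not the NebulaGraphStore API), lookup is by exact entity name, and the keyword list is guessed from the subjects shown in the context.

```python
# Toy graph keyed by exact entity name (hypothetical stand-in, not NebulaGraphStore).
FULL_ENTITY = "운행정보 확인 장착 이전 기간의 주행거리"
FORMULA = (
    "(운행정보 확인장치 장착 시 고지된 주행거리 - 보험거리 시 고지된 주행거리) "
    "x (운행정보 확인장치 미장착 기간/주행거리 계산기간)"
)
graph = {
    FULL_ENTITY: [("=", FORMULA)],
    "운행정보 확인장치": [("이후", "장착"), ("확인", "운행정보")],
}


def lookup(keywords, graph):
    """Return (subject, relationship, object) for keywords that exactly match an entity."""
    hits = []
    for keyword in keywords:
        for rel, obj in graph.get(keyword, []):
            hits.append((keyword, rel, obj))
    return hits


# Assumed keywords: fragments like the subjects seen in the Graph RAG context.
fragment_hits = lookup(["장착", "운행정보 확인장치"], graph)
# The full entity name, as used with graph_store.get.
full_hits = lookup([FULL_ENTITY], graph)

print(fragment_hits)
print(full_hits)
```

With the fragment keywords, only edges of 운행정보 확인장치 come back and the "=" edge is never reached. With the full entity name, the "=" edge is returned without "=" being a keyword at all.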

dosubot[bot] commented 11 months ago

🤖

Thank you for providing detailed information about your issue. It seems like you've done a thorough job of diagnosing the problem. However, to further assist you, I need a bit more information.

Firstly, could you please provide the exact query you're sending to the RetrieverQueryEngine with KnowledgeGraphRAGRetriever? This will help me understand if there's an issue with the query itself.

Secondly, could you please share the code snippet where you're using the display(Markdown(f"<b>{response}</b>")) function? I'd like to see how you're processing the response.

Lastly, it would be helpful if you could share the schema of your NebulaGraphDB, especially the part related to the equation and mathematical symbols. This will help me understand if there's a compatibility issue between the KnowledgeGraphRAGRetriever and NebulaGraphDB.

Once I have this information, I'll be in a better position to help you resolve this issue.

JinSeoung-Oh commented 11 months ago

@dosu-bot

First, my query looks like this:

core_key = '운행정보 확인 장착 이전 기간의 주행거리는 어떻게 계산 되나요?'

response = query_engine.query( f"Please answer about '{core_key}' more detail in korean. Do not sumary. You have to search all information. Please output information regarding relationships that have an equals sign (=).")

At first I thought I would have to apply NL-to-SQL to core_key, but I skipped this part because it did not seem to affect the results.

Second, I used IPython.display.display and IPython.display.Markdown (from IPython.display import Markdown, display) for the display(Markdown(f"<b>{response}</b>")) call.

Lastly, I am not sure this is what you are asking for, but here is my graph store setup:

from llama_index.graph_stores import NebulaGraphStore

space_name = "llamaindex"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)

And when I print the result of graph_store.get, it shows this:

['운행정보 확인 장착 이전 기간의 주행거리{name: 운행정보 확인 장착 이전 기간의 주행거리} -[relationship:{relationship: (운행정보 확인장치 장착 시 고지된 주행거리 - 보험거리 시 고지된 주행거리) x (운행정보 확인장치 미장착 기간/주행거리 계산기간)}]-> ={name: =}',
'운행정보 확인 장착 이전 기간의 주행거리{name: 운행정보 확인 장착 이전 기간의 주행거리} -[relationship:{relationship: 계산 공식의 한 요소}]-> 운행정보 확인장치 장착 시 고지된 주행거리{name: 운행정보 확인장치 장착 시 고지된 주행거리}',
'운행정보 확인 장착 이전 기간의 주행거리{name: 운행정보 확인 장착 이전 기간의 주행거리} -[relationship:{relationship: 계산 공식의 한 요소}]-> 주행거리 계산기간{name: 주행거리 계산기간}',
'운행정보 확인 장착 이전 기간의 주행거리{name: 운행정보 확인 장착 이전 기간의 주행거리} -[relationship:{relationship: =}]-> (운행정보 확인장치 장착 시 고지된 주행거리 - 보험거리 시 고지된 주행거리) x (운행정보 확인장치 미장착 기간/주행거리 계산기간){name: (운행정보 확인장치 장착 시 고지된 주행거리 - 보험거리 시 고지된 주행거리) x (운행정보 확인장치 미장착 기간/주행거리 계산기간)}',
'운행정보 확인 장착 이전 기간의 주행거리{name: 운행정보 확인 장착 이전 기간의 주행거리} -[relationship:{relationship: 계산 공식의 한 요소}]-> 보험거리 시 고지된 주행거리{name: 보험거리 시 고지된 주행거리}',
'운행정보 확인 장착 이전 기간의 주행거리{name: 운행정보 확인 장착 이전 기간의 주행거리} -[relationship:{relationship: 계산 공식의 한 요소}]-> 운행정보 확인장치 미장착 기간{name: 운행정보 확인장치 미장착 기간}']
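Note that the output above stores the formula in two shapes: once with "=" as the relationship and the expression as the object, and once with the expression as the relationship and "=" as the object. As a rough sketch (plain Python, not part of LlamaIndex), the equation can be rebuilt from either shape by parsing the path strings. The regular expression assumes exactly the path format printed above, and the sample paths are made-up English stand-ins.

```python
import re

# Path format copied from the graph_store.get output in this thread:
#   subject{name: subject} -[relationship:{relationship: REL}]-> object{name: object}
PATH_RE = re.compile(
    r"^(?P<subj>.+?)\{name: .*?\} "
    r"-\[relationship:\{relationship: (?P<rel>.*?)\}\]-> "
    r"(?P<obj>.+?)\{name: .*\}$"
)


def equations_from_paths(paths):
    """Rebuild 'lhs = rhs' strings from flattened path strings."""
    equations = []
    for path in paths:
        match = PATH_RE.match(path)
        if match is None:
            continue  # not in the expected format
        subj, rel, obj = match.group("subj", "rel", "obj")
        if rel == "=":
            equation = f"{subj} = {obj}"
        elif obj == "=":
            # The formula was stored on the edge and "=" as the object.
            equation = f"{subj} = {rel}"
        else:
            continue  # an ordinary, non-equation relationship
        if equation not in equations:
            equations.append(equation)
    return equations


# Hypothetical sample data in the same shape as the output above.
paths = [
    "distance{name: distance} -[relationship:{relationship: speed x time}]-> ={name: =}",
    "distance{name: distance} -[relationship:{relationship: part of}]-> trip{name: trip}",
    "distance{name: distance} -[relationship:{relationship: =}]-> speed x time{name: speed x time}",
]
equations = equations_from_paths(paths)
print(equations)
```

Both shapes collapse to the same equation string, so duplicates from inconsistent triplet extraction are dropped.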

dosubot[bot] commented 11 months ago

🤖

Based on the information provided, it seems that the issue you're experiencing might not be directly related to LlamaIndex v0.8.56, as there's no specific encoding or decoding methods in LlamaIndex that could potentially interfere with the recognition of mathematical symbols in queries. The text is processed by replacing newline characters with spaces before being passed to the OpenAI API for embedding, which should not interfere with mathematical symbols.

However, it's important to note that the behavior of the OpenAI API in handling mathematical symbols is not specified in the provided context. It's possible that the OpenAI API or the KnowledgeGraphRAGRetriever might not be able to recognize or handle mathematical symbols in the way you're expecting.

As for the NebulaGraphDB, it's also possible that the way it handles mathematical symbols might be causing the issue. If the mathematical symbols are stored as part of the graph's edges or nodes, they should be retrievable. However, if they're part of the properties of the edges or nodes, it might depend on how the NebulaGraphDB handles these properties.

Unfortunately, without more information, it's hard to pinpoint the exact cause of the issue. It would be helpful if you could provide more details about how the mathematical symbols are stored in the NebulaGraphDB and how you're expecting the KnowledgeGraphRAGRetriever to handle them. Additionally, any error messages or logs you're seeing when you try to retrieve these symbols would also be useful.

You can refer to the llama_index/embeddings/openai.py for more details about how LlamaIndex handles text processing.

I hope this helps, and please let me know if you have any more questions or if there's anything else I can assist you with.

Sources

llama_index/embeddings/openai.py

JinSeoung-Oh commented 11 months ago

@dosu-bot I think I have to check the retriever process. Could you give me a starting point for this?

Thanks

dosubot[bot] commented 11 months ago

🤖

Based on the context provided, it seems that the inability of the RetrieverQueryEngine with KnowledgeGraphRAGRetriever in LlamaIndex v0.8.56 to search for mathematical symbols like '=', '-', '+', '/', and 'x' when querying the NebulaGraph DB is likely due to the way keywords are extracted and processed in the system.

In the provided code, the _get_keywords method in the KGTableRetriever class uses the predict method of the llm object to extract keywords from the query string. This method likely uses a language model that is trained to extract meaningful words or phrases from a sentence, and it may not be trained to recognize mathematical symbols as keywords.

Here is the relevant code:

def _get_keywords(self, query_str: str) -> List[str]:
    """Extract keywords."""
    response = self._service_context.llm.predict(
        self.query_keyword_extract_template,
        max_keywords=self.max_keywords_per_query,
        question=query_str,
    )
    keywords = extract_keywords_given_response(
        response, start_token="KEYWORDS:", lowercase=False
    )
    return list(keywords)

To check the retriever process, you can enable verbose logging by setting the verbose parameter to True when initializing the KGTableRetriever or KnowledgeGraphRAGRetriever classes. This will print out debug information, including the extracted keywords and the nodes retrieved from the graph.

self._verbose = kwargs.get("verbose", False)

If you want to include mathematical symbols in your queries, you may need to modify the keyword extraction method to treat these symbols as keywords, or use a different method to parse your queries.

The predict method of the llm object in the _get_keywords function of the KGTableRetriever class is used to extract keywords from a given query string. This method uses a language model to generate a response based on the provided query_keyword_extract_template and the question (which is the query string). The max_keywords parameter specifies the maximum number of keywords to extract from the query.

The language model used is not explicitly mentioned in the provided context. However, given that the llm object is part of the service_context, it can be inferred that the language model is likely a part of the service context's configuration and could be any model that is suitable for keyword extraction tasks.

The predict method generates a response that contains the extracted keywords. These keywords are then extracted from the response using the extract_keywords_given_response function, which takes the response and a start token ("KEYWORDS:" in this case) as arguments. The function returns a set of keywords, which is then converted to a list.

Please note that the exact workings of the predict method and the extract_keywords_given_response function, as well as the specifics of the language model used, are not provided in the given context.
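Since the exact workings are not given here, the following is only a simplified stand-in for the parsing step described above, not the library's implementation. The function name parse_keywords and the sample response are hypothetical. It assumes the LLM answers in the form "KEYWORDS: a, b, c".

```python
def parse_keywords(response, start_token="KEYWORDS:", lowercase=False):
    """Simplified stand-in: split a 'KEYWORDS: a, b, c' response into a set."""
    text = response.strip()
    if text.startswith(start_token):
        text = text[len(start_token):]
    keywords = set()
    for item in text.split(","):
        item = item.strip()
        if lowercase:
            item = item.lower()
        if item:  # skip empty entries such as a trailing comma
            keywords.add(item)
    return keywords


# Hypothetical LLM response for the query "How can I get ratio of g?"
found = parse_keywords("KEYWORDS: ratio of g, calculation")
print(found)
```

The keywords produced this way are candidate entity names. Whether the graph lookup succeeds then depends on how well they match the names stored in the graph.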

You can find more details in the llama_index/indices/knowledge_graph/retrievers.py file in the LlamaIndex repository.

Sources

llama_index/indices/knowledge_graph/retrievers.py

JinSeoung-Oh commented 11 months ago

@dosu-bot I already set verbose=True when I create the KnowledgeGraphRAGRetriever, but I cannot see any debug information, such as the extracted keywords or the nodes retrieved from the graph. For this reason, I use the IPython display function.

from llama_index.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)

And I use gpt-3.5-turbo-0613 as the LLM.

dosubot[bot] commented 11 months ago

🤖

Based on the information you've provided, it seems that the issue lies in the keyword extraction method of the KeywordTableIndex class in LlamaIndex v0.8.56. This method uses a GPT model to extract keywords from the text, and the GPT model is trained to recognize words and phrases as keywords, not mathematical symbols.

To modify the keyword extraction method to treat these symbols as keywords, you would need to modify the _extract_keywords method. Here is an example of how you could modify the method:

def _extract_keywords(self, text: str) -> Set[str]:
    """Extract keywords from text."""
    response = self._service_context.llm.predict(
        self.keyword_extract_template,
        text=text,
    )
    keywords = extract_keywords_given_response(response, start_token="KEYWORDS:")
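    symbols = {'=', '-', '+', '/', 'x'}
    # Match whole whitespace-delimited tokens; iterating over characters would
    # also add "x" for any word that merely contains the letter.
    keywords.update(tok for tok in text.split() if tok in symbols)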
    return keywords

In this modified version of the method, after the GPT model extracts the keywords, the method checks the text for the presence of the mathematical symbols and adds any that it finds to the set of keywords. This will allow the LlamaIndex to recognize these symbols as keywords.

Regarding the verbose logging, it's possible that the logging level is set too high, and debug information is being filtered out. You can try lowering the logging level to see if that helps. Here's how you can do it:

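import logging
import sys

# Send log records to stdout so they also show up in a notebook cell.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

This sets the root logger to DEBUG, the most verbose of the standard levels, so all log messages are displayed.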

Please note that these changes should be tested thoroughly before deploying them in a production environment. If you encounter any issues or need further assistance, feel free to ask.

Sources

llama_index/indices/keyword_table/base.py

JinSeoung-Oh commented 11 months ago

@dosu-bot I think your code extracts keywords from a query that contains math symbols. What I want is for KnowledgeGraphRAGRetriever to return the equation related to a given query.

For example, suppose we have: ratio of g = ((period of x) + (period of y))/2x(ratio of z)

If I send a query like 'How can I get ratio of g?', I want KnowledgeGraphRAGRetriever to return 'ratio of g = ((period of x) + (period of y))/2x(ratio of z)'.

But while debugging, RetrieverQueryEngine with KnowledgeGraphRAGRetriever does not find any equation or math symbol, even though the graph DB has a triplet like this: ['ratio of g{name:ratio of g} - [relationship:{relationship: =}] => ((period of x) + (period of y))/2x(ratio of z){name:((period of x) + (period of y))/2x(ratio of z)}]

So I just want to know what the problem is. In fact, gpt-3.5-turbo-0613 is used in math education APIs for children. For this reason I want to check the retriever process, but I cannot find out why this happens.

dosubot[bot] commented 11 months ago

🤖

Hello @dosubot[bot]! :wave:

I'm Dosu, a friendly bot who's here to lend a helping hand while you're waiting for a human maintainer. Think of me as your digital sidekick, ready to squash bugs, answer your questions, and guide you to becoming a contributor. But remember, I'm just a bot, so sometimes I might not understand everything. Let's get started, shall we? :nerd_face:

Based on the information you've provided, it seems that the issue you're encountering is due to the way the keyword extraction method in LlamaIndex v0.8.56 is designed. This method, _extract_keywords, uses a GPT model to extract keywords from the text. However, this model is trained to recognize words and phrases as keywords, not mathematical symbols.

To address this, you can modify the _extract_keywords method to treat these symbols as keywords. Here's an example of how you can modify the method:

def _extract_keywords(self, text: str) -> Set[str]:
    """Extract keywords from text."""
    response = self._service_context.llm.predict(
        self.keyword_extract_template,
        text=text,
    )
    keywords = extract_keywords_given_response(response, start_token="KEYWORDS:")
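    symbols = {'=', '-', '+', '/', 'x'}
    # Match whole whitespace-delimited tokens; iterating over characters would
    # also add "x" for any word that merely contains the letter.
    keywords.update(tok for tok in text.split() if tok in symbols)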
    return keywords

In this modified version of the method, after the GPT model extracts the keywords, the method checks the text for the presence of the mathematical symbols and adds any that it finds to the set of keywords. This should allow the LlamaIndex to recognize these symbols as keywords.

Regarding the lack of verbose logging, it's possible that the logging level is set too high to display debug information. You can try lowering the logging level to see if that resolves the issue. If it doesn't, please provide more information about your setup, including the version of Python you're using and any error messages you're seeing.

You can find more information about the _extract_keywords method in the KeywordTableIndex class in the LlamaIndex codebase.

I hope this helps! If you have any other questions or need further clarification, please don't hesitate to ask.

Sources

llama_index/indices/keyword_table/base.py

JinSeoung-Oh commented 11 months ago

@dosu-bot Hmm, I think KnowledgeGraphRAGRetriever finds knowledge based on entities. I cannot understand why math symbols have to be extracted as keywords. Take this example: ['ratio of g{name:ratio of g} - [relationship:{relationship: =}] => ((period of x) + (period of y))/2x(ratio of z){name:((period of x) + (period of y))/2x(ratio of z)}]

Here '= ((period of x) + (period of y))/2x(ratio of z)' is linked to 'ratio of g'. My example query ('How can I get ratio of g?') contains 'get ratio of g'. OpenAI extracts keywords from this query, and 'get ratio of g' is probably among the extracted keywords.

So I think the retriever should find '=' and '((period of x) + (period of y))/2x(ratio of z)'. But in debugging it does not. It never finds '=' or '((period of x) + (period of y))/2x(ratio of z)'.

dosubot[bot] commented 11 months ago

🤖

Based on the information provided, it seems that the LlamaIndex v0.8.56 is not recognizing mathematical symbols like '=', '-', '+', '/', and 'x' when used with NebulaGraph DB and the RetrieverQueryEngine with KnowledgeGraphRAGRetriever because the keyword extraction method in the LlamaIndex is not designed to treat these symbols as keywords.

The keyword extraction method is defined in the _extract_keywords method of the KeywordTableIndex class. This method uses a GPT model to extract keywords from the text. The GPT model is trained to recognize words and phrases as keywords, not mathematical symbols.

To modify the keyword extraction method to treat these symbols as keywords, you would need to modify the _extract_keywords method. You could add a step to this method that checks for the presence of these symbols in the text and adds them to the set of keywords. Here is an example of how you could modify the method:

def _extract_keywords(self, text: str) -> Set[str]:
    """Extract keywords from text."""
    response = self._service_context.llm.predict(
        self.keyword_extract_template,
        text=text,
    )
    keywords = extract_keywords_given_response(response, start_token="KEYWORDS:")
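    symbols = {'=', '-', '+', '/', 'x'}
    # Match whole whitespace-delimited tokens; iterating over characters would
    # also add "x" for any word that merely contains the letter.
    keywords.update(tok for tok in text.split() if tok in symbols)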
    return keywords

You can find more details about the KeywordTableIndex class and the _extract_keywords method in the llama_index/indices/keyword_table/base.py file.

As for the KnowledgeGraphRAGRetriever not providing verbose logging, it's possible that the logging level is set too high to display debug information. You could try lowering the logging level to see if that helps. However, without more information, it's hard to say why you're not seeing any debug information.

I hope this helps! If you have any other questions or need further clarification, please let me know.

Sources

llama_index/indices/keyword_table/base.py

JinSeoung-Oh commented 11 months ago

@dosu-bot Hmm, that does not actually answer my question. Of course, you are a bot, so I understand that your answer may not be right. In my previous question, I wanted to check how llama-index finds information in the knowledge graph. I think llama-index first extracts keywords to identify entities. Based on these entities, llama-index searches for [Entity-relationship-Entity] triplets.

In my database, a math symbol is just a relationship, just a bridge. I don't understand why keywords are needed for relationships. Don't I only need keywords for entities?

dosubot[bot] commented 11 months ago

🤖

Based on your description, it seems that the LlamaIndex v0.8.56 isn't recognizing mathematical symbols as relationships because the _llm_extract_triplets method, which is responsible for extracting triplets from the text, is not designed to treat mathematical symbols as keywords or relationships. It uses the self._service_context.llm.predict method with the self.kg_triple_extract_template template to extract triplets, and this template might not be designed to recognize mathematical symbols as relationships.

To modify the keyword extraction method to treat these symbols as keywords, you can create a custom kg_triplet_extract_fn function that treats mathematical symbols as relationships and pass it to the KnowledgeGraphIndex constructor. This function will be used instead of the _llm_extract_triplets method to extract triplets from the text.

Here is an example of how you can do this:

def custom_extract_triplets(text: str) -> List[Tuple[str, str, str]]:
    # Your custom implementation here
    # This function should return a list of triplets
    # Each triplet is a tuple of three strings: (subject, relationship, object)
    pass

kg_index = KnowledgeGraphIndex(
    # other arguments here
    kg_triplet_extract_fn=custom_extract_triplets,
)

In your custom_extract_triplets function, you can use any method you want to extract triplets from the text. For example, you can use regular expressions to find mathematical symbols and treat them as relationships. You can also use the GPT-3.5-turbo-0613 model to extract keywords from the text, and then post-process the results to treat mathematical symbols as relationships.
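As one illustration of the regular-expression approach mentioned above, here is a minimal sketch of such a function. It assumes each equation sits on its own line in the form "left-hand side = right-hand side"; real documents would likely need more careful parsing, and lines without "=" are simply skipped.

```python
import re
from typing import List, Tuple

# One equation per line: "<left-hand side> = <right-hand side>"
EQUATION_RE = re.compile(r"^\s*(?P<lhs>[^=]+?)\s*=\s*(?P<rhs>.+?)\s*$")


def custom_extract_triplets(text: str) -> List[Tuple[str, str, str]]:
    """Turn every 'lhs = rhs' line into a (lhs, '=', rhs) triplet."""
    triplets = []
    for line in text.splitlines():
        match = EQUATION_RE.match(line)
        if match:
            triplets.append((match.group("lhs"), "=", match.group("rhs")))
    return triplets


sample = (
    "ratio of g = ((period of x) + (period of y))/2x(ratio of z)\n"
    "this line has no equation"
)
triplets = custom_extract_triplets(sample)
print(triplets)
```

This only controls how triplets are written into the graph at indexing time. It does not change how entities are extracted from a query at retrieval time.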

As for why keywords are needed for relationships rather than entities, it's because relationships define the connections between entities in a knowledge graph. Without relationships, the knowledge graph would just be a collection of disconnected entities. Keywords are used to identify these relationships and connect the entities in a meaningful way.

For more details, you can refer to the KnowledgeGraphIndex class in the LlamaIndex repository.

I hope this helps! If you have any other questions, feel free to ask.

Sources

llama_index/indices/knowledge_graph/base.py

JinSeoung-Oh commented 11 months ago

@dosu-bot No, hmm. The triplets with math symbols are already stored in my graph DB, so I don't understand why keywords for relationships need to be extracted from the query. If my understanding is right, we only need to extract keywords for entities from the query, not keywords for relationships.

dosubot[bot] commented 11 months ago

🤖

Based on your description, it seems that the issue lies in the _llm_extract_triplets method of the KnowledgeGraphIndex class. This method is responsible for extracting triplets from the text and it uses the self._service_context.llm.predict method to do so. However, this method may not recognize mathematical symbols like '=', '-', '+', '/', and 'x' as relationships.

To address this, you can override the _llm_extract_triplets method in the KnowledgeGraphIndex class to treat these symbols as keywords. Here's an example of how you can do it:

import re

def _llm_extract_triplets(self, text: str) -> List[Tuple[str, str, str]]:
    """Extract triplets from text, spelling out math operators as words."""
    response = self._service_context.llm.predict(
        self.kg_triple_extract_template,
        text=text,
    )
    # Spell out operators only when they stand alone between spaces; a plain
    # str.replace('x', ...) would also rewrite the letter "x" inside words.
    words = {'=': 'equals', '-': 'minus', '+': 'plus', '/': 'divided by', 'x': 'times'}
    response = re.sub(r'(?<=\s)[=\-+/x](?=\s)', lambda m: words[m.group()], response)
    return self._parse_triplet_response(
        response, max_length=self._max_object_length
    )

In this example, we replace the mathematical symbols with their corresponding word representations before parsing the response. This way, the symbols are treated as keywords and are recognized as relationships in the query. Please note that this is a simple example and may not cover all cases. You might need to adjust the code according to your specific needs.

As for your question about why we need to extract keywords for relationships from the query, it's because the relationships in a graph database are just as important as the entities. The relationships define how the entities are connected and interact with each other. By extracting keywords for relationships, we can better understand the structure and semantics of the graph.

For more information, you can refer to the source code of the KnowledgeGraphIndex class in the LlamaIndex repository.

I hope this helps! If you have any other questions, feel free to ask.

Sources

llama_index/indices/knowledge_graph/base.py

JinSeoung-Oh commented 11 months ago

@dosu-bot I don't think you understood my question. If your answer is right, every query must include the relationships stored in the database, which is impossible in real cases. And if that is so, llama-index is not the right tool for building RAG.

In my use case, it must answer with a mathematical formula related to the entities extracted from a given query, even if the query doesn't contain the formula or math symbol I need to find. As far as I know, graph RAG should be able to do that if the extracted entities have a relationship with a mathematical formula.

If what you say is correct, we cannot richly retrieve all the knowledge associated with a specific entity through a query; only the 1 or 2 relationships contained in my query are retrieved.

dosubot[bot] commented 11 months ago

🤖

Based on the information you've provided, it seems that the issue lies in the way the KnowledgeGraphRAGRetriever is extracting relationships from the text. The KnowledgeGraphRAGRetriever uses a language model to extract triplets from the text, and it may not recognize mathematical symbols as relationships because it's designed to extract natural language relationships.

To modify the keyword extraction method to treat these symbols as relationships, you can provide a custom kg_triplet_extract_fn function when initializing the KnowledgeGraphIndex. This function should take a text string as input and return a list of triplets, where each triplet is a tuple of (subject, relationship, object). In this function, you can implement your own logic to recognize mathematical symbols as relationships.

Here is an example of how you might implement this:

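from typing import List, Tuple

from llama_index import KnowledgeGraphIndex  # import path as of llama-index 0.8.x

def custom_extract_triplets(text: str) -> List[Tuple[str, str, str]]: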
    # Implement your own logic here to extract triplets from the text,
    # treating mathematical symbols as relationships.
    # This is just a placeholder implementation.
    triplets = []
    # Split the text into sentences or phrases, then for each sentence or phrase:
    # 1. Identify the subject, relationship (mathematical symbol), and object.
    # 2. Add the triplet to the triplets list.
    return triplets

kg_index = KnowledgeGraphIndex(
    # Other parameters...
    kg_triplet_extract_fn=custom_extract_triplets,
)

In the custom_extract_triplets function, you would need to implement the logic to split the text into sentences or phrases, identify the subject, relationship (mathematical symbol), and object in each sentence or phrase, and add the triplet to the triplets list. This could involve using regular expressions or other text processing techniques to identify the mathematical symbols and the entities they relate.
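As one possible way to fill in the placeholder, the sketch below treats every line of the form "left side = right side" as a triplet whose relationship is the = symbol. The function name extract_equation_triplets and the one-equation-per-line assumption are made up for this illustration; real documents will likely need more careful parsing.

```python
from typing import List, Tuple


def extract_equation_triplets(text: str) -> List[Tuple[str, str, str]]:
    """Return (subject, "=", formula) triplets for every "lhs = rhs" line."""
    triplets = []
    for line in text.splitlines():
        # Split on the first "=" only; sep is "" when the line has none.
        lhs, sep, rhs = line.partition("=")
        lhs, rhs = lhs.strip(), rhs.strip()
        # Skip lines without "=", with an empty side, or with a second "=".
        if sep and lhs and rhs and "=" not in rhs:
            triplets.append((lhs, "=", rhs))
    return triplets


sample = "mileage = (a - b) x (c / d)\nthis line has no equation"
print(extract_equation_triplets(sample))
# [('mileage', '=', '(a - b) x (c / d)')]
```

A function of this shape (text in, list of triplets out) matches the signature described above for kg_triplet_extract_fn, so it could be tried there in place of the empty placeholder.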

Regarding your question about retrieving all the knowledge associated with a specific entity, not just the 1 or 2 relationships contained in your query: you might need to modify the retrieval logic of the KnowledgeGraphRAGRetriever so that it returns all the nodes connected to the entity in the graph, not just the nodes that match the extracted keywords. This could involve expanding outward from the entity through the NebulaGraph store to collect its neighbors, and then retrieving the information associated with those nodes. The exact method names for this should be checked against the graph store's API.
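The idea of collecting everything connected to an entity can be sketched on a toy in-memory graph. This is not NebulaGraph code: the adjacency-map shape, the function name collect_relations, and the sample data are all assumptions for illustration. It does a breadth-first walk and returns every triplet within a given number of hops, regardless of what the relationship label is.

```python
from collections import deque
from typing import Dict, List, Tuple

# Toy adjacency map: subject -> list of (relationship, object) pairs.
Graph = Dict[str, List[Tuple[str, str]]]


def collect_relations(
    graph: Graph, entity: str, max_depth: int = 2
) -> List[Tuple[str, str, str]]:
    """Breadth-first walk returning every triplet within max_depth hops."""
    found = []
    visited = {entity}
    queue = deque([(entity, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand nodes that are already at the depth limit
        for rel, obj in graph.get(node, []):
            found.append((node, rel, obj))
            if obj not in visited:
                visited.add(obj)
                queue.append((obj, depth + 1))
    return found


graph = {
    "mileage": [("=", "(a - b) x (c / d)"), ("measured in", "km")],
    "km": [("is a", "unit")],
}
print(collect_relations(graph, "mileage", max_depth=1))
# [('mileage', '=', '(a - b) x (c / d)'), ('mileage', 'measured in', 'km')]
```

With max_depth=2 the walk also returns ('km', 'is a', 'unit'), which is the kind of richer context the question asks for.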

I hope this helps! If you have any further questions, please don't hesitate to ask.

Sources

llama_index/indices/knowledge_graph/base.py


JinSeoung-Oh commented 11 months ago

@dosu-bot Your answer is not really helpful, so let me rephrase my question. At line 642 of llama_index/indices/knowledge_graph/retrievers.py there is the _get_knowledge_sequence function. In this function, llama-index finds the knowledge sequence using only entities, via self._graph_store.get_rel_map.

So my new question is: if an entity extracted from the query is linked to another entity by a math symbol (that is, the symbol is the relationship between these entities), does _get_knowledge_sequence fail to recognize the link because of the math symbol?

dosubot[bot] commented 11 months ago

🤖

The _get_knowledge_sequence function in the llama_index/indices/knowledge_graph/retrievers.py file of LlamaIndex v0.8.56 is not recognizing mathematical symbols as relationships because it is not designed to do so. The function is designed to retrieve relationships from a graph database based on the entities provided. It does this by calling the get_rel_map method on the _graph_store object, which is expected to return a map of relationships involving the provided entities.

The get_rel_map method is not shown in the provided context, but it is likely that this method is not designed to recognize mathematical symbols as relationships. In a graph database like NebulaGraph DB, relationships are typically represented as edges between nodes, and these edges are usually labeled with text rather than mathematical symbols.

To modify the _get_knowledge_sequence function to recognize mathematical symbols as relationships, you would need to modify the get_rel_map method (or whatever method is used to retrieve relationships from the graph database) to recognize these symbols. This would likely involve changing the query that is sent to the database to include these symbols.
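Whether a symbol used as a relationship label interferes with an entity-keyed lookup can be checked on a toy example. The sketch below assumes a plain dictionary of subject -> [relationship, object] pairs and a hypothetical helper relations_for; it says nothing about how NebulaGraphStore builds its query, and only shows that in a lookup keyed by subject the label is an ordinary string that is never inspected.

```python
from typing import Dict, List

# Toy store: subject -> list of [relationship, object] pairs (assumed shape).
graph_dict: Dict[str, List[List[str]]] = {
    "mileage": [
        ["=", "(a - b) x (c / d)"],
        ["is used for", "premium settlement"],
    ],
}


def relations_for(subjects: List[str]) -> Dict[str, List[List[str]]]:
    """Look up edges by subject only; relationship labels are not examined."""
    return {s: graph_dict[s] for s in subjects if s in graph_dict}


result = relations_for(["mileage"])
labels = [rel for rel, _ in result["mileage"]]
print(labels)
# ['=', 'is used for']
```

If the real store behaves differently, the difference would have to come from how the symbol is stored or escaped in the database query rather than from the entity lookup itself.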

Here is a hypothetical example of how you might modify the get_rel_map method to recognize mathematical symbols as relationships:

def get_rel_map(self, entities, depth, limit):
    # This is a hypothetical, Cypher-style query; self._db and its run() method are placeholders.
    # The actual query would depend on the specifics of your graph database and data model.
    # [r*1..depth] matches paths of 1 to `depth` hops regardless of the edge label.
    query = f"MATCH (n)-[r*1..{depth}]-(m) WHERE n.name IN {entities} RETURN r LIMIT {limit}"
    result = self._db.run(query)
    rel_map = {}
    for record in result:
        for rel in record["r"]:
            # This assumes that the relationship type is stored in the 'type' property of the edge.
            rel_type = rel.type
            # This assumes that the nodes have a 'name' property.
            node1_name = rel.nodes[0]["name"]
            node2_name = rel.nodes[1]["name"]
            if node1_name not in rel_map:
                rel_map[node1_name] = []
            rel_map[node1_name].append((rel_type, node2_name))
    return rel_map

This is a hypothetical example and may not work in your specific situation. The actual implementation would depend on the specifics of your graph database and data model.

As for the get_rel_map method in the SimpleGraphStoreData class, it is implemented as follows:

def get_rel_map(
    self, subjs: Optional[List[str]] = None, depth: int = 2, limit: int = 30
) -> Dict[str, List[List[str]]]:
    """Get subjects' rel map in max depth."""
    if subjs is None:
        subjs = list(self.graph_dict.keys())
    rel_map = {}
    for subj in subjs:
        rel_map[subj] = self._get_rel_map(subj, depth=depth, limit=limit)
    # TBD, truncate the rel_map in a spread way, now just truncate based
    # on iteration order
    rel_count = 0
    return_map = {}
    for subj in rel_map:
        if rel_count + len(rel_map[subj]) > limit:
            return_map[subj] = rel_map[subj][: limit - rel_count]
            break
        else:
            return_map[subj] = rel_map[subj]
            rel_count += len(rel_map[subj])
    return return_map

This method is used to get the relationship map of the subjects up to a maximum depth. If no subjects are provided, it uses all the keys in the graph_dict. It then iterates over the subjects and calls the _get_rel_map method for each subject. The results are stored in the rel_map dictionary.

The method then creates a return_map dictionary. It iterates over the rel_map and adds the relationships to the return_map until the limit is reached. If adding the next set of relationships would exceed the limit, it truncates the list of relationships and breaks the loop. The return_map is then returned.
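The truncation step described above can be reproduced in isolation. The sketch below copies only that loop into a standalone function (the name truncate_rel_map and the sample data are made up for this example) to show what happens when the limit is reached partway through a subject.

```python
from typing import Dict, List


def truncate_rel_map(
    rel_map: Dict[str, List[List[str]]], limit: int
) -> Dict[str, List[List[str]]]:
    """Keep relationships in iteration order until `limit` is reached."""
    rel_count = 0
    return_map = {}
    for subj, rels in rel_map.items():
        if rel_count + len(rels) > limit:
            # Keep only as many relationships as still fit, then stop.
            return_map[subj] = rels[: limit - rel_count]
            break
        return_map[subj] = rels
        rel_count += len(rels)
    return return_map


rel_map = {
    "a": [["=", "b + c"], ["is a", "distance"]],
    "b": [["=", "d / e"], ["is a", "ratio"]],
    "c": [["is a", "period"]],
}
print(truncate_rel_map(rel_map, limit=3))
# {'a': [['=', 'b + c'], ['is a', 'distance']], 'b': [['=', 'd / e']]}
```

Because truncation follows iteration order, subjects that come later (here "c") are dropped entirely once the limit is hit, which is one reason a query may return fewer relationships than expected.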

You can find more details in the llama_index/indices/knowledge_graph/retrievers.py and llama_index/graph_stores/simple.py files.


JinSeoung-Oh commented 11 months ago

@dosu-bot Okay, it is really helpful. Thanks!

dosubot[bot] commented 11 months ago

🤖

Based on the information provided, it seems that the issue you're encountering is due to the current implementation of the _llm_extract_triplets method in the KnowledgeGraphIndex class. This method uses the self._service_context.llm.predict function to extract triplets from the text, and it appears that this function does not treat mathematical symbols as keywords.

To address this, you could potentially modify the _llm_extract_triplets method to include these symbols in the extraction process. Here's a general idea of how you might modify the method:

import re

def _llm_extract_triplets(self, text: str) -> List[Tuple[str, str, str]]:
    """Extract keywords from text."""
    # Add code here to treat mathematical symbols as keywords
    text = text.replace('=', ' EQUALS ')
    text = text.replace('-', ' MINUS ')
    text = text.replace('+', ' PLUS ')
    text = text.replace('/', ' DIVIDE ')
    # Replace "x" only when it stands alone between whitespace; a plain
    # replace('x', ...) would also rewrite the letter inside words such as "index".
    text = re.sub(r'(?<=\s)x(?=\s)', ' TIMES ', text)
    response = self._service_context.llm.predict(
        self.kg_triple_extract_template,
        text=text,
    )
    return self._parse_triplet_response(
        response, max_length=self._max_object_length
    )

This code replaces the mathematical symbols with their word equivalents before passing the text to the self._service_context.llm.predict function. This way, the function should treat these symbols as keywords. Please note that this is a general idea and might need to be adjusted based on the actual implementation of the self._service_context.llm.predict function.

Unfortunately, without the exact implementation details of the self._service_context.llm.predict function, it's hard to provide a more concrete solution. I wasn't able to find this information within the LlamaIndex repository. If you have more details about this function or if there's a specific part of the codebase you'd like me to look into, please let me know. I'm here to help!

Sources

llama_index/indices/knowledge_graph/base.py
