Closed 912100012 closed 1 week ago
To split Chinese content in a PDF with the SemanticSplitterNodeParser so that the retrieve function becomes more accurate, you need to ensure that the sentence_splitter can handle Chinese text. The default split_by_sentence_tokenizer may not be suitable for Chinese, so you should provide a custom sentence splitter that can handle Chinese sentence segmentation.
Here is an example of how you can implement a custom sentence splitter for Chinese text and use it with SemanticSplitterNodeParser:
from typing import List

import jieba
from llama_index.core import Document
from llama_index.core.node_parser.text.semantic_splitter import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding


def chinese_sentence_splitter(text: str) -> List[str]:
    # Use jieba to segment the Chinese text (jieba.lcut returns a list of tokens)
    sentences = jieba.lcut(text, cut_all=False)
    return sentences


# Create an instance of the SemanticSplitterNodeParser with the custom sentence splitter
parser = SemanticSplitterNodeParser.from_defaults(
    embed_model=OpenAIEmbedding(),
    sentence_splitter=chinese_sentence_splitter,
)

# Now you can use the parser to split your Chinese PDF content
# Example usage:
# document = Document(text="你的中文内容在这里")
# nodes = parser.build_semantic_nodes_from_documents([document])
In this example, jieba is used to split Chinese text into sentences. You can replace jieba with any other Chinese text segmentation tool if needed. The custom chinese_sentence_splitter function is then passed to the SemanticSplitterNodeParser to handle Chinese content accurately.
Alternatively, you can use the DashScopeJsonNodeParser, which is designed to handle Chinese text and provides more advanced options for splitting text:
from llama_index.node_parser.dashscope.base import DashScopeJsonNodeParser
import os
# Ensure the DashScope API key is set in the environment variables
os.environ["DASHSCOPE_API_KEY"] = "your_dashscope_api_key"
# Create an instance of the DashScopeJsonNodeParser
parser = DashScopeJsonNodeParser()
# Now you can use the parser to split your Chinese PDF content
# Example usage (get_nodes_from_documents is the generic node parser entry point):
# document = Document(text="你的中文内容在这里")
# nodes = parser.get_nodes_from_documents([document])
The DashScopeJsonNodeParser uses a predefined set of separators suitable for Chinese text and can handle various document formats, making it a robust choice for splitting Chinese content in PDFs [1][2][3].
To continue talking to Dosu, mention @dosu.
As you said, it works with the jieba tool. A new question comes up, though: is your approach semantically segmented? I ask because I didn't see you go through the embedding process. So how is semantic similarity determined when embeddings are not used?
Yes, the approach is semantically segmented. The SemanticSplitterNodeParser class in the LlamaIndex framework segments text into nodes by grouping sentences based on their semantic similarity. This process relies on an embedding model to calculate the similarity between sentence groups and splits the text into nodes where the semantic similarity falls below a certain threshold.
To determine whether the semantics are similar without using embeddings, the provided code does not offer an alternative method. The evaluation and segmentation processes rely on embedding similarity to assess semantic similarity.
Here is an example of how the SemanticSplitterNodeParser works:
from typing import List

import jieba
from llama_index.core import Document
from llama_index.core.node_parser.text.semantic_splitter import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding


def chinese_sentence_splitter(text: str) -> List[str]:
    # Use jieba to segment the Chinese text (jieba.lcut returns a list of tokens)
    sentences = jieba.lcut(text, cut_all=False)
    return sentences


# Create an instance of the SemanticSplitterNodeParser with the custom sentence splitter;
# the embed_model passed here is what performs the semantic comparison
parser = SemanticSplitterNodeParser.from_defaults(
    embed_model=OpenAIEmbedding(),
    sentence_splitter=chinese_sentence_splitter,
)

# Example usage:
document = Document(text="你的中文内容在这里")
nodes = parser.build_semantic_nodes_from_documents([document])
In summary, while the SemanticSplitterNodeParser relies on embeddings for semantic similarity evaluation, it provides a detailed approach to segmenting text semantically. There is no alternative method provided in the code to determine semantic similarity without using embeddings [1][2][3].
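To make the role of the embeddings concrete, here is a minimal sketch of the idea described above: compute the cosine distance between adjacent embeddings and break wherever the distance exceeds a percentile of all distances. It is an illustration only, not the library's code; the helper names and the two-dimensional toy vectors are made up for the example.

```python
import math
from typing import List


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values: List[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    low = math.floor(rank)
    high = math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def find_breakpoints(embeddings: List[List[float]], pct: float) -> List[int]:
    """Return every index i where the distance between item i and i + 1 exceeds the percentile."""
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    threshold = percentile(distances, pct)
    return [i for i, d in enumerate(distances) if d > threshold]


# Toy embeddings (hypothetical): the first two point one way, the last two another.
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
breakpoints = find_breakpoints(toy_embeddings, pct=50)
print(breakpoints)
```

The only large distance is between the second and third vectors, so that is where the text would be cut. Without embeddings there is nothing to measure, which is why the parser needs an embed_model.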
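Awesome, I just hadn't thought it through. Thanks.
I used the method you told me, splitting with jieba, but the results are not good. Could it be because jieba focuses on word segmentation, while what I want is to split long articles (that is, a large number of sentences)? @dosu
You're right: jieba is mainly used for word segmentation, not sentence splitting. For sentence splitting of long articles, you can try snownlp or pkuseg, tools that may be better suited to Chinese sentence splitting.
Here is an example of sentence splitting with snownlp: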
from typing import List

from llama_index.core import Document
from llama_index.core.node_parser.text.semantic_splitter import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding
from snownlp import SnowNLP


def chinese_sentence_splitter(text: str) -> List[str]:
    # Use SnowNLP to split Chinese text into sentences
    s = SnowNLP(text)
    sentences = s.sentences
    return sentences


# Create an instance of the SemanticSplitterNodeParser with the custom sentence splitter
parser = SemanticSplitterNodeParser.from_defaults(
    embed_model=OpenAIEmbedding(),
    sentence_splitter=chinese_sentence_splitter,
)

# Example usage:
document = Document(text="你的长篇中文内容在这里")
nodes = parser.build_semantic_nodes_from_documents([document])
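SnowNLP handles Chinese sentence splitting better; combined with the embedding model of the SemanticSplitterNodeParser, it allows more accurate semantic splitting. You can adjust the buffer_size and breakpoint_percentile_threshold parameters as needed to optimize the result: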
parser = SemanticSplitterNodeParser.from_defaults(
    embed_model=OpenAIEmbedding(),
    sentence_splitter=chinese_sentence_splitter,
    buffer_size=2,  # adjust as needed
    breakpoint_percentile_threshold=90,  # adjust as needed
)
In this way you can better handle semantic splitting of long Chinese texts. I hope this helps you solve the problem!
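If you would rather avoid an extra dependency, a sentence splitter can also be sketched with a regular expression over Chinese end-of-sentence punctuation. This is a rough illustration, not a replacement for a proper segmenter: the function name and the chosen punctuation set are assumptions, and edge cases such as decimal points or abbreviations are ignored.

```python
import re
from typing import List

# One or more sentence terminators (full-width or ASCII), optionally followed
# by closing quotation marks that belong to the same sentence.
_SENTENCE_END = re.compile(r"([。！？!?…]+[”’」』]*)")


def split_chinese_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the terminator attached to each sentence."""
    # With one capturing group, re.split returns:
    # body, terminator, body, terminator, ..., trailing body
    pieces = _SENTENCE_END.split(text)
    sentences = []
    for i in range(0, len(pieces) - 1, 2):
        sentence = (pieces[i] + pieces[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    tail = pieces[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = "今天天气很好。我们去公园吧！你觉得呢？好的"
for sentence in split_chinese_sentences(sample):
    print(sentence)
```

A function with this shape (str in, list of str out) can be passed as sentence_splitter in the same way as chinese_sentence_splitter above.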
@dosu Please explain carefully what the following parameters of this function in the SemanticSplitterNodeParser class mean (breakpoint_percentile_threshold, buffer_size, original_text_metadata_key, include_prev_next_rel, callback_manager, id_func), with text examples that compare different parameter values:
def from_defaults(
    cls,
    embed_model: Optional[BaseEmbedding] = None,
    breakpoint_percentile_threshold: Optional[int] = 95,
    buffer_size: Optional[int] = 1,
    sentence_splitter: Optional[Callable[[str], List[str]]] = None,
    original_text_metadata_key: str = DEFAULT_OG_TEXT_METADATA_KEY,
    include_metadata: bool = True,
    include_prev_next_rel: bool = True,
    callback_manager: Optional[CallbackManager] = None,
    id_func: Optional[Callable[[int, Document], str]] = None,
) -> "SemanticSplitterNodeParser":
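The parameters of the from_defaults function in the SemanticSplitterNodeParser class have the following meanings:
breakpoint_percentile_threshold: the percentile of the distance between adjacent sentence groups that must be exceeded for a breakpoint to be placed. The default is 95; a lower value yields more, smaller nodes.
parser = SemanticSplitterNodeParser.from_defaults(breakpoint_percentile_threshold=90)
buffer_size: the number of neighboring sentences grouped with each sentence when similarity is evaluated. The default is 1.
parser = SemanticSplitterNodeParser.from_defaults(buffer_size=3)
original_text_metadata_key: the metadata key under which the original text is stored. The default is DEFAULT_OG_TEXT_METADATA_KEY.
parser = SemanticSplitterNodeParser.from_defaults(original_text_metadata_key="my_original_text")
include_prev_next_rel: when True (the default), nodes include previous/next relationship information.
parser = SemanticSplitterNodeParser.from_defaults(include_prev_next_rel=False)
callback_manager: the callback manager used for event handling. If omitted, an empty CallbackManager is used.
callback_manager = CallbackManager([])
parser = SemanticSplitterNodeParser.from_defaults(callback_manager=callback_manager)
id_func: a function that generates node IDs from the node index and the document. The default is default_id_func.
def custom_id_func(index, document):
    return f"custom_id_{index}"
parser = SemanticSplitterNodeParser.from_defaults(id_func=custom_id_func)
These parameters can be combined in different ways to adjust the behavior of the SemanticSplitterNodeParser to suit different needs [1][2].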
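As a text example for buffer_size, the sketch below shows how each sentence could be grouped with its neighbors before it is embedded. It mirrors the idea only; combine_with_buffer is a hypothetical helper written for this illustration, not the library's implementation.

```python
from typing import List


def combine_with_buffer(sentences: List[str], buffer_size: int) -> List[str]:
    """For each sentence, join it with up to buffer_size neighbors on each side."""
    combined = []
    for i in range(len(sentences)):
        start = max(0, i - buffer_size)
        end = min(len(sentences), i + buffer_size + 1)
        combined.append("".join(sentences[start:end]))
    return combined


sentences = ["我喜欢猫。", "猫很可爱。", "今天下雨了。", "我带了伞。"]

groups_0 = combine_with_buffer(sentences, 0)  # every sentence on its own
groups_1 = combine_with_buffer(sentences, 1)  # each sentence plus one neighbor per side

for single, grouped in zip(groups_0, groups_1):
    print(f"{single}  ->  {grouped}")
```

With a larger buffer each embedding sees more context, so the distances between neighbors change more smoothly and isolated short sentences are less likely to trigger a breakpoint.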
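Please explain the include_prev_next_rel parameter and give me Chinese text examples for the two cases where it is True and where it is False. @dosu
The include_prev_next_rel parameter determines whether previous/next relationships are established between nodes.
When include_prev_next_rel is True, previous/next relationships are established between nodes. For example: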
{
"nodes": [
{
"id": "1",
"content": "这是第一个节点。",
"relationships": {
"NEXT": {
"id": "2",
"content": "这是第二个节点。"
}
}
},
{
"id": "2",
"content": "这是第二个节点。",
"relationships": {
"PREVIOUS": {
"id": "1",
"content": "这是第一个节点。"
},
"NEXT": {
"id": "3",
"content": "这是第三个节点。"
}
}
},
{
"id": "3",
"content": "这是第三个节点。",
"relationships": {
"PREVIOUS": {
"id": "2",
"content": "这是第二个节点。"
}
}
}
]
}
When include_prev_next_rel is False, no previous/next relationships are established between nodes. For example:
{
"nodes": [
{
"id": "1",
"content": "这是第一个节点。",
"relationships": {}
},
{
"id": "2",
"content": "这是第二个节点。",
"relationships": {}
},
{
"id": "3",
"content": "这是第三个节点。",
"relationships": {}
}
]
}
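The two JSON shapes above can be reproduced with a few lines of plain Python. This is only a model of the behavior: link_nodes and the dictionary layout are hypothetical and store just the neighbor's id, whereas real LlamaIndex nodes carry richer relationship objects.

```python
from typing import Dict, List


def link_nodes(texts: List[str], include_prev_next_rel: bool) -> List[Dict]:
    """Build simple node dicts and optionally link each one to its neighbors."""
    nodes = [
        {"id": str(i + 1), "content": text, "relationships": {}}
        for i, text in enumerate(texts)
    ]
    if include_prev_next_rel:
        for i, node in enumerate(nodes):
            if i > 0:
                node["relationships"]["PREVIOUS"] = nodes[i - 1]["id"]
            if i < len(nodes) - 1:
                node["relationships"]["NEXT"] = nodes[i + 1]["id"]
    return nodes


texts = ["这是第一个节点。", "这是第二个节点。", "这是第三个节点。"]
linked = link_nodes(texts, include_prev_next_rel=True)
unlinked = link_nodes(texts, include_prev_next_rel=False)

print(linked[1]["relationships"])
print(unlinked[1]["relationships"])
```

Keeping the links is useful when a retriever or postprocessor wants to pull in the neighboring chunks of a hit; turning them off only saves a little metadata.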
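I understand what the breakpoint_percentile_threshold parameter means, but I would like to visualize it. For example, when I set it to 90, where do the breakpoints occur? Is there a way, through a plot or printed output, to show the breakpoints for all the data after node parsing? @dosu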
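To visualize where the breakpoints fall when breakpoint_percentile_threshold is set to 90, and to show all breakpoints after parsing, you can modify the _split_text method of the SemanticSplitterNodeParser class to print or log the breakpoints. In addition, you can create a method that displays all breakpoints for the parsed nodes.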
Modify the _split_text method to print the breakpoints:
def _split_text(self, text: str) -> List[str]:
    """
    Split incoming text and return chunks with overlap size.
    Has a preference for complete sentences, phrases, and minimal overlap.
    """
    # Splitting the essay on '.', '?', and '!'
    single_sentences_list = re.split(r"(?<=[.?!])\s+", text)
    sentences = [
        {"sentence": x, "index": i} for i, x in enumerate(single_sentences_list)
    ]
    combined_sentences = combine_sentences(sentences, self.buffer_size)
    # compute embeddings
    embeddings = self.embed_model.get_text_embedding_batch(
        [x["combined_sentence"] for x in combined_sentences]
    )
    # assign embeddings to the sentences
    for i, embedding in enumerate(embeddings):
        combined_sentences[i]["embedding"] = embedding
    # calculate cosine distance between adjacent sentences
    distances = calculate_cosine_distances(combined_sentences)
    for i, distance in enumerate(distances):
        combined_sentences[i]["dist_to_next"] = distance
    # get indices above threshold
    indices_above_thresh = get_indices_above_threshold(
        distances, self.breakpoint_percentile_threshold
    )
    # Print the breakpoints
    print(f"Breakpoints at indices: {indices_above_thresh}")
    # make chunks
    return make_chunks(combined_sentences, indices_above_thresh)

Add a display_breakpoints method (requires import numpy as np):
def display_breakpoints(self, documents: List[Document], show_progress: bool = False):
    for doc in documents:
        text = doc.text
        text_splits = self.sentence_splitter(text)
        sentences = self._build_sentence_groups(text_splits)
        combined_sentence_embeddings = self.embed_model.get_text_embedding_batch(
            [s["combined_sentence"] for s in sentences],
            show_progress=show_progress,
        )
        for i, embedding in enumerate(combined_sentence_embeddings):
            sentences[i]["combined_sentence_embedding"] = embedding
        # distances between each sentence group and the next one
        distances = self._calculate_distances_between_sentence_groups(sentences)
        breakpoint_distance_threshold = np.percentile(
            distances, self.breakpoint_percentile_threshold
        )
        indices_above_threshold = [
            i for i, x in enumerate(distances) if x > breakpoint_distance_threshold
        ]
        print(f"Document ID: {doc.id_}")
        print(f"Breakpoints at indices: {indices_above_threshold}")
By calling the display_breakpoints method with a list of documents, you can see the positions of all breakpoints [1][2][3].
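Once you have the list of distances, a plain-text view is often enough to see where the cuts land. The sketch below draws one bar per distance and marks those above the chosen percentile. The distances are invented for the example, and render_breakpoints is a hypothetical helper rather than part of LlamaIndex.

```python
import math
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    low = math.floor(rank)
    high = math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def render_breakpoints(distances: List[float], pct: float, width: int = 40) -> List[str]:
    """Return one text line per distance, flagging those above the percentile threshold."""
    threshold = percentile(distances, pct)
    longest = max(distances) or 1.0  # avoid dividing by zero when all distances are 0
    lines = []
    for i, d in enumerate(distances):
        bar = "#" * max(1, round(d / longest * width))
        marker = "  <-- breakpoint" if d > threshold else ""
        lines.append(f"{i:>3} {d:.3f} {bar}{marker}")
    return lines


# Made-up distances between sentence group i and sentence group i + 1.
toy_distances = [0.05, 0.10, 0.80, 0.07, 0.12, 0.65, 0.04]
report = render_breakpoints(toy_distances, pct=90)
print("\n".join(report))
```

Running it with different pct values side by side shows directly how lowering the threshold adds breakpoints: at 90 only the largest jump is marked, while a lower value would also mark the second-largest one.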
How can I set the top k parameter so that retrieve returns more nodes? @dosu
To set the top_k parameter so that the retrieve function returns more nodes, set the similarity_top_k attribute of the VectorIndexRetriever instance. This can be done through the similarity_top_k setter.
Here is an example of how you can set the similarity_top_k parameter:
# Assuming you have an instance of VectorIndexRetriever called vector_index_retriever
vector_index_retriever.similarity_top_k = 10 # Set top_k to 10 to retrieve more nodes
This will configure the VectorIndexRetriever to return the top 10 most similar nodes instead of the default value. The similarity_top_k parameter is used internally by the VectorIndexRetriever to build the query and determine the number of top results to return.
Alternatively, you can set the SIMILARITY_TOP_K parameter in a configuration file if your setup supports it:
SIMILARITY_TOP_K=5
This approach can be useful for managing configurations in a more centralized and environment-specific manner [1][2][3][4].
Please combine this with my code and tell me where to add the top_k param:
storage_context = StorageContext.from_defaults(persist_dir="../vecdb/lmsg")
index = load_index_from_storage(storage_context, embed_model=MySetting.my_embedding_mode())
retriever = index.as_retriever(retriever_mode=MySetting.my_llm_mode())
nodes = retriever.retrieve("我是一个什么样的人")
@dosu
To add the top_k parameter to your code and make the retrieve function return more nodes, you need to set the similarity_top_k parameter when creating the retriever. Here is how you can modify your code:
# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="../vecdb/lmsg")
# load index
index = load_index_from_storage(storage_context, embed_model=MySetting.my_embedding_mode())
# Set the top_k parameter
retriever = index.as_retriever(retriever_mode=MySetting.my_llm_mode(), similarity_top_k=10)
nodes = retriever.retrieve("我是一个什么样的人")
In this example, similarity_top_k=10 is used to specify that the top 10 most similar nodes should be returned. Adjust the value of similarity_top_k as needed [1][2].
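Conceptually, similarity_top_k just caps how many of the best-scoring nodes are kept. The sketch below shows that selection in isolation; the node names and scores are invented, and top_k_by_score is a hypothetical helper rather than the retriever's actual code.

```python
import heapq
from typing import List, Tuple


def top_k_by_score(scored_nodes: List[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (node_id, score) pairs, best first."""
    return heapq.nlargest(k, scored_nodes, key=lambda item: item[1])


# Invented similarity scores for four stored nodes.
scored = [("node-a", 0.32), ("node-b", 0.31), ("node-c", 0.12), ("node-d", 0.27)]

top_2 = top_k_by_score(scored, 2)
top_10 = top_k_by_score(scored, 10)

print(top_2)
print(len(top_10))
```

Note that asking for more results than there are stored nodes simply returns all of them, so a result count lower than similarity_top_k is not necessarily a sign that the parameter was ignored.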
Please check why it still doesn't work. My code has added similarity_top_k=10 as follows, but the result is still the top 2. My code:
from typing import List
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index.core.node_parser.text.semantic_splitter import SemanticSplitterNodeParser
from llama_index.core.storage.index_store import SimpleIndexStore
from llama_index.embeddings.openai import OpenAIEmbedding
import jieba
from snownlp import SnowNLP
from pro_settings.my_settings import MySetting
MySetting.open_log()
storage_context = StorageContext.from_defaults(persist_dir="../vecdb/lmsg")
index = load_index_from_storage(storage_context, embed_model=MySetting.my_embedding_mode())
retriever = index.as_retriever(retriever_mode=MySetting.my_llm_mode(), similarity_top_k=6)
nodes = retriever.retrieve("我是一个什么样的人")
Result:
DEBUG:llama_index.core.indices.utils:> Top 2 nodes:
[Node 63e7d029-f908-41f8-ba45-2640dfcae13d] [Similarity score: 0.322666] ” 妖妖忽然大惊小怪地叫起来:“陈辉,你是诗人呢!” 我说:“我是诗人?不错,当然我是诗人” “你怎么啦?我说真的呢!你很可以做一个不坏的诗人你有真正的诗人气质!” “你...
[Node b0373054-7be0-4e33-9867-457a79a63ffd] [Similarity score: 0.31907] “我与那个杨素瑶的相识还要上溯到十二年以前”,老陈从嘴上取下烟斗,在一团朦胧的烟雾里看着我 这时候我们正一同坐在公园的长椅上:“我可以把这段经历完全告诉你,因为你是我唯一的朋友,除...
@dosu
The issue might be due to the similarity_top_k parameter being overridden in the VectaraRetriever class. The VectaraRetriever class has a default value for similarity_top_k set to 5, which might be overriding the value you set in the as_retriever method.
To ensure that similarity_top_k is set correctly, you should check the initialization of the VectaraRetriever class and make sure it does not contain a conflicting similarity_top_k value. Additionally, you can use the similarity_top_k property to set and get the value to ensure it is correctly applied.
Here is the relevant part of the code:
class VectaraRetriever(BaseRetriever):
    def __init__(
        self,
        index: VectaraIndex,
        similarity_top_k: int = 5,
        vectara_query_mode: ManagedIndexQueryMode = ManagedIndexQueryMode.DEFAULT,
        lambda_val: float = 0.005,
        n_sentences_before: int = 2,
        n_sentences_after: int = 2,
        filter: str = "",
        mmr_k: int = 50,
        mmr_diversity_bias: float = 0.3,
        summary_enabled: bool = False,
        summary_response_lang: str = "eng",
        summary_num_results: int = 7,
        summary_prompt_name: str = "vectara-experimental-summary-ext-2023-10-23-small",
        callback_manager: Optional[CallbackManager] = None,
        **kwargs: Any,
    ) -> None:
        """Initialize params."""
        self._index = index
        self._similarity_top_k = similarity_top_k
        self._lambda_val = lambda_val
        self._n_sentences_before = n_sentences_before
        self._n_sentences_after = n_sentences_after
        self._filter = filter
        if vectara_query_mode == ManagedIndexQueryMode.MMR:
            self._mmr = True
            self._mmr_k = mmr_k
            self._mmr_diversity_bias = mmr_diversity_bias
        else:
            self._mmr = False
        if summary_enabled:
            self._summary_enabled = True
            self._summary_response_lang = summary_response_lang
            self._summary_num_results = summary_num_results
            self._summary_prompt_name = summary_prompt_name
        else:
            self._summary_enabled = False
        super().__init__(callback_manager)

    @property
    def similarity_top_k(self) -> int:
        """Return similarity top k."""
        return self._similarity_top_k

    @similarity_top_k.setter
    def similarity_top_k(self, similarity_top_k: int) -> None:
        """Set similarity top k."""
        self._similarity_top_k = similarity_top_k
Ensure that when you create an instance of VectaraRetriever, you pass the correct similarity_top_k value and that it is not being overridden by any default values or other parameters [1].
Additionally, other parameters of the VectaraRetriever class could limit the number of nodes returned, even if similarity_top_k is set to a higher value. Specifically:
mmr_k: if vectara_query_mode is set to ManagedIndexQueryMode.MMR, the number of results fetched for MMR (Maximal Marginal Relevance) is determined by mmr_k, which defaults to 50. This could limit the number of nodes returned if mmr_k is set to a value lower than similarity_top_k.
filter: the filter parameter can be used to apply metadata filters. If specified, it could limit the number of nodes returned based on the filtering criteria.
summary_num_results: if summary_enabled is set to True, the number of results used for summary generation is determined by summary_num_results, which defaults to 7. This could also affect the number of nodes returned [1].
Check these parameters to ensure they are not limiting the number of nodes returned.
When I use the Neo4jPropertyGraphStore, how can I build the PropertyGraphIndex from the nodes that I have already parsed, instead of from documents? @dosu
To build the PropertyGraphIndex from the nodes that you have already parsed, instead of from documents, you can directly pass the list of nodes to the PropertyGraphIndex constructor. Here is the relevant code snippet:
from llama_index.core import PropertyGraphIndex, Settings
from llama_index.core.graph_stores.simple_labelled import SimplePropertyGraphStore
from llama_index.core.indices.property_graph.transformations import SimpleLLMPathExtractor, ImplicitPathExtractor

# Assuming you have a list of parsed nodes
nodes = [...]  # Your list of parsed nodes

# Initialize the PropertyGraphIndex using the parsed nodes.
# Settings.llm stands in for whatever concrete LLM you have configured;
# the abstract LLM base class cannot be instantiated directly.
index = PropertyGraphIndex(
    nodes=nodes,
    property_graph_store=SimplePropertyGraphStore(),
    kg_extractors=[SimpleLLMPathExtractor(llm=Settings.llm), ImplicitPathExtractor()],
    show_progress=True,
)
This code initializes the PropertyGraphIndex using the nodes you have already parsed, along with the SimplePropertyGraphStore and any knowledge graph extractors you have defined. Additionally, you can insert nodes into an existing PropertyGraphIndex using the insert_nodes method:
index.insert_nodes(nodes)
The PropertyGraphIndex class supports various parameters, including llm, kg_extractors, property_graph_store, vector_store, use_async, embed_model, embed_kg_nodes, callback_manager, transformations, storage_context, and show_progress, which you can customize as needed [1][2][3].
@dosu i use your code,and get the response as follow,but when i use the order "MATCH (n:Chunk) RETURN n LIMIT 25 "in neo4j browser,i got the entity is not what i want ,they're all weird labels, like chunk、Entity、Node : Request body: {'model': 'qwen-turbo', 'parameters': {'max_tokens': 1000, 'temperature': 0.1, 'seed': 1234, 'result_format': 'message'}, 'input': {'messages': [{'role': 'user', 'content': 'Give the following text, extract the knowledge graph according to the provided schema. Try to limit to the output 10 extracted paths.s\n-------\n可是我再也没有跟着乱,可以说是很遵守课堂纪律我觉得同学们都很卑鄙,软的欺侮,硬的怕至于我吗,我是个男子汉大丈夫,我不干那些卑鄙的勾当 下了课,我看见刘老师到教导处去了我感到很好奇,就走到教导处门口去偷听我听见孙主任在问: “小刘,这节课怎么样?” “不行,主任还是乱哄哄的,根本没法上” “那你就不上,先把纪律整顿好再说!” “不行啊,我怎么说他们也不听!” “你揪两个到前面去!” “我一到跟前他们就老实了哎呀,这个课那么难教……” “别怕,哎呀,你哭什么,用不着哭,我下节课到窗口听听,找几个替你治一治谁闹得最厉害?谁听课比较好?” “都闹得厉害!就是陈辉和杨素瑶还没有跟着起哄” “啊,你别叫他们骗了,那两个最复杂!估计背地里捣鬼的就是他们!你别怕……今天晚上我有两张体育馆的球票,你去吗?……”我听得怒火中烧,姓孙的,你平白无故地污蔑老子!好,你等着瞧! 好,第三节课又乱了堂我根本就没听,眼睛直盯着窗外不一会就看见窗台上露出一个脑瓢,一圈头发孙主任来了他偷听了半天,猛地把头从窗户里伸上来,大叫:“刘小军!张明!陈辉!杨素瑶!到教导处去!” 刘小军和张明吓得面如土色可是我坦然地站起来\n-------\n\n\n\nHere\'s a JSON schema to follow:\n{{"$defs": {{"Entity": {{"properties": {{"type": {{"description": "Entity in a knowledge graph. Only extract entities with types that are listed as valid: typing.Literal[\'\u4eba\u7269\', \'\u5730\u70b9\', \'\u7ec4\u7ec7\', \'\u7269\u54c1\', \'\u4e8b\u4ef6\']", "enum": ["\u4eba\u7269", "\u5730\u70b9", "\u7ec4\u7ec7", "\u7269\u54c1", "\u4e8b\u4ef6"], "title": "Type", "type": "string"}}, "name": {{"title": "Name", "type": "string"}}}}, "required": ["type", "name"], "title": "Entity", "type": "object"}}, "Relation": {{"properties": {{"type": {{"description": "Relation in a knowledge graph. 
Only extract relations with types that are listed as valid: typing.Literal[\'\u5c45\u4f4f\u4e8e\', \'\u5de5\u4f5c\u4e8e\', \'\u62e5\u6709\', \'\u9047\u89c1\', \'\u654c\u5bf9\', \'\u670b\u53cb\', \'\u7231\u6155\', \'\u53c2\u4e0e\', \'\u53d1\u751f\u5728\', \'\u4f7f\u7528\']", "enum": ["\u5c45\u4f4f\u4e8e", "\u5de5\u4f5c\u4e8e", "\u62e5\u6709", "\u9047\u89c1", "\u654c\u5bf9", "\u670b\u53cb", "\u7231\u6155", "\u53c2\u4e0e", "\u53d1\u751f\u5728", "\u4f7f\u7528"], "title": "Type", "type": "string"}}}}, "required": ["type"], "title": "Relation", "type": "object"}}, "Triplet": {{"properties": {{"subject": {{"$ref": "#/$defs/Entity"}}, "relation": {{"$ref": "#/$defs/Relation"}}, "object": {{"$ref": "#/$defs/Entity"}}}}, "required": ["subject", "relation", "object"], "title": "Triplet", "type": "object"}}}}, "description": "Knowledge Graph Schema.", "properties": {{"triplets": {{"items": {{"$ref": "#/$defs/Triplet"}}, "title": "Triplets", "type": "array"}}}}, "required": ["triplets"], "title": "KGSchema", "type": "object"}}\n\nOutput a valid JSON object but do not repeat the schema.\n'}]}} DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dashscope.aliyuncs.com:443 Starting new HTTPS connection (1): dashscope.aliyuncs.com:443 DEBUG:urllib3.connectionpool:https://dashscope.aliyuncs.com:443 "POST /api/v1/services/aigc/text-generation/generation HTTP/11" 200 None https://dashscope.aliyuncs.com:443 "POST /api/v1/services/aigc/text-generation/generation HTTP/11" 200 None DEBUG:dashscope:Response: {'output': {'choices': [{'finish_reason': 'stop', 'message': {'role': 'assistant', 'content': '{\n "triplets": [\n {\n "subject": {"type": "实体", "name": "刘老师"},\n "relation": {"type": "参与"},\n "object": {"type": "实体", "name": "教导处"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "询问"},\n "object": {"type": "实体", "name": "小刘"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "命令"},\n "object": {"type": "实体", "name": 
"刘小军"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "命令"},\n "object": {"type": "实体", "name": "张明"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "命令"},\n "object": {"type": "实体", "name": "陈辉"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "命令"},\n "object": {"type": "实体", "name": "杨素瑶"}\n },\n {\n "subject": {"type": "实体", "name": "我"},\n "relation": {"type": "听到"},\n "object": {"type": "实体", "name": "对话"}\n },\n {\n "subject": {"type": "实体", "name": "我"},\n "relation": {"type": "愤怒"},\n "object": {"type": "实体", "name": "孙主任"}\n },\n {\n "subject": {"type": "实体", "name": "我"},\n "relation": {"type": "站起来"},\n "object": {"type": "实体", "name": "教室"}\n },\n {\n "subject": {"type": "实体", "name": "我"},\n "relation": {"type": "观察"},\n "object": {"type": "实体", "name": "窗外"}\n }\n ]\n}'}}]}, 'usage': {'total_tokens': 1615, 'output_tokens': 485, 'input_tokens': 1130}, 'request_id': '01782f35-cfcb-9cb7-a14c-0e1f2066376d'} Response: {'output': {'choices': [{'finish_reason': 'stop', 'message': {'role': 'assistant', 'content': '{\n "triplets": [\n {\n "subject": {"type": "实体", "name": "刘老师"},\n "relation": {"type": "参与"},\n "object": {"type": "实体", "name": "教导处"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "询问"},\n "object": {"type": "实体", "name": "小刘"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "命令"},\n "object": {"type": "实体", "name": "刘小军"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "命令"},\n "object": {"type": "实体", "name": "张明"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "命令"},\n "object": {"type": "实体", "name": "陈辉"}\n },\n {\n "subject": {"type": "实体", "name": "孙主任"},\n "relation": {"type": "命令"},\n "object": {"type": "实体", "name": "杨素瑶"}\n },\n {\n "subject": {"type": "实体", "name": "我"},\n "relation": {"type": "听到"},\n "object": {"type": "实体", "name": 
"对话"}\n },\n {\n "subject": {"type": "实体", "name": "我"},\n "relation": {"type": "愤怒"},\n "object": {"type": "实体", "name": "孙主任"}\n },\n {\n "subject": {"type": "实体", "name": "我"},\n "relation": {"type": "站起来"},\n "object": {"type": "实体", "name": "教室"}\n },\n {\n "subject": {"type": "实体", "name": "我"},\n "relation": {"type": "观察"},\n "object": {"type": "实体", "name": "窗外"}\n }\n ]\n}'}}]}, 'usage': {'total_tokens': 1615, 'output_tokens': 485, 'input_tokens': 1130}, 'request_id': '01782f35-cfcb-9cb7-a14c-0e1f2066376d'}
To ensure that the entities and relationships in Neo4j have the correct labels when using the Neo4jPropertyGraphStore, you need to explicitly set the labels for your nodes and relationships. Here is how you can do it:
Initialize the Neo4jPropertyGraphStore:
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

pg_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="llamaindex",
    url="bolt://localhost:7687",
)
Create and insert nodes and relationships with appropriate labels:
from llama_index.core.graph_stores.types import EntityNode, Relation

# Create entity nodes with appropriate labels
entity1 = EntityNode(label="PERSON", name="刘老师", properties={"type": "实体"})
entity2 = EntityNode(label="LOCATION", name="教导处", properties={"type": "实体"})

# Create a relation with the correct label
relation = Relation(
    label="参与",
    source_id=entity1.id,
    target_id=entity2.id,
    properties={"type": "参与"},
)

# Insert nodes and relations into the graph store
pg_store.upsert_nodes([entity1, entity2])
pg_store.upsert_relations([relation])
Optionally, insert text chunks and their relationships:
from llama_index.core.schema import TextNode

source_node = TextNode(text="刘老师参与了教导处的活动。")
relations = [
    Relation(
        label="MENTIONS",
        target_id=entity1.id,
        source_id=source_node.node_id,
    ),
    Relation(
        label="MENTIONS",
        target_id=entity2.id,
        source_id=source_node.node_id,
    ),
]
pg_store.upsert_llama_nodes([source_node])
pg_store.upsert_relations(relations)
Construct the PropertyGraphIndex from the existing graph store:
from llama_index.core import PropertyGraphIndex

index = PropertyGraphIndex.from_existing(
    property_graph_store=pg_store
)
By following these steps, you ensure that the entities and relationships in Neo4j have the correct labels and are correctly inserted into the Neo4jPropertyGraphStore. You can then construct the PropertyGraphIndex from the existing graph store, allowing you to leverage the parsed nodes directly [1][2].
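One detail visible in the log above is that the schema only allows the entity types 人物, 地点, 组织, 物品 and 事件, while the model answered with "实体" for every entity and with relation types such as "询问" and "命令" that are not in the allowed list. A small check like the following can make such mismatches visible before anything reaches Neo4j. It is a sketch under the assumption that the triplets arrive as the parsed JSON shown in the log; split_valid_triplets is a hypothetical helper, and the third sample triplet is invented to show a passing case.

```python
from typing import Dict, List, Tuple

# Allowed values, decoded from the schema in the request body.
ALLOWED_ENTITY_TYPES = {"人物", "地点", "组织", "物品", "事件"}
ALLOWED_RELATION_TYPES = {
    "居住于", "工作于", "拥有", "遇见", "敌对",
    "朋友", "爱慕", "参与", "发生在", "使用",
}


def split_valid_triplets(triplets: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate triplets that respect the schema from those that do not."""
    valid, rejected = [], []
    for triplet in triplets:
        ok = (
            triplet["subject"]["type"] in ALLOWED_ENTITY_TYPES
            and triplet["object"]["type"] in ALLOWED_ENTITY_TYPES
            and triplet["relation"]["type"] in ALLOWED_RELATION_TYPES
        )
        (valid if ok else rejected).append(triplet)
    return valid, rejected


triplets = [
    # Taken from the model response in the log: entity type is outside the schema.
    {"subject": {"type": "实体", "name": "刘老师"}, "relation": {"type": "参与"}, "object": {"type": "实体", "name": "教导处"}},
    # Taken from the log: both entity type and relation type are outside the schema.
    {"subject": {"type": "实体", "name": "孙主任"}, "relation": {"type": "询问"}, "object": {"type": "实体", "name": "小刘"}},
    # Invented example of a triplet that follows the schema.
    {"subject": {"type": "人物", "name": "陈辉"}, "relation": {"type": "朋友"}, "object": {"type": "人物", "name": "杨素瑶"}},
]

valid, rejected = split_valid_triplets(triplets)
print(f"valid: {len(valid)}, rejected: {len(rejected)}")
```

If most triplets end up rejected, the labels in the graph will fall back to generic ones, which may be what the Neo4j browser is showing; a stronger model or a stricter prompt is then more likely to help than relabeling by hand.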
Code:
index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    llm=MySetting.my_llm_mode(),
    embed_model=MySetting.my_embedding_mode(),
    show_progress=True,
    property_graph_store=graph_store
)
Console output:
Parsing nodes: 100%|██████████| 5/5 [00:00<00:00, 4999.17it/s]
Extracting paths from text with schema: 100%|██████████| 5/5 [00:12<00:00, 2.44s/it]
Generating embeddings: 100%|██████████| 1/1 [00:01<00:00, 1.69s/it]
Generating embeddings: 100%|██████████| 1/1 [00:02<00:00, 2.45s/it]
This is how I am building the property graph index, with my code and my console output. What is going on at each step? I also want to know: my property_graph_store is specified as Neo4j, but where are the embeddings located? @dosu
To understand the steps involved in building the PropertyGraphIndex from documents, you can follow the process outlined in your code and console output:
Load Documents:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Construct the Property Graph Index:
from llama_index.core import PropertyGraphIndex

# graph_store is the Neo4jPropertyGraphStore you created earlier
index = PropertyGraphIndex.from_documents(
    documents,
    llm=MySetting.my_llm_mode(),
    embed_model=MySetting.my_embedding_mode(),
    show_progress=True,
    property_graph_store=graph_store
)
Steps Involved (these correspond to the progress bars in your console output):
Parsing nodes: the documents are split into text chunks (5 chunks in your run).
Extracting paths from text with schema: each chunk is sent to the LLM through your kg_extractor, which returns triplets for the graph.
Generating embeddings: embeddings are computed with your embed_model. The bar appears twice, which is consistent with one pass over the text chunks and one over the extracted graph nodes.
Regarding the location of the embeddings when your property_graph_store is Neo4j: if no vector_store is passed and the graph store supports vector queries, which Neo4jPropertyGraphStore is understood to do, the embeddings are written onto the graph nodes themselves, so it is worth inspecting the node properties in Neo4j. If you would rather keep the embeddings outside Neo4j, you can specify a separate vector store. For example:
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

client = chromadb.PersistentClient("./chroma_db")
collection = client.get_or_create_collection("my_graph_vector_db")
graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="<password>",
    url="bolt://localhost:7687",
)
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=MySetting.my_embedding_mode(),
    property_graph_store=graph_store,
    vector_store=ChromaVectorStore(collection=collection),
    show_progress=True,
)
In this setup, the embeddings are stored in the ChromaVectorStore, while the property graph is managed by the Neo4jPropertyGraphStore [1][2].
index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    llm=MySetting.my_llm_mode(),
    embed_model=MySetting.my_embedding_mode(),
    show_progress=True,
)
index.storage_context.persist("../storage/lmsg")
As shown above, I saved the index. When I load it using
storage_context = StorageContext.from_defaults(persist_dir="../storage/lmsg")
index = load_index_from_storage(storage_context)
why does Python give me the following error?
raise ValueError(f"Could not infer node type for data: {node_dict!s}")
ValueError: Could not infer node type for data: {'label': 'text_chunk', 'embedding': [-0.01520478549412717, -0.01857941369606562, -0.0032608766895135576, ..., -0.01118556538844767, -0.04805369447104863, 0.02636507276870264], 'properties': {'_nodecontent': '{"id": "d1d6fe21-88e2-48bf-8f64-e0c790a1caf8", "embedding": null, "metadata": {}, "excluded_embed_metadata_keys": [], "excluded_llm_metadata_keys": [], "relationships": {"1": {"node_id": "bc6a4f15-e9df-493c-b3a5-35184955b18a", "node_type": "4", "metadata": {}, "hash": "3ca00d262163ffaff959eb56eec10850e427ec38ec21d54f90adbc3364677126", "class_name": "RelatedNodeInfo"}}, "text": "", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 6, "text_template": "{metadata_str}\n\n{content}", "metadata_template": "{key}: {value}", "metadata_seperator": "\n", "class_name": "TextNode"}', '_node_type': 'TextNode', 'document_id': 'bc6a4f15-e9df-493c-b3a5-35184955b18a', 'doc_id': 'bc6a4f15-e9df-493c-b3a5-35184955b18a', 'ref_doc_id': 'bc6a4f15-e9df-493c-b3a5-35184955b18a'}}
@dosu
When I used PropertyGraphIndex before, the Chinese text in the locally saved property_graph_store.json file was stored as \u escape sequences. Now that I use this method, the file shows garbled characters when I open it, and PyCharm tells me "The file was loaded in a wrong encoding: 'UTF-8'". What is wrong with my settings? @dosu
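For context on the two file appearances described above, here is a minimal sketch using only Python's standard json module. It is not LlamaIndex's persistence code, and the file name example_store.json is hypothetical. It only illustrates that \u escapes and raw Chinese characters are two serializations of the same data, and that raw characters need the same explicit encoding on write and on read. If an editor reports a wrong encoding, the file may have been written with a different (for example platform-default) encoding.

```python
import json
import os
import tempfile

data = {"text": "中文内容"}

# ensure_ascii=True is the json default: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)

# ensure_ascii=False keeps the characters themselves in the output string.
raw = json.dumps(data, ensure_ascii=False)

# Both serializations decode back to the same Python object.
same_content = json.loads(escaped) == json.loads(raw) == data

# When writing raw characters, state the encoding explicitly on write and on read;
# relying on the platform default is one possible source of garbled files.
path = os.path.join(tempfile.mkdtemp(), "example_store.json")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(raw)
with open(path, "r", encoding="utf-8") as f:
    restored = json.load(f)
```

Either form should load back to the same content as long as the reader uses the encoding the writer used.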
Question Validation
Question
"SemanticSplitterNodeParser" works in an English context but does not work in Chinese. I want to know how to split Chinese content to make "retrieve" more accurate. Please give me an idea. Thank you.
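One possible direction, sketched under assumptions: split on Chinese sentence-ending punctuation with a regular expression. Word segmenters such as jieba produce words rather than sentences, so they do not do this job. The function name split_chinese_sentences and the punctuation set are illustrative choices, not part of LlamaIndex.

```python
import re
from typing import List

# Sentence-ending punctuation (full-width and ASCII), optionally followed by
# closing quotes or brackets that belong to the same sentence.
_SENTENCE_END = re.compile(r'([。！？!?；;…]+[”’」』）)]*)')


def split_chinese_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the ending punctuation attached."""
    # With one capturing group, re.split returns
    # [body, delimiter, body, delimiter, ..., tail].
    parts = _SENTENCE_END.split(text)
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    # Trailing text without ending punctuation is kept as its own sentence.
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences


example = split_chinese_sentences("今天天气很好。我们去公园吧！你觉得呢？")
```

A function with this signature (str in, list of str out) could then be tried as the sentence_splitter argument of SemanticSplitterNodeParser.from_defaults. Whether that improves retrieval for a given PDF also depends on the embedding model's support for Chinese, so it is worth checking the resulting chunks by hand.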