run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

how to use metadata of node? #13637

Closed susht3 closed 1 month ago

susht3 commented 5 months ago

Question

I want to set some attributes for a node, like this:

node.metadata = {"file_name": file_name, "module": module_name}
node.excluded_llm_metadata_keys = ["file_name"]
node.excluded_embed_metadata_keys = ["file_name"]

What role does metadata play? Does it append "module" to the original text? I found this in the documentation: metadata is a dictionary of annotations that can be appended to the text.

dosubot[bot] commented 5 months ago

Hey @susht3! Great to see you again! Thanks for reaching out. I'll take a look at your issue and provide a full answer shortly.

logan-markewich commented 5 months ago

It's added to the text during embeddings and/or LLM calls

See here https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_documents/?h=customizing#advanced-metadata-customization

dosubot[bot] commented 5 months ago

Metadata in the LlamaIndex framework provides additional context and annotations for nodes, influencing both the embedding process and LLM (Large Language Model) calls.

Role of Metadata

  1. Contextual Information: Metadata adds context about the document or node, such as file name, module name, etc.
  2. Customization and Filtering: You can control which metadata is included or excluded during embeddings and LLM calls using excluded_embed_metadata_keys and excluded_llm_metadata_keys.
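To make the behavior described above concrete, here is a minimal pure-Python sketch of how included metadata ends up in the node text. It mirrors LlamaIndex's default templates (each entry rendered as `key: value`, the entries joined by newlines, and the block prepended to the text with a blank line); the helper name `render_node_text` is hypothetical, not a library function:

```python
# Minimal sketch of how metadata is prepended to node text.
# Mirrors LlamaIndex's default templates ("{key}: {value}" per entry,
# metadata block separated from the text by a blank line).
# The helper name is illustrative, not part of the library.

def render_node_text(text: str, metadata: dict, excluded_keys: list) -> str:
    lines = [f"{k}: {v}" for k, v in metadata.items() if k not in excluded_keys]
    if not lines:
        return text
    return "\n".join(lines) + "\n\n" + text

metadata = {"file_name": "utils.py", "module": "parsing"}
# With "file_name" excluded, only "module" is prepended.
print(render_node_text("def foo(): ...", metadata, ["file_name"]))
# module: parsing
#
# def foo(): ...
```

This is why exclusion keys matter: anything left in the metadata dictionary is silently prepended to what the embedding model or LLM actually sees.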

Usage During Embeddings and LLM Calls

Embeddings

Metadata not excluded by excluded_embed_metadata_keys will be part of the content used to generate embeddings. For example:

node.metadata = {"file_name": file_name, "module": module_name}
node.excluded_embed_metadata_keys = ["file_name"]

Here, file_name is excluded from embeddings, but module is included.

LLM Calls

Similarly, metadata not excluded by excluded_llm_metadata_keys will be used in LLM prompts. For example:

node.metadata = {"file_name": file_name, "module": module_name}
node.excluded_llm_metadata_keys = ["file_name"]

Here, file_name is excluded from LLM prompts, but module is included.
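Because the two exclusion lists are independent, the same node can render differently depending on whether the text is being built for embedding or for an LLM prompt. A small sketch of that mode-dependent behavior (the function and mode names here are illustrative, not the library API):

```python
# Sketch of mode-dependent metadata exclusion: the same node yields
# different text for embeddings vs. LLM prompts. Names are illustrative.

def node_text(text, metadata, excluded_embed, excluded_llm, mode):
    excluded = excluded_embed if mode == "embed" else excluded_llm
    lines = [f"{k}: {v}" for k, v in metadata.items() if k not in excluded]
    return ("\n".join(lines) + "\n\n" + text) if lines else text

metadata = {"file_name": "utils.py", "module": "parsing"}
# Exclude file_name everywhere, but additionally hide module from the LLM.
embed_view = node_text("some text", metadata, ["file_name"],
                       ["file_name", "module"], "embed")
llm_view = node_text("some text", metadata, ["file_name"],
                     ["file_name", "module"], "llm")
print(embed_view)  # "module: parsing\n\nsome text"
print(llm_view)    # "some text"
```

In LlamaIndex itself you can inspect the rendered text with `node.get_content(metadata_mode=MetadataMode.EMBED)` or `MetadataMode.LLM` (importable from `llama_index.core.schema`).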

Example from Documentation

Refer to the Advanced Metadata Customization section for more details and examples on customizing metadata for embeddings and LLM calls.

#### Details

**Sources**

- [llama-index-legacy/llama_index/legacy/core/llms/types.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/core/llms/types.py)
- [llama-index-legacy/tests/extractors/test_metadata_extractor.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/tests/extractors/test_metadata_extractor.py)
- [llama-index-legacy/llama_index/legacy/llms/portkey_utils.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/llms/portkey_utils.py)
- [llama-index-core/llama_index/core/base/llms/types.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/base/llms/types.py)
- [llama-index-legacy/llama_index/legacy/readers/file/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/readers/file/base.py)
- [llama-index-core/llama_index/core/readers/file/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/readers/file/base.py)
- [llama-index-legacy/llama_index/legacy/extractors/interface.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/extractors/interface.py)
- [llama-index-core/llama_index/core/extractors/interface.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/extractors/interface.py)
- [llama-index-core/tests/node_parser/metadata_extractor.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/tests/node_parser/metadata_extractor.py)
**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.