run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

How to retrieve specific files? #13550

Open susht3 opened 2 months ago

susht3 commented 2 months ago

Question Validation

Question

I have multiple files. Can I retrieve specific document nodes, or all nodes, from one file? Something like this: nodes = recursive_retriever.retrieve(query, "xx_name"), where xx_name is my file name.

dosubot[bot] commented 2 months ago

To retrieve specific document nodes or all nodes from multiple files by specifying a file name, you can use a custom retriever that extends the BaseRetriever class from the LlamaIndex library. This approach allows for both synchronous and asynchronous retrieval, enabling efficient data processing and flexibility in handling large datasets or real-time requirements.

Here's a basic example to guide you:

from typing import List

from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.schema import QueryBundle, NodeWithScore

class MyRetriever(BaseRetriever):
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Your synchronous retrieval logic here
        pass

    async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Your asynchronous retrieval logic here
        pass

# Initialize your retriever
my_retriever = MyRetriever()

# For synchronous retrieval
nodes = my_retriever.retrieve("Your query here")

# For asynchronous retrieval (must be called inside an async function / running event loop)
nodes = await my_retriever.aretrieve("Your query here")

This setup allows you to customize the retrieval process, enabling you to specify file names or other parameters to fetch the desired nodes. Whether you choose synchronous or asynchronous methods depends on your application's specific needs, such as performance considerations or the requirement for real-time data processing.
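To make the stub above concrete, here is a framework-free sketch of the filtering idea. The `file_name` metadata key, the `FileFilteredRetriever` class, and the naive keyword match standing in for real vector scoring are all illustrative assumptions, not LlamaIndex APIs:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Stand-in for a LlamaIndex text node (sketch only)."""
    text: str
    metadata: dict = field(default_factory=dict)

class FileFilteredRetriever:
    """Hypothetical retriever that restricts results to one source file."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes

    def retrieve(self, query: str, file_name: str) -> List[Node]:
        # Keep only nodes that came from the requested file, then apply
        # a naive keyword match in place of real similarity scoring.
        in_file = [n for n in self.nodes if n.metadata.get("file_name") == file_name]
        return [n for n in in_file if query.lower() in n.text.lower()]

retriever = FileFilteredRetriever([
    Node("Retrievers fetch relevant nodes.", {"file_name": "a.txt"}),
    Node("Retrievers can be recursive.", {"file_name": "b.txt"}),
])
print(len(retriever.retrieve("retrievers", "a.txt")))  # prints 1
```

A real `_retrieve` implementation would delegate the scoring step to an underlying index instead of the keyword match shown here.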

For more detailed information and examples, you might want to explore the LlamaIndex repository and its documentation.

#### Details

**Sources**

- [llama-index-legacy/llama_index/legacy/retrievers/recursive_retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/retrievers/recursive_retriever.py)
- [docs/docs/examples/retrievers/recursive_retriever_nodes.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/retrievers/recursive_retriever_nodes.ipynb)
- [llama-index-core/llama_index/core/base/base_retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/base/base_retriever.py)
- [llama-index-legacy/llama_index/legacy/core/base_retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/core/base_retriever.py)
- [llama-index-core/llama_index/core/schema.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/schema.py)
- [llama-index-networks/llama_index/networks/network/retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-networks/llama_index/networks/network/retriever.py)

**About Dosu**

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

logan-markewich commented 2 months ago

Probably use metadata filtering, assuming the file name/path is in the metadata of the nodes you inserted. Here's an example with Pinecone: https://docs.llamaindex.ai/en/stable/examples/vector_stores/pinecone_metadata_filter/?h=metadata+filter
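In LlamaIndex this is expressed declaratively, e.g. building a `MetadataFilters` object with an `ExactMatchFilter` and passing it to `index.as_retriever(filters=...)`, as the linked docs show. As a framework-free sketch of what an exact-match metadata filter does (the node/metadata dict shapes below are assumptions for illustration):

```python
from typing import Dict, List

def exact_match_filter(nodes: List[Dict], key: str, value: str) -> List[Dict]:
    """Keep only nodes whose metadata has key == value (sketch of
    exact-match metadata filtering, not the LlamaIndex implementation)."""
    return [n for n in nodes if n.get("metadata", {}).get(key) == value]

nodes = [
    {"text": "chunk from the report", "metadata": {"file_name": "report.pdf"}},
    {"text": "chunk from my notes", "metadata": {"file_name": "notes.md"}},
]
print([n["text"] for n in exact_match_filter(nodes, "file_name", "report.pdf")])
# prints ['chunk from the report']
```

The advantage of the declarative form is that vector stores like Pinecone can apply the filter server-side, before similarity search, rather than post-filtering results in Python as this sketch does.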

susht3 commented 2 months ago

> Probably use metadata filtering, assuming the file name/path is in the metadata of the nodes you inserted. Here's an example with Pinecone: https://docs.llamaindex.ai/en/stable/examples/vector_stores/pinecone_metadata_filter/?h=metadata+filter

Thank you, it looks simple and efficient, but it uses PINECONE_API_KEY. How do I get this key? Is it free?