run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: How do I add downloadUrl from a OneDrive item to metadata when using OneDriveLoader? #12439

Status: Closed (Paul4912 closed this issue 2 months ago)

Paul4912 commented 6 months ago

Question

I want to query an index made from OneDrive files and also return a link to the file in the response.

I am looking to use CitationQueryEngine, which also returns the source's metadata. There I can get the filename and ID, but not the link to the OneDrive file.

I read the code for the data loader, and there is a property in the OneDrive item's metadata that links to the file.

https://github.com/run-llama/llama_index/blob/9607a05a923ddf07deee86a56d386b42943ce381/llama-index-integrations/readers/llama-index-readers-microsoft-onedrive/llama_index/readers/microsoft_onedrive/base.py#L237

But this property is not extracted when I load the documents for indexing.

https://github.com/run-llama/llama_index/blob/9607a05a923ddf07deee86a56d386b42943ce381/llama-index-integrations/readers/llama-index-readers-microsoft-onedrive/llama_index/readers/microsoft_onedrive/base.py#L263

How do I get access to the download URL in the metadata when I run a citation query against my index generated from OneDrive files? Do I need to modify the source code of this library, or is there a better way?

dosubot[bot] commented 6 months ago

Hey @Paul4912! Good to see you back in the labyrinth of code. I hope you've been well since our last adventure.

Based on your description, it seems the OneDrive download URL is read by the reader but is not currently copied into the metadata of the documents it produces. To include it in the query response, you would need to modify the document loading process so that it extracts this property from the OneDrive item's metadata and stores it in the document's metadata.

Here's an example of how you might modify the OneDriveReader class to include the download URL in the document's metadata:

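```python
class OneDriveReader(BaseReader):
    ...
    # NOTE: `_load_document` is an illustrative method name, not necessarily
    # the reader's real one; apply the change wherever documents are built.
    def _load_document(self, item):
        ...
        # Microsoft Graph exposes the link under '@microsoft.graph.downloadUrl';
        # the key may be absent (for example, for folders), hence .get().
        download_url = item.get('@microsoft.graph.downloadUrl')
        if download_url:
            document.metadata['download_url'] = download_url
        ...
```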

This modification stores the download URL in the document's metadata under the key download_url. Since CitationQueryEngine returns each source's metadata along with the response, the URL should then appear next to the filename and ID. If it does not, the _create_citation_nodes method of the CitationQueryEngine class is the place to check that the metadata is carried over to the citation nodes.
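Once the key is stored, reading it back from a query response could look roughly like the sketch below. It assumes each entry in `response.source_nodes` exposes its metadata as `source.node.metadata`, and that the filename is stored under `file_name`; that key name and the helper `links_from_source_nodes` are hypothetical, so adjust them to whatever your metadata actually contains. Stand-in objects are used so the snippet runs without an index.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def links_from_source_nodes(source_nodes: Iterable[Any]) -> List[Dict[str, Any]]:
    """Collect the file name and download URL of each cited source node.

    Hypothetical helper: assumes every item has a `.node.metadata` dict.
    """
    citations = []
    for source in source_nodes:
        metadata = source.node.metadata
        citations.append(
            {
                # Assumed key name; check the keys your reader really stores.
                "file_name": metadata.get("file_name"),
                # None when the loader did not store a URL for this file.
                "download_url": metadata.get("download_url"),
            }
        )
    return citations


# Stand-ins for response.source_nodes, so the sketch runs on its own.
fake_source_nodes = [
    SimpleNamespace(
        node=SimpleNamespace(
            metadata={
                "file_name": "report.docx",
                "download_url": "https://example.com/report",
            }
        )
    ),
    SimpleNamespace(node=SimpleNamespace(metadata={"file_name": "notes.txt"})),
]

citations = links_from_source_nodes(fake_source_nodes)
for citation in citations:
    print(citation["file_name"], citation["download_url"])
```

In real use, you would pass `response.source_nodes` from the query engine instead of `fake_source_nodes`.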

Please note that this is a simplified example and the actual implementation may vary depending on the structure of the OneDrive item's metadata and the design of the LlamaIndex framework. You should also handle potential errors and edge cases, such as the download URL not being available for certain items.
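One way to handle the missing-URL case is sketched below on plain dictionaries shaped like Microsoft Graph drive items. It assumes the item carries the `@microsoft.graph.downloadUrl` and `webUrl` properties that Graph defines for drive items; the helper name `link_metadata` and the output keys `download_url` and `web_url` are made up for illustration. As far as I understand, the download URL is short-lived, so keeping `webUrl` as well may be more useful for links that are shown to users later.

```python
from typing import Any, Dict

# Property name used by Microsoft Graph for the pre-authenticated download link.
GRAPH_DOWNLOAD_URL_KEY = "@microsoft.graph.downloadUrl"


def link_metadata(item: Dict[str, Any]) -> Dict[str, str]:
    """Return only the link fields that are actually present on the item.

    Hypothetical helper: missing or empty values are skipped instead of
    being stored as None, so downstream code can rely on key presence.
    """
    links: Dict[str, str] = {}
    download_url = item.get(GRAPH_DOWNLOAD_URL_KEY)
    if download_url:
        links["download_url"] = download_url
    web_url = item.get("webUrl")
    if web_url:
        links["web_url"] = web_url
    return links


# Example items: a file with both links, and a folder with no download URL.
file_item = {
    "id": "1",
    "name": "report.docx",
    GRAPH_DOWNLOAD_URL_KEY: "https://example.com/download/report",
    "webUrl": "https://example.com/view/report",
}
folder_item = {"id": "2", "name": "docs", "webUrl": "https://example.com/view/docs"}

file_links = link_metadata(file_item)
folder_links = link_metadata(folder_item)
empty_links = link_metadata({})
```

Inside the reader, the result could be merged into the document with something like `document.metadata.update(link_metadata(item))`.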

For more information on how to work with the LlamaIndex framework, you can refer to the LlamaIndex documentation at https://docs.llamaindex.ai.

Please let me know if you need further assistance.
