Closed Paul4912 closed 2 months ago
Hey @Paul4912! Good to see you back in the labyrinth of code. I hope you've been well since our last adventure.
Based on your description, it seems like the OneDrive download URL is not currently being extracted and stored in the LlamaIndex framework. To include this in the query response, you would need to modify the document loading process to extract this property from the OneDrive item's metadata and store it in the document's metadata.
Here's an example of how you might modify the OneDriveReader class to include the download URL in the document's metadata:
class OneDriveReader(BaseReader):
    ...
    def _load_document(self, item):
        ...
        # Assuming 'downloadUrl' is the key in the item's metadata that contains the download URL
        download_url = item.get('downloadUrl', None)
        if download_url:
            document.metadata['download_url'] = download_url
        ...
This modification stores the download URL in the document's metadata under the key download_url. You can then modify the _create_citation_nodes method in the CitationQueryEngine class to include this metadata in the query response, as described in the previous answer.
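Once the URL is in the metadata, reading it back from a response could look roughly like the sketch below. It assumes the metadata set at load time is carried through to the nodes in response.source_nodes, and that each entry exposes .node.metadata and .score. The helper name collect_source_links and the file_name key are hypothetical, and the stand-in objects only imitate that shape so the snippet runs without LlamaIndex installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def collect_source_links(source_nodes: Iterable[Any]) -> List[Dict[str, Any]]:
    """Return one record per cited source, with its download URL if present."""
    links = []
    for source in source_nodes:
        # Fall back to an empty dict when a node carries no metadata.
        metadata = getattr(source.node, "metadata", None) or {}
        links.append(
            {
                "file_name": metadata.get("file_name"),  # assumed key name
                "download_url": metadata.get("download_url"),
                "score": getattr(source, "score", None),
            }
        )
    return links


# Stand-ins shaped like response.source_nodes (an assumption about that shape).
fake_source_nodes = [
    SimpleNamespace(
        node=SimpleNamespace(
            metadata={
                "file_name": "report.docx",
                "download_url": "https://example.com/report.docx",
            }
        ),
        score=0.9,
    ),
    # A source whose metadata has no download URL.
    SimpleNamespace(node=SimpleNamespace(metadata={"file_name": "notes.txt"}), score=0.5),
]

links = collect_source_links(fake_source_nodes)
```

With a real query you would pass response.source_nodes in place of fake_source_nodes. Sources without a URL come back with download_url set to None, so the caller can decide whether to skip them.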
Please note that this is a simplified example and the actual implementation may vary depending on the structure of the OneDrive item's metadata and the design of the LlamaIndex framework. You should also handle potential errors and edge cases, such as the download URL not being available for certain items.
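As a rough illustration of the edge-case handling mentioned above, the following sketch works on plain dictionaries rather than the real reader classes. The function names are hypothetical, and both candidate keys are assumptions: check the item payload your reader actually receives before relying on either one.

```python
from typing import Any, Dict, Mapping, Optional

# Assumed key names for the download URL on a OneDrive item; verify them
# against the payload returned for your own files.
CANDIDATE_KEYS = ("@microsoft.graph.downloadUrl", "downloadUrl")


def extract_download_url(item: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-empty download URL found on the item, else None."""
    for key in CANDIDATE_KEYS:
        value = item.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def with_download_url(metadata: Mapping[str, Any], item: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of metadata, adding 'download_url' only when one exists."""
    enriched = dict(metadata)
    url = extract_download_url(item)
    if url is not None:
        enriched["download_url"] = url
    return enriched


file_item = {"id": "abc", "name": "report.docx", "downloadUrl": "https://example.com/report.docx"}
folder_item = {"id": "def", "name": "Reports"}  # no download URL on this item

file_metadata = with_download_url({"file_name": "report.docx"}, file_item)
folder_metadata = with_download_url({"file_name": "Reports"}, folder_item)
```

Leaving the key out when no URL is available, rather than storing an empty string, keeps downstream checks simple: metadata.get("download_url") is either a usable link or None.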
For more information on how to work with the LlamaIndex framework, you can refer to the LlamaIndex documentation.
Please let me know if you need further assistance.
Question
I want to query an index made from OneDrive files and also return a link to the file in the response.
I am looking to use CitationQueryEngine, which also returns the source's metadata. There I can get the filename and id, but not the link to the OneDrive file.
I read the data loader code, and the OneDrive item's metadata has a property that links to the file.
https://github.com/run-llama/llama_index/blob/9607a05a923ddf07deee86a56d386b42943ce381/llama-index-integrations/readers/llama-index-readers-microsoft-onedrive/llama_index/readers/microsoft_onedrive/base.py#L237
But this property is not extracted when I load the documents for indexing.
https://github.com/run-llama/llama_index/blob/9607a05a923ddf07deee86a56d386b42943ce381/llama-index-integrations/readers/llama-index-readers-microsoft-onedrive/llama_index/readers/microsoft_onedrive/base.py#L263
How do I get access to the download URL in the metadata when I run a citation query on an index generated from OneDrive files? Do I need to modify the library's source code, or is there a better way?