Extract system-generated metadata, user-defined metadata, and tags from Azure Blob entities.
The changes in this PR extract metadata from Azure Blob entities. Before this PR, `llama_index.readers.file.base.default_file_metadata_func` was used for metadata extraction. However, because that function inspects the copy downloaded to the host system, the extracted metadata (file dates, for example) can describe the local file rather than the blob, so it may not be correct.
This PR implements metadata extraction directly from the Azure Blob properties, which consist of system metadata (e.g. `creation_time`) and user-defined metadata (exposed as `metadata` and `tags`).
The new metadata set is the one produced by the default metadata extractor, plus additional Azure system metadata, plus user-defined metadata.
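To illustrate how such a combined metadata set could be assembled, here is a minimal sketch. It is not the code from this PR: `blob_metadata` is a hypothetical helper, the output key names are chosen for illustration, and a `SimpleNamespace` stands in for the SDK's `BlobProperties` object so the snippet runs without an Azure account. The attributes read from it (`name`, `container`, `size`, `etag`, `creation_time`, `last_modified`, `metadata`) are assumed to match those of `azure.storage.blob.BlobProperties`.

```python
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Any, Dict, Optional


def blob_metadata(props: Any, tags: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: merge system and user-defined blob metadata."""
    # System metadata, read from the blob's properties instead of the
    # downloaded local copy.
    system = {
        "file_name": props.name,
        "file_size": props.size,
        "container": props.container,
        "etag": props.etag,
        "creation_date": props.creation_time.isoformat(),
        "last_modified_date": props.last_modified.isoformat(),
    }
    # User-defined metadata and tags are plain string-to-string mappings.
    user = dict(props.metadata or {})
    # Assumption: on a key collision the system value wins, so it is
    # unpacked last.
    return {**(tags or {}), **user, **system}


# Stand-in for a BlobProperties object (illustrative values only).
props = SimpleNamespace(
    name="report.pdf",
    container="docs",
    size=1024,
    etag="0x8D0000000000000",
    creation_time=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    last_modified=datetime(2024, 1, 3, 3, 4, 5, tzinfo=timezone.utc),
    metadata={"author": "alice", "file_name": "should-not-win"},
)
meta = blob_metadata(props, tags={"project": "demo"})
print(meta)
```

Letting system values override user-defined keys is only one possible policy; nesting the user-defined entries under their own key would avoid collisions entirely.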
No dependencies have been changed.
Type of Change
[x] Bug fix / Smaller change
How Has This Been Tested?
This has been tested against Azure Blob Storage (authenticated with a connection string) using a Python script.
Example:
Suggested Checklist:
`make format; make lint` to appease the lint gods