microsoft / Blob-Inventory-Report-Analytics

MIT License

Can't uninstall azure-storage-blob #6

Open soerenfw opened 1 year ago

soerenfw commented 1 year ago

The notebook does not uninstall azure-storage-blob: [screenshot]

Code-Link: https://github.com/microsoft/Blob-Inventory-Report-Analytics/blob/main/src/ReportAnalysis.ipynb?short_path=2e661d9

brentwolfram commented 1 year ago

I used this as a workaround:

!pip uninstall azure-storage-blob --yes
!pip install azure-storage-blob==2.1.0
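Pinning 2.1.0 works because the notebook imports `BlockBlobService`, which exists only in the legacy 2.x SDK; the v12 line replaced it with `BlobServiceClient`. A minimal sketch, assuming you want to check which API generation is installed before running the rest of the notebook (all helper names here are hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the major component of a dotted version string such as '12.15.0'."""
    return int(version_string.split(".")[0])


def installed_blob_sdk_version() -> Optional[str]:
    """Return the installed azure-storage-blob version, or None if it is not installed."""
    try:
        return version("azure-storage-blob")
    except PackageNotFoundError:
        return None


def has_legacy_block_blob_service(version_string: str) -> bool:
    """BlockBlobService ships with the 2.x SDK; v12 replaced it with BlobServiceClient."""
    return major_version(version_string) < 12


if __name__ == "__main__":
    v = installed_blob_sdk_version()
    if v is None:
        print("azure-storage-blob is not installed")
    elif has_legacy_block_blob_service(v):
        print(f"azure-storage-blob {v}: legacy API, BlockBlobService is available")
    else:
        print(f"azure-storage-blob {v}: v12 API, use BlobServiceClient instead")
```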

soerenfw commented 1 year ago

Thanks, that worked. But pinning an old version is probably not the best idea (the current version is 12.15.0). I dug into the functions, and the solution I found is to use the following code:

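from azure.storage.blob import BlobServiceClient

# get_configuration_file_data, storage_account, container_name, file_name and blob_sas_token
# are assumed to be defined earlier in the notebook
json_file_data = get_configuration_file_data(storage_account, container_name, file_name)
# read the values first: nesting double quotes inside an f-string is a SyntaxError before Python 3.12
destination_container = json_file_data["destinationContainer"]
storage_account_name = json_file_data["storageAccountName"]

# register the SAS token Spark should use for the destination container
spark.conf.set(f"fs.azure.sas.{destination_container}.{storage_account_name}.blob.core.windows.net", blob_sas_token)

# mount the destination container at /inv through the linked service "mytestaccount"
mssparkutils.fs.mount(
    f"wasbs://{destination_container}@{storage_account_name}.blob.core.windows.net",
    "/inv",
    {"linkedService": "mytestaccount"},
)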
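The two strings built above depend only on the configuration values, so they can be checked without a Spark session. A small sketch, assuming the configuration JSON carries the `destinationContainer` and `storageAccountName` keys used above (the helper names and the example values are hypothetical):

```python
def sas_config_key(config: dict) -> str:
    """Spark config key under which the SAS token for the destination container is set."""
    container = config["destinationContainer"]
    account = config["storageAccountName"]
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


def wasbs_source(config: dict) -> str:
    """wasbs:// URL of the destination container, as passed to mssparkutils.fs.mount."""
    container = config["destinationContainer"]
    account = config["storageAccountName"]
    return f"wasbs://{container}@{account}.blob.core.windows.net"


if __name__ == "__main__":
    cfg = {"destinationContainer": "inventory", "storageAccountName": "mystorageacct"}
    print(sas_config_key(cfg))  # fs.azure.sas.inventory.mystorageacct.blob.core.windows.net
    print(wasbs_source(cfg))    # wasbs://inventory@mystorageacct.blob.core.windows.net
```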

To mount the storage, one first has to create a linked service to the storage account in Synapse (called mytestaccount in the code above).

Then adjust the function get_json_link_of_reports to the following:

# returns the synfs links to every inventory report manifest (JSON) in the destination container for the given rule
from azure.core.exceptions import ResourceNotFoundError
from azure.storage.blob import BlobServiceClient

def get_json_link_of_reports(storage_account_name, access_key, destination_container, rule_name):
    blob_service = BlobServiceClient(f"https://{storage_account_name}.blob.core.windows.net/", credential=access_key)
    # list the relative paths of all blobs in the destination container;
    # list_blobs() is lazy, so materialize it here so a missing container is caught by the try
    try:
        blob_list = list(blob_service.get_container_client(destination_container).list_blobs())
    except ResourceNotFoundError as e:
        print("Error: Container does not exist", e)
        return
    # job ID of the current Spark session, needed to address the /inv mount via synfs
    job_id = mssparkutils.env.getJobId()
    # storing the links to all the blob inventory reports
    links_list = []
    for blob in blob_list:
        # keep only the '<rule_name>-manifest.json' files and build a link to each one
        if rule_name + "-manifest.json" in blob.name:
            links_list.append(f"synfs:/{job_id}/inv/{blob.name}")
    return links_list
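The filtering step does not need an Azure connection at all, so it can be pulled out and tested on its own. A minimal sketch, assuming plain blob names as input and the `/inv` mount point from above; `manifest_links` and the example paths are hypothetical:

```python
from typing import Iterable, List


def manifest_links(blob_names: Iterable[str], rule_name: str, job_id: str,
                   mount_point: str = "/inv") -> List[str]:
    """Return synfs: links for every blob whose name contains '<rule_name>-manifest.json'.

    Mirrors the substring check in get_json_link_of_reports.
    """
    suffix = f"{rule_name}-manifest.json"
    mount = mount_point.strip("/")  # "/inv" -> "inv"
    return [f"synfs:/{job_id}/{mount}/{name}" for name in blob_names if suffix in name]


if __name__ == "__main__":
    names = [
        "2023/03/01/00-00-00/myrule/myrule-manifest.json",
        "2023/03/01/00-00-00/myrule/myrule_1000000_0.csv",
        "2023/03/01/00-00-00/otherrule/otherrule-manifest.json",
    ]
    print(manifest_links(names, "myrule", "42"))
```

Because the check is a substring match, a rule named `rule` would also pick up `myrule-manifest.json`; comparing against the last path segment instead would be stricter.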

Instead of the legacy BlockBlobService (azure-storage-blob 2.x), it uses BlobServiceClient from the current v12 SDK.