DRISE GPU optimizations for faster execution time during mask generation
This PR optimizes the mask generation code to reduce DRISE explanation runtime. The optimization applies at the per-instance level of explanations.
The main optimization is to generate the original mask directly on the GPU device, when one is available, instead of generating it on the CPU and then moving it to the GPU.
Attempts were also made to optimize the saliency_fusion step, where the mask affinity records are used to calculate the saliency maps. This step is much trickier: the GPU can easily run out of memory for large images, and the code must avoid moving matrices back and forth between CPU and GPU devices, since those transfers carry a large execution-time cost.
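As a hedged illustration of the device-side mask generation described above (the function name, mask shapes, and masking probability p are assumptions for this sketch, not the repository's actual API):

```python
import torch

def generate_masks(num_masks, height, width, p=0.5, device=None):
    # Hypothetical sketch of the optimization: allocate the random masks
    # directly on the target device, rather than building them on CPU
    # and paying a host-to-device copy afterwards.
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Before: masks = (torch.rand(N, H, W) < p).float() on CPU, then .to(device)
    # After: pass device= at creation so no transfer is needed
    masks = (torch.rand(num_masks, height, width, device=device) < p).float()
    return masks

masks = generate_masks(8, 64, 64)
```

The saving comes purely from skipping the CPU-to-GPU copy of the mask tensors; the masks themselves are unchanged.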
Note that the images in this notebook were resized to be 10x larger with the following code (to make this more comparable to a customer scenario with very large images):
import os
from PIL import Image

# data, labels and ImageColumns come from earlier cells in the notebook
for i, file in enumerate(os.listdir("./data/odFridgeObjects/images")):
    image_path = "./data/odFridgeObjects/images/" + file
    img = Image.open(image_path)
    # scale both dimensions by 10x, preserving the aspect ratio
    basewidth = img.width * 10
    wpercent = basewidth / float(img.size[0])
    hsize = int(float(img.size[1]) * wpercent)
    img = img.resize((basewidth, hsize), Image.Resampling.LANCZOS)
    img.save(image_path)
    data = data.append({ImageColumns.IMAGE.value: image_path,
                        ImageColumns.LABEL.value: labels[i]},  # folder
                       ignore_index=True)
    if i > 3:
        break
Also, the notebook was modified to move the model to GPU and use a GPU device:
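A minimal sketch of that change, assuming a standard PyTorch model (the nn.Linear below is a stand-in for the actual detection model, which is not shown here):

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)   # stand-in for the detection model
model = model.to(device)  # move the model's parameters to the device

# Inputs must live on the same device as the model
x = torch.randn(1, 4, device=device)
out = model(x)
```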
Error Analysis
Current Status: Generating error analysis reports.
Current Status: Finished generating error analysis reports.
Time taken: 0.0 min 0.001796500000011747 sec
In the comparison, the mask generation and prediction step for the two images takes:
image1: 46 seconds
image2: 43 seconds
Without the optimizations it took about 2x longer:
image1: 1 min 21 seconds
image2: 1 min 22 seconds
The saliency_fusion step, which was not optimized, takes about the same time with and without optimizations:
With optimizations:
image1: 1 min 49 seconds
image2: 1 min 47 seconds
Without optimizations (no change is expected, since this step was not GPU-optimized due to OOM risk):
image1: 1 min 57 seconds
image2: 1 min 30 seconds
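For context, one possible way to bound GPU memory in a fusion-style step is to accumulate over the masks in chunks while keeping every tensor on a single device; this is a hypothetical sketch (fuse_saliency, masks, and weights are illustrative names, not the repository's API):

```python
import torch

def fuse_saliency(masks, weights, chunk_size=16):
    # Hypothetical chunked fusion: weight each mask and accumulate a chunk
    # at a time, so peak memory is O(chunk_size * H * W) instead of
    # O(num_masks * H * W). All tensors stay on masks.device, avoiding
    # the costly host<->device transfers noted above.
    saliency = torch.zeros(masks.shape[1:], device=masks.device)
    for start in range(0, masks.shape[0], chunk_size):
        chunk = masks[start:start + chunk_size]
        w = weights[start:start + chunk_size].view(-1, 1, 1)
        saliency += (chunk * w).sum(dim=0)
    return saliency

masks = torch.ones(4, 2, 2)
weights = torch.tensor([1.0, 2.0, 3.0, 4.0])
saliency = fuse_saliency(masks, weights, chunk_size=2)
```

Smaller chunk sizes trade a little extra kernel-launch overhead for a lower peak-memory footprint, which is the relevant constraint for very large images.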
Description
The optimizations were measured on the fridge object detection notebook locally: https://github.com/microsoft/responsible-ai-toolbox/blob/main/notebooks/responsibleaidashboard/responsibleaidashboard-fridge-object-detection-model-debugging.ipynb
Run with optimizations for PR during rai_insights.compute() step for two images:
100%|██████████| 100/100 [00:46<00:00, 2.15it/s]  (image1 mask generation and prediction)
100%|██████████|  99/99  [01:49<00:00, 1.10s/it]  (image1 saliency fusion)
100%|██████████| 100/100 [00:43<00:00, 2.30it/s]  (image2 mask generation and prediction)
100%|██████████|  99/99  [01:47<00:00, 1.09s/it]  (image2 saliency fusion)
Error Analysis
Current Status: Generating error analysis reports.
Current Status: Finished generating error analysis reports.
Time taken: 0.0 min 0.0055526000000440945 sec
As comparison, run without the optimizations:
100%|██████████| 100/100 [01:21<00:00, 1.23it/s]  (image1 mask generation and prediction)
100%|██████████|  99/99  [01:57<00:00, 1.19s/it]  (image1 saliency fusion)
100%|██████████| 100/100 [01:22<00:00, 1.21it/s]  (image2 mask generation and prediction)
100%|██████████|  99/99  [01:30<00:00, 1.09it/s]  (image2 saliency fusion)
Error Analysis
Current Status: Generating error analysis reports.
Current Status: Finished generating error analysis reports.
Time taken: 0.0 min 0.001796500000011747 sec
Checklist