understandable-machine-intelligence-lab / Quantus

Quantus is an eXplainable AI toolkit for responsible evaluation of neural network explanations
https://quantus.readthedocs.io/

Sensitivity-n seems to average over features and correlate samples rather than correlate features and average over samples #354

Open arnon-1 opened 1 month ago

arnon-1 commented 1 month ago

Hello @annahedstroem, it is possible that I am confusing something or calling the function incorrectly, but if not, I think there might be a mistake in how Sensitivity-n is calculated.

Description

In custom_postprocess, the similarity_func (Pearson correlation) is calculated, for each n, across the different samples in the batch, rather than, for each sample, across all n. In the README and the original paper, it seems to be the other way around, if I am not mistaken.

For example, when a batch size of 1 is used, the Pearson coefficient can no longer be calculated (it needs at least two points), so the call fails.
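To make the two orderings concrete, here is a minimal sketch in plain Python. It is not the Quantus implementation: the toy data, the `pearson` helper, and the names `att_sums`, `pred_deltas`, `per_sample_then_average` and `per_step_across_batch` are all hypothetical and exist only for illustration. Rows stand for samples and columns for perturbation steps.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (needs >= 2 points)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy data: rows are samples, columns are perturbation steps.
att_sums = [[0.1, 0.4, 0.9], [0.2, 0.3, 0.7]]      # summed attributions
pred_deltas = [[0.0, 0.5, 1.0], [0.3, 0.2, 0.8]]   # change in model output


def per_sample_then_average(att, delta):
    """Correlate across steps within each sample, then average over samples."""
    scores = [pearson(a, d) for a, d in zip(att, delta)]
    return sum(scores) / len(scores)


def per_step_across_batch(att, delta):
    """Correlate across samples separately for each step (the ordering reported here)."""
    # zip(*rows) transposes the rows into columns, one column per step.
    return [pearson(a, d) for a, d in zip(zip(*att), zip(*delta))]
```

With a batch of one sample, `per_sample_then_average(att_sums[:1], pred_deltas[:1])` still returns a single score, whereas `per_step_across_batch(att_sums[:1], pred_deltas[:1])` raises a `ValueError`, because each column then holds only one point.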

Steps to reproduce the behavior (batch size 1 case)

import numpy as np
from quantus.helpers.model.models import LeNet
import quantus

quantus.SensitivityN()(
    model=LeNet(),
    x_batch=np.zeros((1, 1, 28, 28)),
    y_batch=np.zeros((1,), dtype=np.int64),
    device="cuda",
    explain_func=quantus.explain,
    explain_func_kwargs={"method": "Saliency"},
)

Output:

File "...\quantus\metrics\faithfulness\sensitivity_n.py", line 408, in custom_postprocess
    self.evaluation_scores = [
                             ^
  File "...\quantus\metrics\faithfulness\sensitivity_n.py", line 409, in <listcomp>
    self.similarity_func(
  File "...\quantus\functions\similarity_func.py", line 56, in correlation_pearson
    return scipy.stats.pearsonr(a, b)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\scipy\stats\_stats_py.py", line 4816, in pearsonr
    raise ValueError('`x` and `y` must have length at least 2.')
ValueError: `x` and `y` must have length at least 2.

Expected: a single output representing the Sensitivity-N score of the single sample.

annahedstroem commented 2 weeks ago

Hi @arnon-1, thanks for reporting this. We'll look into it and get back to you!