Hello @annahedstroem,
It is possible that I am confusing something or calling the function incorrectly, but if not, I think there may be a mistake in how Sensitivity-n is calculated.
### Description
In `custom_postprocess`, the `similarity_func` (Pearson correlation) is computed for each n across the different samples in the batch, rather than for each sample across all values of n. If I am not mistaken, the README and the original paper describe it the other way around.
As a consequence, when a batch size of 1 is used, the Pearson coefficient can no longer be computed and the metric fails.
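To illustrate the distinction, here is a toy sketch (not Quantus code; the array names and random values are placeholders I made up) contrasting the two axes of correlation:

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
batch_size, num_n = 4, 10

# Toy stand-ins: rows are samples, columns are perturbation sizes n.
att_sums = rng.random((batch_size, num_n))     # sum of attributions of the removed features
pred_deltas = rng.random((batch_size, num_n))  # change in the model's output

# Per sample, across all n (README / original paper): one score per sample.
per_sample = [scipy.stats.pearsonr(att_sums[i], pred_deltas[i])[0]
              for i in range(batch_size)]

# Per n, across the batch (what custom_postprocess appears to do): one score
# per n, which requires batch_size >= 2 and correlates unrelated samples.
per_n = [scipy.stats.pearsonr(att_sums[:, j], pred_deltas[:, j])[0]
         for j in range(num_n)]

print(len(per_sample), len(per_n))  # 4 10
```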
### Steps to reproduce the behavior (batch size 1 case)

Output:

```
File "...\quantus\metrics\faithfulness\sensitivity_n.py", line 408, in custom_postprocess
  self.evaluation_scores = [
  ^
File "...\quantus\metrics\faithfulness\sensitivity_n.py", line 409, in <listcomp>
  self.similarity_func(
File "...\quantus\functions\similarity_func.py", line 56, in correlation_pearson
  return scipy.stats.pearsonr(a, b)[0]
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...\scipy\stats\_stats_py.py", line 4816, in pearsonr
  raise ValueError('`x` and `y` must have length at least 2.')
ValueError: `x` and `y` must have length at least 2.
```
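The failure can be reproduced in isolation (a minimal sketch; the two values are hypothetical stand-ins for the single sample's attribution sum and prediction change):

```python
import numpy as np
import scipy.stats

# With batch size 1, correlating across the batch hands pearsonr
# length-1 vectors, but it requires at least two observations.
a = np.array([0.42])  # hypothetical attribution sum for the single sample
b = np.array([0.17])  # hypothetical prediction change for the single sample

try:
    scipy.stats.pearsonr(a, b)
except ValueError as e:
    print(e)  # `x` and `y` must have length at least 2.
```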
**Expected:** a single output representing the Sensitivity-n score of the single sample in the batch.