Closed: rodrigobdz closed this issue 2 years ago.
Thank you @rodrigobdz for highlighting this.
Please see this @Wickstrom, as it is related to https://github.com/understandable-machine-intelligence-lab/Quantus/pull/105, and leave your comments as you see fit!
I can correct it. In the grand scheme of things, I think AUC and AOC tell us the same thing, except that for one higher is better while for the other lower is better. However, since they use AOC in the paper, we should stick to that.
For the calculation itself I think:
self.last_results.append(1-get_auc_score(preds, np.arange(0, len(preds))))
might not be correct, since the AUC will not be bounded between 0 and 1. Rather, it depends on the length of preds. I think:
self.last_results.append(len(preds)-get_auc_score(preds, np.arange(0, len(preds))))
works. For instance, consider this example:
import numpy as np
preds = [0.9, 0.7, 0.6, 0.5, 0.2]
print(1-np.trapz(preds)) # negative score
print(len(preds)-np.trapz(preds)) # positive score
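To see why higher is better under this formulation, here is an illustrative check (not from the thread; fast_drop and slow_drop are made-up curves): a prediction curve that collapses quickly once important features are removed yields a larger len(preds)-AUC value than one that barely changes.
import numpy as np
fast_drop = [0.9, 0.3, 0.1, 0.05, 0.0]  # score collapses quickly -> faithful explanation
slow_drop = [0.9, 0.85, 0.8, 0.7, 0.6]  # score barely changes -> unfaithful explanation
print(len(fast_drop) - np.trapz(fast_drop))  # ~4.1, larger AOC
print(len(slow_drop) - np.trapz(slow_drop))  # ~1.9, smaller AOC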
Amongst other smaller bug fixes, I'm including this fix in PR https://github.com/understandable-machine-intelligence-lab/Quantus/pull/114, which deals with several issues like these.
@Wickstrom can you confirm my understanding that the AOC calculation would be the following:
self.last_results.append(len(preds)-np.trapz(preds,dx=np.arange(0, len(preds))))
Thank you very much!
Yes, I think that should be correct.
@annahedstroem Independently of the AOC calculation, the function numpy.trapz accepts only a float for its dx argument. Besides, the argument x=np.arange(0, len(preds)) can be omitted because it is equivalent to dx=1.0, which is already the default for that function.
Assuming the calculation above is indeed correct, the right function call should be:
self.last_results.append(len(preds)-np.trapz(preds))
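A quick check of the two points above (illustrative only): np.trapz with the default dx=1.0 matches passing x=np.arange(0, len(preds)) explicitly, while dx itself expects a scalar spacing rather than an array.
import numpy as np
preds = [0.9, 0.7, 0.6, 0.5, 0.2]
print(np.trapz(preds, x=np.arange(0, len(preds))))  # 2.35, explicit sample points
print(np.trapz(preds))                              # 2.35, default dx=1.0 is equivalent
# np.trapz(preds, dx=np.arange(0, len(preds))) is not valid: dx must be a scalar spacing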
@Wickstrom I'm not sure numpy.trapz is the right tool to calculate an AUC score bounded between 0 and 1. In the example provided, the resulting AUC score is outside the [0,1] bounds:
from typing import List
import numpy
preds: List[float] = [0.9, 0.7, 0.6, 0.5, 0.2]
# AUC
print(numpy.trapz(preds))
# Output: 2.35
# AOC
print(1-numpy.trapz(preds)) # negative score
# Output: -1.35
# AOC
print(len(preds)-numpy.trapz(preds)) # positive score
# Output: 2.65
There is a function from sklearn to compute the AUC but, unfortunately, it yields the same result as numpy.trapz, i.e. the result is not bounded:
import sklearn.metrics
x: numpy.ndarray = numpy.arange(0, len(preds))
print(sklearn.metrics.auc(x, preds))
# Output: 2.35
An alternative would be to implement AOPC, defined in equation 12 in [1], and take its complement to get AUC (a rough sketch follows the reference below):
[1] Samek, Wojciech, Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, and Klaus-Robert Müller. "Evaluating the visualization of what a deep neural network has learned." IEEE transactions on neural networks and learning systems 28, no. 11 (2016): 2660-2673.
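For reference, a rough sketch of what AOPC per equation 12 in [1] could look like for a single example, assuming preds[k] holds the model score after k perturbation steps (preds[0] being the unperturbed score); the expectation over the dataset in the paper is omitted here:
import numpy as np

def aopc(preds):
    # Average drop in the model score relative to the unperturbed prediction,
    # i.e. (1 / (L + 1)) * sum_k (f(x_0) - f(x_k)) for a single example.
    preds = np.asarray(preds, dtype=float)
    return float(np.mean(preds[0] - preds))

print(aopc([0.9, 0.7, 0.6, 0.5, 0.2]))  # ~0.32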
@rodrigobdz AUC is not bounded between 0 and 1, so it is not a problem that we get a score that is larger than 1. My concern was that this formulation:
self.last_results.append(1-get_auc_score(preds, np.arange(0, len(preds))))
assumes it is, so we just needed to modify it a bit. The sklearn.metrics.auc function also uses np.trapz to calculate the integral, which is why they produce the same result. Essentially, we want to compute an integral, as in the Samek et al. paper. I think using a fast and simple numpy function to compute this integral is the best way to go, but I'm also open to other suggestions.
@Wickstrom I fully agree with you on using a library to compute it for efficiency reasons. I had a misconception that the AUC score had to be in the range [0,1].
Final question on this topic: is the AOPC score bounded or not, similar to AUC?
I will leave this issue open as it will automatically be closed by https://github.com/understandable-machine-intelligence-lab/Quantus/pull/114.
The AOPC is not bounded, just like the AUC. You can see this, for instance, in Figure 4 of Samek et al., which you linked above.
Thanks a lot both, the issue is now fixed: https://github.com/understandable-machine-intelligence-lab/Quantus/pull/114#pullrequestreview-943850575.
See line 708 in faithfulness.py.
The metric IterativeRemovalOfFeatures should compute AOC but is computing AUC instead; see the snippets below as proof.

Bug Description

The value being computed and appended to the list last_results is AUC; see the get_auc_score definition. It seems like the line of code commented out contains the correct AOC computation.

AUC Definition
Typo aside (being fixed in #112), the docstring should read area under the curve ~(AOC)~ (AUC).
https://github.com/understandable-machine-intelligence-lab/Quantus/blob/3a2f72cc99c3353bf60a36fcf2a3dc0eaa2fbfa3/quantus/metrics/faithfulness_metrics.py#L1434-L1436
1. AOC computing AUC instead
https://github.com/understandable-machine-intelligence-lab/Quantus/blob/3a2f72cc99c3353bf60a36fcf2a3dc0eaa2fbfa3/quantus/metrics/faithfulness_metrics.py#L712-L713
2. AUC being appended to all_results
https://github.com/understandable-machine-intelligence-lab/Quantus/blob/3a2f72cc99c3353bf60a36fcf2a3dc0eaa2fbfa3/quantus/metrics/faithfulness_metrics.py#L721
3. Final aggregated score contains AUC scores instead of AOC
https://github.com/understandable-machine-intelligence-lab/Quantus/blob/3a2f72cc99c3353bf60a36fcf2a3dc0eaa2fbfa3/quantus/metrics/faithfulness_metrics.py#L726-L728
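To summarize the bug in a self-contained snippet (illustrative only, not copied from faithfulness.py): the value currently appended corresponds to the AUC, whereas the metric should append the AOC, using the len(preds) - AUC form agreed on in the discussion above.
import numpy as np

preds = [0.9, 0.7, 0.6, 0.5, 0.2]  # model scores as features are iteratively removed
auc = np.trapz(preds)              # what is currently appended: 2.35
aoc = len(preds) - auc             # what should be appended: 2.65
print(auc, aoc)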