Since PrecisionRecallCurve works on all data / is computed on the complete data by default in torchmetrics, you will have to hold all your datapoints in memory. For large datasets, you can look at the binned PR curve, which torchmetrics implements with a constant-memory requirement.
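For illustration, a minimal sketch of the constant-memory variant; this assumes a torchmetrics 0.x release that still ships BinnedPrecisionRecallCurve (newer releases fold the binned behaviour into PrecisionRecallCurve via a thresholds argument), so check against your installed version:

```python
import torch
from torchmetrics import BinnedPrecisionRecallCurve

# 100 fixed threshold bins: only per-bin counts are stored, so memory
# stays constant no matter how many batches are accumulated.
pr_curve = BinnedPrecisionRecallCurve(num_classes=1, thresholds=100)

for _ in range(10):  # e.g. ten validation batches
    preds = torch.rand(512, 1)
    target = torch.randint(0, 2, (512, 1))
    pr_curve.update(preds, target)

precision, recall, thresholds = pr_curve.compute()
```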
I think my post wasn't clear, and I used the term "step" incorrectly. I understand that PrecisionRecallCurve works on all data, but why does it keep data from previous epochs? I.e. I run training and validation after each epoch: epoch 1 -> validation 1 -> epoch 2 -> validation 2 -> etc., which I think are the default settings in the repo config files. My question is: why does AdaptiveThreshold use PrecisionRecallCurve in a way that keeps data from both validation 1 and validation 2? Shouldn't it use data from validation 2 only (which is the validation of the latest trained model)?
Ahh I see. At a quick glance it seems as if the reset method is missing from AdaptiveThreshold. Can you try to add:

```python
def reset(self):
    self.precision_recall_curve.reset()
```

to the class and see if that fixes the bug/problem? The class can be found here. Overall, the class should look as follows:
```python
class AdaptiveThreshold(Metric):
    """Optimal F1 Metric.

    Compute the optimal F1 score at the adaptive threshold, based on the F1
    metric of the true labels and the predicted anomaly scores.
    """

    def __init__(self, default_value: float = 0.5, **kwargs):
        super().__init__(**kwargs)

        self.precision_recall_curve = PrecisionRecallCurve(num_classes=1)
        self.add_state("value", default=torch.tensor(default_value), persistent=True)  # pylint: disable=not-callable
        self.value = torch.tensor(default_value)  # pylint: disable=not-callable

    # pylint: disable=arguments-differ
    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:  # type: ignore
        """Update the precision-recall curve metric."""
        self.precision_recall_curve.update(preds, target)

    def compute(self) -> torch.Tensor:
        """Compute the threshold that yields the optimal F1 score.

        Compute the F1 scores while varying the threshold. Store the optimal
        threshold as attribute and return the maximum value of the F1 score.

        Returns:
            Value of the F1 score at the optimal threshold.
        """
        precision: torch.Tensor
        recall: torch.Tensor
        thresholds: torch.Tensor

        precision, recall, thresholds = self.precision_recall_curve.compute()
        f1_score = (2 * precision * recall) / (precision + recall + 1e-10)
        if thresholds.dim() == 0:
            # special case where recall is 1.0 even for the highest threshold.
            # In this case 'thresholds' will be scalar.
            self.value = thresholds
        else:
            self.value = thresholds[torch.argmax(f1_score)]
        return self.value

    # ADDED METHOD
    def reset(self):
        self.precision_recall_curve.reset()
```
Note: the same will be necessary for OptimalF1 as well.
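To check that the patch actually clears state, a quick standalone loop like the following could help (purely an illustrative driver, not anomalib code; it assumes the patched class above is importable alongside torch):

```python
import torch

# Illustrative check: with reset() in place, each simulated validation
# epoch should start from an empty precision-recall state.
metric = AdaptiveThreshold()
for epoch in range(3):
    preds = torch.rand(256)
    target = torch.randint(0, 2, (256,))
    metric.update(preds, target)
    print(epoch, metric.compute().item())
    metric.reset()  # drops this epoch's buffered preds/targets
```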
Hi, I added the reset code to both, but it's never called. I think there are probably other changes that need to be made, probably in anomaly_module or somewhere else in anomalib?
Did you install anomalib with the -e flag, i.e. via pip install -e? If so, please provide an MWE so that the bug can be localized.
Shouldn't this one be closed?
I believe that the original issue is not yet resolved. While the reset method has been added to the AdaptiveThreshold metric class, the method is never called. This means that for models that need to be trained for multiple epochs, the data from the previous epochs is never cleared, which in turn may lead to out-of-memory errors. It could also lead to inaccurate computation of the adaptive threshold. We will investigate this a bit further and work on a fix.
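For anyone following along: the missing piece is that nothing invokes reset() between epochs. One plausible place to wire it in is a PyTorch Lightning epoch hook on the anomaly module; a rough sketch, where self.adaptive_threshold is a placeholder attribute name rather than the actual anomalib internals:

```python
# Rough sketch of a possible fix inside the LightningModule subclass.
# `self.adaptive_threshold` is a placeholder for however the module
# stores its AdaptiveThreshold metric.
def on_validation_epoch_start(self) -> None:
    # Clear scores buffered during the previous validation epoch so the
    # threshold is computed from the current model's outputs only.
    self.adaptive_threshold.reset()
```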
I'm running anomalib on a custom dataset (around 20k samples) and I've noticed constant crashes after a few epochs/validation steps (DefaultCPUAllocator: can't allocate memory...), almost independently of the model I choose.
It seems to be related to AdaptiveThreshold, which uses PrecisionRecallCurve internally: after every validation step new data is added to it, and old data is never deleted. Why? Shouldn't data from the previous validation steps be deleted?
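The accumulation itself is easy to reproduce with torchmetrics alone; a minimal sketch assuming a torchmetrics 0.x-style PrecisionRecallCurve, which buffers raw preds/target as list states:

```python
import torch
from torchmetrics import PrecisionRecallCurve

curve = PrecisionRecallCurve(num_classes=1)
for epoch in range(3):  # three simulated validation epochs
    curve.update(torch.rand(1000), torch.randint(0, 2, (1000,)))
    # the internal state keeps every score ever seen until reset() is called
    print(epoch, sum(t.numel() for t in curve.preds))  # 1000, 2000, 3000
```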