Closed: lukaemon closed this issue 4 months ago.
Thanks for reporting this discrepancy.
The version used in the paper is:
# computed only once before training on a fixed set of activations
mean_activations = original_input.mean(dim=0) # averaging over the batch dimension
baseline_mse = (original_input - mean_activations).pow(2).mean()
# computed on each batch during training and testing
actual_mse = (reconstruction - original_input).pow(2).mean()
normalized_mse = actual_mse / baseline_mse
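To make the snippet above self-contained, here is a minimal runnable sketch. The input shapes and the noisy stand-in for the decoder output are assumptions for illustration; only the three MSE lines come from the reply above.

```python
import torch

torch.manual_seed(0)

# Hypothetical data: a batch of 8 activation vectors of width 4
# (stand-ins for the fixed set of activations and a decoder output).
original_input = torch.randn(8, 4)
reconstruction = original_input + 0.1 * torch.randn(8, 4)

# Baseline: MSE of always predicting the per-dimension mean activation,
# computed once over the batch dimension.
mean_activations = original_input.mean(dim=0)
baseline_mse = (original_input - mean_activations).pow(2).mean()

# Per-batch MSE of the actual reconstruction.
actual_mse = (reconstruction - original_input).pow(2).mean()
normalized_mse = actual_mse / baseline_mse

# A perfect reconstruction gives normalized_mse == 0; always predicting
# the mean gives a value near 1.
print(float(normalized_mse))
```

With the small reconstruction noise used here, the normalized MSE lands well below 1, which is the useful property of this metric: it is comparable across activation distributions with different scales.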
Got it. It matches the code in train.py. Thanks for the clarification.
In the paper, section 2.1:
In the README example:
Which is the same as in loss.py:
The way I understand it, normalized MSE should divide by the "baseline reconstruction error of always predicting the mean activations".
What did I miss? Did I misunderstand the paper or the code? Thanks for your time.