openai / sparse_autoencoder


question about normalized MSE #8

Closed lukaemon closed 4 months ago

lukaemon commented 5 months ago

In section 2.1 of the paper:

We report a normalized version of all MSE numbers, where we divide by a baseline reconstruction error of always predicting the mean activations.

In the readme example:

normalized_mse = (reconstructed_activations - input_tensor).pow(2).sum(dim=1) / (input_tensor).pow(2).sum(dim=1)

Which is the same as in loss.py:

def normalized_mean_squared_error(
    reconstruction: torch.Tensor,
    original_input: torch.Tensor,
) -> torch.Tensor:
    """
    :param reconstruction: output of Autoencoder.decode (shape: [batch, n_inputs])
    :param original_input: input of Autoencoder.encode (shape: [batch, n_inputs])
    :return: normalized mean squared error (shape: [1])
    """
    return (
        ((reconstruction - original_input) ** 2).mean(dim=1) / (original_input**2).mean(dim=1)
    ).mean()
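A quick check with random stand-in tensors (shapes per the docstring; names here are only illustrative) confirms the two expressions agree per row, since the 1/n_inputs factors cancel:

import torch

torch.manual_seed(0)
input_tensor = torch.randn(8, 16)                # [batch, n_inputs]
reconstructed_activations = input_tensor + 0.1 * torch.randn_like(input_tensor)

readme_version = (
    (reconstructed_activations - input_tensor).pow(2).sum(dim=1)
    / (input_tensor).pow(2).sum(dim=1)
)
loss_py_version = (
    ((reconstructed_activations - input_tensor) ** 2).mean(dim=1)
    / (input_tensor ** 2).mean(dim=1)
)
# Per-row ratios agree; loss.py additionally averages over the batch.
assert torch.allclose(readme_version, loss_py_version)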

The way I understand "normalized MSE", dividing by the baseline reconstruction error of always predicting the mean activations, is:

mean_activations = input_tensor.mean(dim=1)
baseline_mse = (input_tensor - mean_activations).pow(2).mean()
actual_mse = (reconstructed_activations - input_tensor).pow(2).mean()
normalized_mse = actual_mse / baseline_mse

What did I miss? Did I misunderstand the paper or the code? Thanks for your time.

TomDLT commented 4 months ago

Thanks for reporting this discrepancy.

The version used in the paper is:

# computed only once before training on a fixed set of activations
mean_activations = original_input.mean(dim=0)  # averaging over the batch dimension
baseline_mse = (original_input - mean_activations).pow(2).mean()

# computed on each batch during training and testing
actual_mse = (reconstruction - original_input).pow(2).mean()
normalized_mse = actual_mse / baseline_mse
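For comparison, here is a minimal self-contained sketch (random tensors as stand-in activations; the numbers are illustrative only, not from the repo) contrasting the loss.py normalization with the paper's mean-activation baseline:

import torch

torch.manual_seed(0)
original_input = torch.randn(1024, 768)          # [batch, n_inputs]
reconstruction = original_input + 0.1 * torch.randn_like(original_input)

# loss.py / readme version: per-row MSE divided by the per-row mean of squared
# inputs (equivalent to a baseline that always predicts zero).
nmse_loss_py = (
    ((reconstruction - original_input) ** 2).mean(dim=1)
    / (original_input ** 2).mean(dim=1)
).mean()

# Paper (section 2.1) version: MSE divided by the MSE of always predicting the
# mean activation vector, with the mean taken over the batch dimension.
mean_activations = original_input.mean(dim=0)    # [n_inputs]
baseline_mse = (original_input - mean_activations).pow(2).mean()
actual_mse = (reconstruction - original_input).pow(2).mean()
nmse_paper = actual_mse / baseline_mse

# The two differ unless the activations are zero-mean.
print(nmse_loss_py.item(), nmse_paper.item())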
lukaemon commented 4 months ago

Got it. It matches the code in train.py. Thanks for the clarification.