nttcslab / byol-a

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
https://arxiv.org/abs/2103.06695

Doubt in RunningNorm #8

Closed Sreyan88 closed 2 years ago

Sreyan88 commented 2 years ago

Hi There, great repo!

I think I may have misunderstood something about RunningNorm. It expects the size of an epoch; however, your implementation passes the size of the entire dataset.

Is it a bug? Or is there a problem with my understanding?

Thank You!

daisukelab commented 2 years ago

Hi @Sreyan88, thank you for your interest.

Regarding your question about RunningNorm (it's a class), let me explain the usage. Your understanding that it expects the size of an epoch is correct. The number of epochs during which the statistics are updated is passed as the option max_update_epochs. The other thing I need to explain is epoch_samples, the first option. This option expects the number of samples RunningNorm will handle in one epoch. The following code passes 2 * len(files) as the number of samples because we augment each training file twice.

tfms = AugmentationModule((64, 96), 2 * len(files))
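To make the arithmetic concrete, here is a minimal sketch. The file list, the value of max_update_epochs, and the name update_budget are hypothetical and chosen only for illustration; the idea that the update budget is the product of the two options is an assumption inferred from the docstring, not something confirmed in this thread.

```python
# Hypothetical stand-in for the training file list; the names are illustrative only.
files = [f"clip_{i:04d}.wav" for i in range(1000)]

# Each training file is augmented twice, so the normalizer sees
# two samples per file in one epoch.
views_per_file = 2
epoch_samples = views_per_file * len(files)

# Hypothetical number of epochs during which the statistics may be updated.
max_update_epochs = 5

# Assumption: the statistics stop changing once this many samples have been seen.
update_budget = epoch_samples * max_update_epochs

print(epoch_samples, update_budget)
```

So the size of one epoch as seen by RunningNorm is twice the dataset size, not the dataset size itself.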

I hope this answers your question.

class RunningNorm(nn.Module):
    """Online Normalization using Running Mean/Std.

    This module will only update the statistics up to the specified number of epochs.
    After the `max_update_epochs`, this will normalize with the last updated statistics.

    Args:
        epoch_samples: Number of samples in one epoch
        max_update_epochs: Number of epochs to allow update of running mean/variance.
        axis: Axis setting used to calculate mean/variance.
    """
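The behavior described in the docstring can be sketched in plain Python as follows. This is not the repository's implementation: the class name RunningNormSketch is hypothetical, it works on scalars instead of tensors, it uses Welford's algorithm for the running statistics, and it assumes the update budget is epoch_samples * max_update_epochs.

```python
import math


class RunningNormSketch:
    """Scalar running mean/std normalizer with a capped number of updates."""

    def __init__(self, epoch_samples, max_update_epochs=10):
        # Assumption: the update budget is the product of the two options.
        self.max_updates = epoch_samples * max_update_epochs
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def std(self):
        # Population standard deviation; fall back to 1.0 while it is undefined.
        if self.count < 2 or self.m2 == 0.0:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def __call__(self, x):
        if self.count < self.max_updates:
            # Welford's online update of the mean and squared deviations.
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)
        # Past the budget, the last updated statistics are reused unchanged.
        return (x - self.mean) / self.std()


# One epoch of four samples, with updates allowed for one epoch only.
norm = RunningNormSketch(epoch_samples=4, max_update_epochs=1)
for value in [1.0, 2.0, 3.0, 4.0]:
    norm(value)

frozen_mean = norm.mean
outlier = norm(100.0)  # normalized with frozen statistics, which stay unchanged
```

If epoch_samples were set too small in a sketch like this, the statistics would freeze before the intended number of epochs had passed, which is why the augmented sample count matters.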
Sreyan88 commented 2 years ago

Thank you so much for the explanation!