lightly-ai / lightly

A Python library for self-supervised learning on images.
https://docs.lightly.ai/self-supervised-learning/
MIT License

Add configurable normalization epsilon for NTXentLoss #1259

Open · mieszkokl opened 1 year ago

mieszkokl commented 1 year ago

When training with half precision, I noticed that the normalization in NTXentLoss can produce NaN values.

In the forward method, there is this code:

        # normalize the output to length 1
        out0 = nn.functional.normalize(out0, dim=1)
        out1 = nn.functional.normalize(out1, dim=1)

It uses the torch.nn.functional.normalize function with its default epsilon of 1e-12, which underflows to 0 in half precision. As a result, we get a division by zero and NaN values in the output.
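A minimal reproduction of the failure mode (a sketch assuming a recent PyTorch; normalize divides by max(norm, eps), so once eps underflows to 0 in float16, a zero-norm vector produces 0 / 0):

    import torch
    import torch.nn.functional as F

    # Zero embeddings stand in for outputs whose norm is below eps.
    out_fp32 = torch.zeros(2, 4)
    out_fp16 = out_fp32.half()

    print(F.normalize(out_fp32, dim=1))  # zeros: 0 / max(0, 1e-12) = 0
    print(F.normalize(out_fp16, dim=1))  # NaN: eps casts to 0, so 0 / 0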

The way to solve this is to add an optional normalization epsilon parameter to the NTXentLoss initializer and pass it to torch.nn.functional.normalize, as sketched below.
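A minimal sketch of the proposed change (the parameter name normalize_eps is hypothetical, and the real NTXentLoss takes more arguments than shown here):

    import torch.nn as nn
    import torch.nn.functional as F

    class NTXentLoss(nn.Module):
        def __init__(self, normalize_eps: float = 1e-12):
            super().__init__()
            # Default keeps current behavior; half-precision users can pass
            # a larger value (e.g. 1e-6) that does not underflow in float16.
            self.normalize_eps = normalize_eps

        def forward(self, out0, out1):
            # normalize the output to length 1, with a configurable epsilon
            out0 = F.normalize(out0, dim=1, eps=self.normalize_eps)
            out1 = F.normalize(out1, dim=1, eps=self.normalize_eps)
            ...  # rest of the loss computation unchanged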

Please let me know if there is any mistake in my understanding. If it's okay with you, I can propose a pull request.

philippmwirth commented 1 year ago

You are right, 1e-12 is too small for torch.HalfTensor:

>>> torch.HalfTensor([1e-12])
tensor([0.], dtype=torch.float16)
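For reference, float16 subnormals reach down to roughly 6e-8 and the smallest normal value is about 6.1e-5, so an epsilon of 1e-6 or larger survives the cast:

>>> torch.finfo(torch.float16).tiny
6.103515625e-05
>>> torch.HalfTensor([1e-6])
tensor([1.0133e-06], dtype=torch.float16)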

However, the only way to run into this problem is if the tensors out0 and out1 have a norm smaller than 1e-12, which is a very unlikely scenario and could hint at a possible bug in your code. That being said, I think your fix is reasonable and we'd welcome your pull request.