lightly-ai / lightly

A python library for self-supervised learning on images.
https://docs.lightly.ai/self-supervised-learning/
MIT License
2.92k stars 248 forks

MoCo test #374

Closed marcomameli1992 closed 3 years ago

marcomameli1992 commented 3 years ago

Hi, I'm trying to use the MoCo example presented at this link: https://docs.lightly.ai/tutorials/package/tutorial_moco_memory_bank.html

but at the validation stage I receive this output: ValueError: The preds should be probabilities, but values were detected outside of [0,1] range.

While debugging I found that the output of the fc layer in the classification model (in the forward method) is a tensor containing negative values, but I don't understand why. Can anyone help me understand this problem and how to solve it?
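The negative values are expected: a Linear classification head outputs raw logits, which are unbounded, while the Accuracy metric in these pytorch-lightning versions expected probabilities in [0, 1]. A minimal sketch illustrating this (the layer sizes and feature tensor here are placeholders, not the tutorial's exact code):

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible sketch

# Stand-in for the tutorial's classification head: a Linear layer outputs
# raw logits, which are unbounded and usually contain negative values.
fc = nn.Linear(512, 10)
features = torch.randn(4, 512)  # placeholder for MoCo backbone features
y_hat = fc(features)

print((y_hat < 0).any().item())  # True: logits are not probabilities
```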

philippmwirth commented 3 years ago

Hi @marcomameli1992, it looks to me like the specs of the pytorch-lightning accuracy calculation have changed.

Can you please tell me the pytorch and pytorch-lightning version you are using?

For a quick fix, you could try adding the following line to validation_step, right after the forward pass:

y_hat = nn.functional.softmax(y_hat)

and see if it works?
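In context, the fix would sit in validation_step right after the forward pass. The sketch below is illustrative (the tutorial's real Classifier wraps a frozen MoCo backbone, omitted here; dim=-1 is added because newer torch versions ask for an explicit dimension):

```python
import torch
from torch import nn

# Illustrative classifier head; the tutorial's model differs in detail.
class Classifier(nn.Module):
    def __init__(self, num_features=512, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(num_features, num_classes)

    def forward(self, x):
        return self.fc(x)  # returns raw logits

model = Classifier()

def validation_step(batch, batch_idx):
    x, y = batch
    y_hat = model(x)                              # forward pass (logits)
    y_hat = nn.functional.softmax(y_hat, dim=-1)  # quick fix: logits -> [0, 1]
    # ...accuracy can now be computed on y_hat as probabilities...
    return y_hat

batch = (torch.randn(8, 512), torch.randint(0, 10, (8,)))
probs = validation_step(batch, 0)
print(probs.min().item() >= 0.0 and probs.max().item() <= 1.0)  # True
```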

marcomameli1992 commented 3 years ago

I'm using these package versions: pytorch-lightning==1.2.10, torch==1.8.1

I tried your solution and it works. Thank you so much.

Now I would like to understand the code output, so I'm asking if you could explain it to me if I share it with you. Thank you so much.

philippmwirth commented 3 years ago

Yes, you can share it here and I'll take a look at it.

philippmwirth commented 3 years ago

Made an issue about updating the tutorial here: https://github.com/lightly-ai/lightly/issues/375.

marcomameli1992 commented 3 years ago

Here I paste my output:

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 0:   0%| | 0/1 [00:00<?, ?it/s]

  | Name        | Type       | Params
-------------------------------------------
0 | resnet_moco | MoCo       | 23.0 M
1 | criterion   | NTXentLoss | 0
-------------------------------------------
11.5 M    Trainable params
11.5 M    Non-trainable params
23.0 M    Total params
91.977    Total estimated model params size (MB)
Epoch 2: 100%|██████████| 1/1 [00:31<00:00, 31.44s/it, loss=4.56, v_num=16]
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Validation sanity check:   0%| | 0/2 [00:00<?, ?it/s]

  | Name        | Type     | Params
-----------------------------------------
0 | resnet_moco | MoCo     | 23.0 M
1 | fc          | Linear   | 5.1 K
2 | accuracy    | Accuracy | 0
-----------------------------------------
5.1 K     Trainable params
23.0 M    Non-trainable params
23.0 M    Total params
91.998    Total estimated model params size (MB)
/run/media/marcomameli01/MarcoWorkDirectory/DeepLearning/Fashion/SelfLearning/Light/self-moco.py:160: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  y_hat = nn.functional.softmax(y_hat)
Epoch 0: 100%|██████████| 3/3 [00:28<00:00, 9.47s/it]
Validating: 0it [00:00, ?it/s]
Validating:   0%| | 0/2 [00:00<?, ?it/s]
Epoch 0: 100%|██████████| 3/3 [00:52<00:00, 17.64s/it, loss=2.3, v_num=17, val_acc=0.426]
Epoch 1: 100%|██████████| 3/3 [00:29<00:00, 9.84s/it, loss=2.3, v_num=17, val_acc=0.426]
Validating: 0it [00:00, ?it/s]
Validating:   0%| | 0/2 [00:00<?, ?it/s]
Epoch 1: 100%|██████████| 3/3 [00:53<00:00, 17.93s/it, loss=1.9, v_num=17, val_acc=0.636]
Epoch 2: 100%|██████████| 3/3 [00:28<00:00, 9.55s/it, loss=1.9, v_num=17, val_acc=0.636]
Validating: 0it [00:00, ?it/s]
Validating:   0%| | 0/2 [00:00<?, ?it/s]
Epoch 2: 100%|██████████| 3/3 [00:54<00:00, 18.09s/it, loss=1.75, v_num=17, val_acc=0.733]
Epoch 2: 100%|██████████| 3/3 [00:54<00:00, 18.22s/it, loss=1.75, v_num=17, val_acc=0.733]

Process finished with exit code 0

So I understand the training execution, but not the v_num parameter or the warning about the softmax.

philippmwirth commented 3 years ago

The v_num comes from pytorch-lightning and it's the experiment version number.

The softmax warning happens because you don't specify along which dimension the softmax should be calculated. Implicitly, torch will use the last dimension but the warning indicates that this has been deprecated. You can simply change your softmax call to

y_hat = nn.functional.softmax(y_hat, dim=-1)

and the warning should go away.
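For logits shaped (batch_size, num_classes), dim=-1 normalizes each row (i.e. over the classes) independently. A small standalone check, with made-up logit values:

```python
import torch
from torch import nn

# Two samples, three classes: dim=-1 applies the softmax per sample,
# across the class dimension.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.0, 0.0, 0.0]])
probs = nn.functional.softmax(logits, dim=-1)

print(probs.sum(dim=-1))  # each row sums to 1
print(probs[1])           # equal logits -> uniform distribution over 3 classes
```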