Using pytorch-lightning to train PixelCL on multi-gpu #11
ahmed-bensaad opened 3 years ago
@ahmed-bensaad should be fixed in 0.1.0! https://github.com/lucidrains/pixel-level-contrastive-learning/commit/eda34f78f5c30e4bd146c10a8cb5af79428f901f
@ahmed-bensaad you should also have sync batchnorm turned on, as brought to light by this paper https://arxiv.org/abs/2101.07525
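it's a one-liner if you want to convert the batchnorms manually before handing the resnet to PixelCL - rough sketch:

```python
import torch
from torchvision import models

# swap every BatchNorm layer for SyncBatchNorm so batch statistics
# are synchronised across the distributed processes
resnet = models.resnet50(pretrained=True)
resnet = torch.nn.SyncBatchNorm.convert_sync_batchnorm(resnet)
```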
@lucidrains Thank you for your response. Indeed, this error has been fixed, but now I have a new one:
0: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
For (1): according to this issue, it is not yet possible to set `find_unused_parameters=False` when using pytorch lightning.
For (2): I think this can be dealt with in the forward function of the model:
```python
def forward(self, x):
    shape, device, prob_flip = x.shape, x.device, self.prob_rand_hflip
    rand_flip_fn = lambda t: torch.flip(t, dims = (-1,))
    ...
    return loss, positive_pixel_pairs
```
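For completeness, outside of lightning, option (1) would just be passing the flag to DDP directly - a rough sketch (the PixelCL constructor arguments follow the README but may differ):

```python
import torch
import torch.distributed as dist
from torchvision import models
from torch.nn.parallel import DistributedDataParallel as DDP
from pixel_level_contrastive_learning import PixelCL

# assumes the script is launched with torch.distributed.launch / torchrun,
# which provides one process per GPU and the rendezvous environment variables
dist.init_process_group(backend = 'nccl')
device = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(device)

learner = PixelCL(
    models.resnet50(pretrained=True),
    image_size = 256,
    hidden_layer_pixel = 'layer4',
    hidden_layer_instance = -2
).to(device)

# option (1) from the error message: let DDP tolerate parameters that
# receive no gradient in a given forward/backward pass
model = DDP(learner, device_ids = [device], find_unused_parameters = True)
```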
> @ahmed-bensaad you should also have sync batchnorm turned on, as brought to light by this paper https://arxiv.org/abs/2101.07525
Sync batchnorm doesn't seem to work with the ddp2 accelerator:
0: ValueError: SyncBatchNorm is only supported for DDP with single GPU per process
Sure! I can add that! Why did you close the PR lol
I closed the issue because the workaround proposed here works fine for me. It is mainly a pytorch lightning issue, not related to this repository.
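For anyone hitting the same SyncBatchNorm error: since it wants a single GPU per process, the change boils down to using ddp instead of ddp2 - a rough sketch of the Trainer setup (argument names as in pytorch-lightning 1.x):

```python
import pytorch_lightning as pl

# ddp = one process per GPU, which is what SyncBatchNorm requires,
# as opposed to ddp2 = one process per node
trainer = pl.Trainer(
    gpus = 2,
    accelerator = 'ddp',
    sync_batchnorm = True,
    max_epochs = 100
)
# trainer.fit(model, train_dataloader)  # model / dataloader from your own script
```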
@ahmed-bensaad ahh, thanks for that, but I want to make this work for everyone lol
https://github.com/lucidrains/pixel-level-contrastive-learning/releases/tag/0.1.1 should be good now!
@ahmed-bensaad would you be willing to share your script with a pull request after it works? :D
Of course I will. With another (very) minor change to the package.
did you ever get this to work?
Hello,
I think it is working on my side. But I just need to finish some training epochs in order to be sure that it works 100%.
I will push my code example shortly after
woohoo!
Hello everyone,
I'm trying to use pytorch-lightning to train PixelCL on 2 GPUs using the ddp2 accelerator.
I followed this example:
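Something along these lines (a simplified sketch; the SelfSupervisedLearner wrapper and hyperparameters below are illustrative):

```python
import torch
import pytorch_lightning as pl
from torchvision import models
from pixel_level_contrastive_learning import PixelCL

# thin lightning wrapper around PixelCL (illustrative, names simplified)
class SelfSupervisedLearner(pl.LightningModule):
    def __init__(self, net, **kwargs):
        super().__init__()
        self.learner = PixelCL(net, **kwargs)

    def forward(self, images):
        return self.learner(images)

    def training_step(self, images, batch_idx):
        loss, _ = self.forward(images)
        return {'loss': loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr = 3e-4)

    def on_before_zero_grad(self, optimizer):
        # update the target encoder's exponential moving average each step
        self.learner.update_moving_average()

model = SelfSupervisedLearner(
    models.resnet50(pretrained=True),
    image_size = 256,
    hidden_layer_pixel = 'layer4',
    hidden_layer_instance = -2
)

trainer = pl.Trainer(gpus = 2, accelerator = 'ddp2', max_epochs = 100)
# trainer.fit(model, train_dataloader)
```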
When I try to run this I got an error; according to this issue, something needs to be done to register the forward hook, but I cannot understand what it is. Could someone help me please?
Thanks