vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
MIT License

Questions about some parameters in data augmentation #179

Closed juliendenize closed 3 years ago

juliendenize commented 3 years ago

Hello, thanks for sharing your library.

I was experimenting with DALI and could not reproduce the results I get with torchvision, which is how I found your very well structured code. While reading it, I noticed that GaussianBlur has a default window size of 23, which is not present in torchvision. Could you give me some clues on how you arrived at this particular window size?

Also, several examples provided by DALI use Triangular interpolation for the cropping, torchvision uses Bilinear, and you use Cubic. Is there a particular reason to prefer Cubic over Triangular?

vturrisi commented 3 years ago

Hey. About the interpolation: we followed what Barlow Twins (https://github.com/facebookresearch/barlowtwins/blob/main/main.py) and BYOL do, which is to use cubic. You can get different results depending on the interpolation that you use, but it's hard to judge. I would say that better interpolation methods tend to produce better models.
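To make the difference between the interpolation modes concrete, here is a minimal 1-D sketch in plain Python. It compares the triangle kernel (the one behind linear interpolation) with a cubic convolution kernel. The cubic coefficient `a = -0.5`, the helper names, and the sample row are assumptions chosen for illustration; this is not the code that DALI, torchvision, or Pillow actually runs, and libraries differ in the coefficient they use.

```python
import math


def triangle_kernel(x: float) -> float:
    """Triangle (linear) kernel: non-zero over the 2 nearest samples."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def cubic_kernel(x: float, a: float = -0.5) -> float:
    """Cubic convolution kernel: non-zero over the 4 nearest samples.

    The coefficient `a` is an assumption; implementations use different values.
    """
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def interpolate(samples, position, kernel, radius):
    """Weighted sum of the samples around `position`, clamping at the edges."""
    base = math.floor(position)
    total = 0.0
    for i in range(base - radius + 1, base + radius + 1):
        clamped = min(max(i, 0), len(samples) - 1)
        total += samples[clamped] * kernel(position - i)
    return total


row = [0.0, 0.0, 1.0, 1.0]  # hypothetical 1-D step edge
linear_value = interpolate(row, 1.25, triangle_kernel, radius=1)
cubic_value = interpolate(row, 1.25, cubic_kernel, radius=2)
print(linear_value, cubic_value)
```

The cubic kernel looks at four neighbours instead of two and has small negative lobes, so it gives a slightly different value near edges at the price of more arithmetic per output pixel.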

For the window of the Gaussian, this comes from the SimCLR paper (I can try to look for the code later). For the torchvision counterpart (which in this case is only PIL), we kept the same settings as in Barlow's official code. Pillow is sometimes a nightmare and it's hard to find parameters, but I would say it's comparable.

About reproducing the same results with DALI and torchvision: this is pretty much impossible, even when it comes to just loading the data, as they don't produce bit-wise identical results. In our experience, DALI produces better values 80% of the time, or even more, but the difference is always within a margin of ~1% at most.
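Since the two pipelines are not expected to match bit for bit, one way to compare them is with a tolerance rather than strict equality. The sketch below is plain Python; the helper names, the pixel values, and the tolerance are all hypothetical and only illustrate the idea.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def close_enough(a, b, tolerance):
    """True when no element differs by more than `tolerance`."""
    return max_abs_diff(a, b) <= tolerance


# Hypothetical 8-bit pixel values produced by two different decoders.
pixels_a = [12, 200, 37, 255]
pixels_b = [13, 199, 37, 254]

identical = pixels_a == pixels_b  # strict, bit-wise style comparison
within_two_levels = close_enough(pixels_a, pixels_b, tolerance=2)
print(identical, within_two_levels)
```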

juliendenize commented 3 years ago

Alright, I'm convinced by your argument for Cubic interpolation :). I guess the trade-off is slightly slower interpolation, but that is probably negligible.

Don't worry, you don't have to dig into their code. I checked and you're right: it comes from this line of code, where the window size is computed from the height of the image (height // 10, so 23 for ImageNet).

Thank you for your feedback. It's interesting that you get better results almost every time; this tool is definitely a must-have.
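For a 224-pixel ImageNet crop, `height // 10` on its own gives 22, so the 23 presumably comes from the blur forcing an odd window around a centre pixel. The sketch below spells out that arithmetic; the function name is hypothetical, and the `2 * radius + 1` rounding step is an assumption about how the kernel is built rather than something stated in this thread.

```python
def blur_window(height: int) -> int:
    """Gaussian blur window size derived from the image height."""
    kernel_size = height // 10  # 224 // 10 == 22
    radius = kernel_size // 2   # 22 // 2 == 11
    # Assumed rounding step: an odd window centred on one pixel.
    return 2 * radius + 1       # 2 * 11 + 1 == 23


print(blur_window(224))
```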

I'll close the issue, thank you very much for taking the time to give this extensive answer.