pratyushmaini / ssft

[NeurIPS'22] Official Repository for Characterizing Datapoints via Second-Split Forgetting

ResNet 18 implementation differs from other example difficulty papers #2

Closed Jasonlee1995 closed 7 months ago

Jasonlee1995 commented 11 months ago

The ResNet-18 code in this repo differs from the one used in other example-difficulty papers, such as the EL2N ResNet-18 or the forgetting ResNet-18.

This repo uses the ResNet-18 from torchvision, which has a 7x7 convolution and a max-pooling layer in the stem.

Since this stem is not well suited to low-resolution image datasets such as CIFAR-10, replacing it is common practice when measuring generalization performance on CIFAR-10 and CIFAR-100.

From Appendix C, I found that some of the results were driven by the ResNet-18 model.

I think the mismatch between the ResNet-18 described for the experiments and the one actually used could cause problems, such as different results and a different analysis. (I also checked ResNet-50, but I cannot find the custom_resnet Python file.)

References

  • NeurIPS 2021, Deep Learning on a Data Diet: Finding Important Examples Early in Training
  • ICLR 2019, An Empirical Study of Example Forgetting during Deep Neural Network Learning
pratyushmaini commented 11 months ago

Hi Jason! These are fantastic observations. A few points to note:

  1. A lot of the experiments are performed on a smaller backbone, ResNet-9, and the conclusions are consistent across models.
  2. We do make a custom ResNet-50 for the same reason: in that architecture, the stem kernel size is even less compatible with the image size of CIFAR-10.
  3. I do not think the kernel size of ResNet-18 is a big problem for the CIFAR-10 experiments; generalization is typically good. But I agree it would be worthwhile to also use a custom ResNet-18 with kernel size 3, as we did for the larger model.

I have added the custom_resnet file to the code. Sorry for the oversight. It should be straightforward to swap the model to CustomResNet18 as well.