huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
https://huggingface.co/docs/timm
Apache License 2.0
32.16k stars 4.75k forks

[question] Changes in the original Resnet training so far? #192

Closed Kshitij09 closed 4 years ago

Kshitij09 commented 4 years ago

I'm working on a medical imaging dataset and have previously tried the torchvision weights of resnet50 for fine-tuning. Replacing those weights with the ones provided in this library (79.038 top-1) has shown significant improvements on my dataset. I'd like to know what tweaks were incorporated to bring the ~77%-accurate torchvision model (resnet50) up to 79.038.

Looking at the history of README, I found:

  1. JSD+CE loss
  2. RandAugment
  3. AugMix

Could you please educate me about these techniques and the ones I'm missing (LR schedulers, optimizers, etc.)?
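For context on the first technique: the JSD in "JSD + CE" is the Jensen-Shannon consistency loss from the AugMix paper, computed across a clean image and two augmented views. A minimal NumPy sketch (illustrative only, not timm's actual implementation) might look like:

```python
# Illustrative NumPy sketch of the Jensen-Shannon consistency term used
# alongside cross-entropy in AugMix-style training. Not timm's code.
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q) per sample, with a small epsilon for numerical safety
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def jsd_consistency(logits_clean, logits_aug1, logits_aug2):
    # Probabilities for the clean image and two augmented views
    p_c = softmax(logits_clean)
    p_a1 = softmax(logits_aug1)
    p_a2 = softmax(logits_aug2)
    m = (p_c + p_a1 + p_a2) / 3.0  # mixture distribution
    # JSD = mean KL of each distribution to the mixture
    return np.mean((kl_div(p_c, m) + kl_div(p_a1, m) + kl_div(p_a2, m)) / 3.0)

# Identical predictions on all three views give (near) zero divergence
z = np.random.randn(4, 10)
assert jsd_consistency(z, z, z) < 1e-8
```

This consistency term is added to the ordinary cross-entropy on the clean view, encouraging the model to predict similarly under augmentation.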

rwightman commented 4 years ago

1 and 3 are very similar. I trained with AugMix (JSD + CE w/ AugMix augmentation) and then ran again with JSD + CE + RandAugment. Before that it was just CE + random erasing with cosine decay at 78.47, and before that, torchvision. All of the relevant papers are linked in the README.
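The cosine decay mentioned above is the standard half-cosine LR anneal; a minimal sketch of the schedule (warmup and exact parameters in timm's trainer may differ) is:

```python
# Sketch of a plain cosine LR decay schedule; timm's scheduler adds
# warmup, cycles, etc., so treat this as an approximation of the idea.
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    # Anneal from base_lr down to min_lr over total_steps along a half cosine
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))

assert abs(cosine_lr(0, 100, 0.1) - 0.1) < 1e-9    # starts at base_lr
assert abs(cosine_lr(100, 100, 0.1)) < 1e-9        # ends at min_lr
```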

Kshitij09 commented 4 years ago

Thank you for the quick reply :slightly_smiling_face: I really appreciate your work here and timm is my go-to repository for any task.

Kshitij09 commented 4 years ago

@rwightman how should I cite your work?

rwightman commented 4 years ago

@Kshitij09 thanks for asking, something simple is fine, I've seen a few citations to this and one of my other repos and they go something like...

Ross Wightman. PyTorch Image Models, 2020. https://github.com/rwightman/pytorch-image-models
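In BibTeX form, that might be written as (entry key and fields here are illustrative, not an official entry):

```bibtex
@misc{wightman2020timm,
  author       = {Ross Wightman},
  title        = {PyTorch Image Models},
  year         = {2020},
  howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
}
```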

Kshitij09 commented 4 years ago

Thanks!

rwightman commented 4 years ago

Btw, there is one other model that never made the front page because it wasn't as good as the JSD run: there are weights for a longer-epoch RandAugment run.

In total I have,

The hparams for the best JSD run are included in the README; the 240-epoch 78.8 RandAugment run uses basically the same hparams as the ResNeXt w/ RandAugment.

Kshitij09 commented 4 years ago

Thank you for the update! I'll try them out. I had tried to use JSD on my dataset but it didn't work that well, probably due to the size of my dataset. I remember using the 79% model for my experiments (around 2 months ago) and saved the weights once I got the best result. But for some reason, I'm not able to reach that mark with the current resnet50 weights in the library. Is there any change that hasn't been documented?

rwightman commented 4 years ago

@Kshitij09 I have also found JSD challenging to apply in other situations, even with different models, training schedules, or optimizers. JSD + EfficientNet w/ RMSProp was awful. Heh.

I think part of the problem is finding more optimal hparams for JSD with other training params or datasets, but it's time-consuming to search given how much time it adds to training. If you have any successes with JSD in the future, please share.

The weights have definitely not changed since they were posted, and I'm pretty sure the models haven't changed in any significant way that should impact training. Some new features have been added to the ResNet base model (stochastic depth (drop_path), drop block, different attn options, etc.), but they always default to off and are only activated by additional cmd args or new model definitions.
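For readers unfamiliar with stochastic depth: drop_path randomly drops a sample's entire residual branch during training and rescales the survivors so the expected output is unchanged. A NumPy sketch of the idea (illustrative, not timm's implementation) is:

```python
# Illustrative NumPy sketch of stochastic depth (drop_path): each sample's
# residual branch is dropped with probability p during training, and
# surviving branches are rescaled by 1/(1-p). Not timm's actual code.
import numpy as np

def drop_path(residual, p, training=True, rng=None):
    if not training or p == 0.0:
        return residual  # identity at inference time
    rng = rng or np.random.default_rng()
    keep = 1.0 - p
    # One keep/drop decision per sample, broadcast over remaining dims
    mask = rng.binomial(1, keep, size=(residual.shape[0],) + (1,) * (residual.ndim - 1))
    return residual * mask / keep  # rescale so the expectation is unchanged

x = np.ones((4, 3, 2, 2))
assert np.array_equal(drop_path(x, p=0.5, training=False), x)  # inference: identity
```

Because the decision is per sample (not per element), a dropped sample passes only the skip connection through that block, which is what shortens the effective depth.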

One thing that has caused regressions in the past is zero_init_last_bn ... it can have problems with some training setups, such as distributed training with sync-bn activated (but not without sync-bn).
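For reference, zero-initializing the last BN in a residual branch makes each block start as an identity mapping, which tends to stabilize early training. A hypothetical PyTorch sketch (the block and helper names here are illustrative, not timm's internals):

```python
# Hypothetical sketch of zero_init_last_bn on a toy residual block,
# assuming PyTorch is available. Block/helper names are illustrative.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)  # last BN in the residual branch

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)

def zero_init_last_bn(model):
    # Zero the gamma of the final BN in each block so the residual
    # branch contributes nothing at init and the block acts as identity.
    for m in model.modules():
        if isinstance(m, BasicBlock):
            nn.init.zeros_(m.bn2.weight)

block = BasicBlock(8)
zero_init_last_bn(block)
x = torch.randn(2, 8, 4, 4)
# With gamma = 0 the residual branch outputs zeros, so output == relu(x)
assert torch.allclose(block(x), torch.relu(x))
```

With gamma pinned to zero, gradients still flow into the branch, so the block gradually "turns on" during training rather than starting with a noisy contribution.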