Kshitij09 closed this issue 4 years ago
1 and 3 are very similar. I trained with AugMix (JSD + CE w/ AugMix augmentation), and then I ran again with JSD + CE + RandAugment; before that, just CE + random erasing with cosine decay at 78.47, and before that the torchvision weights. All of the relevant papers are linked in the README.
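For reference, the JSD loss mentioned here is the Jensen-Shannon consistency term from the AugMix paper, computed across a clean view and two augmented views of each image (CE is applied to the clean view, and the JS term is added with a weight). Below is a minimal NumPy sketch of just the consistency term; the function name and structure are illustrative, not timm's actual implementation (which is a PyTorch loss module in the library):

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def jsd_consistency(logits_clean, logits_aug1, logits_aug2, eps=1e-12):
    # Jensen-Shannon divergence across the three predicted distributions:
    # mean of KL(p_i || M) where M is the mixture of the three.
    p = [softmax(l) for l in (logits_clean, logits_aug1, logits_aug2)]
    m = np.clip(sum(p) / 3.0, eps, 1.0)  # mixture distribution M
    kl = lambda a: np.sum(a * (np.log(np.clip(a, eps, 1.0)) - np.log(m)), axis=-1)
    return float(np.mean(sum(kl(pi) for pi in p) / 3.0))
```

The term is zero when the three predictions agree exactly and grows as the augmented views drift apart, which is what pushes the network toward augmentation-invariant predictions.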
Thank you for the quick reply :slightly_smiling_face: I really appreciate your work here, and timm is my go-to repository for any task.
@rwightman how should I cite your work?
@Kshitij09 thanks for asking. Something simple is fine; I've seen a few citations to this and one of my other repos, and they go something like...
Ross Wightman. PyTorch Image Models, 2020. https://github.com/rwightman/pytorch-image-models
Thanks!
Btw, there is one other model that never made the front page because it wasn't as good as the JSD run; there are weights for a longer-epoch RA run.
In total I have:
The hparams for the best JSD run are included in the README; the 240-epoch 78.8 RandAugment run uses basically the same hparams as the ResNeXt w/ RandAugment.
Thank you for the update! I'll try them out. I had tried JSD on my dataset but it didn't work that well, probably due to the size of the dataset. I remember using the 79% model for my experiments (around 2 months ago) and I saved the weights once I got the best result. But for some reason, I'm not able to reach that mark with the current resnet50 weights in the library. Is there any change that hasn't been documented?
@Kshitij09 I have also found JSD challenging to apply in other situations, even with different models / training schedules or optimizers. JSD + EfficientNet w/ RMSProp was awful. Heh.
I think part of the problem is finding more optimal hparams for JSD with other training params or datasets, but it's time-consuming to search given how much time it adds to training. If you do have any successes with JSD in the future, please share.
The weights have definitely not changed since they were posted, and I'm pretty sure the models haven't changed in any significant way that should impact training. Some new features have been added to the ResNet base model (stochastic depth (drop_path), DropBlock, different attn options, etc.), but they always default to off and are only activated by additional cmd args or new model definitions.
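The backward-compatibility point above rests on a simple pattern: every new regularizer is gated behind a kwarg whose default disables it, so an old config builds an identical model. A hypothetical factory sketch (the function and config keys here are illustrative, not timm's real API):

```python
def create_resnet(variant, drop_path_rate=0.0, drop_block_rate=0.0, attn_layer=None):
    # Hypothetical sketch: newer features default to off, so existing
    # configs and pretrained weights behave exactly as before.
    cfg = dict(variant=variant)
    if drop_path_rate > 0:
        cfg['drop_path_rate'] = drop_path_rate    # stochastic depth
    if drop_block_rate > 0:
        cfg['drop_block_rate'] = drop_block_rate  # DropBlock regularization
    if attn_layer is not None:
        cfg['attn_layer'] = attn_layer            # e.g. a squeeze-excite variant
    return cfg
```

Calling `create_resnet('resnet50')` with no extra args yields the same config as before the features existed, which is why unchanged training commands should reproduce the old behavior.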
One thing that's caused regressions in the past is zero_init_last_bn ... it can have problems with some training setups, like distributed training with sync-bn activated (but not without sync-bn).
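For context, zero_init_last_bn initializes the gamma (weight) of the last BatchNorm in each residual block to zero, so at initialization the residual branch contributes nothing and each block starts as an identity mapping. A toy NumPy sketch of the effect (a stand-in for a real conv/BN branch, not the library's code):

```python
import numpy as np

def residual_block(x, weight, bn_gamma):
    # Toy residual block: y = x + gamma * f(x).
    # zero_init_last_bn sets the final BN's gamma to 0, so the branch
    # contributes nothing at init and the block reduces to identity.
    branch = np.maximum(x @ weight, 0.0)  # stand-in for conv/BN/ReLU
    return x + bn_gamma * branch

x = np.random.randn(4, 8)
w = np.random.randn(8, 8)
y = residual_block(x, w, bn_gamma=0.0)
# with gamma == 0 the output equals the input exactly
```

This tends to stabilize early training of deep residual nets, but as noted above, interactions with sync-bn in distributed setups have caused regressions.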
I'm working on a medical imaging dataset and had previously tried the torchvision weights of resnet50 for fine-tuning. Swapping those weights for the ones provided in this library (79.038) has shown significant improvements on my dataset. I'd like to know which tweaks were incorporated to bring a 77%-accurate torchvision resnet50 up to 79.038. Looking at the history of the README, I found:
Could you please educate me about these techniques and the ones I'm missing (LR schedulers, optimizers, etc.)?