lukemelas / EfficientNet-PyTorch

A PyTorch implementation of EfficientNet
Apache License 2.0

Finetune on EfficientNet looks like a disaster? #30

Open BowieHsu opened 5 years ago

BowieHsu commented 5 years ago

Hi Luke, thank you for your solid work! We tried to replace the backbone of our FPN, swapping ResNet50 for EfficientNet-B0, but the focal loss stays large and never seems to converge. Could the cause be drop_connect, the TF-like padded conv, or something else? Do you have any tips? Many thanks!

PS: Has anyone else tried training an object detection task with EfficientNet? Maybe we can discuss it here as well.

BowieHsu commented 5 years ago
[Image: screenshot, 2019-06-19, 2:31 PM]

Adding drop connect makes the loss reasonable, but the code in utils should be: `random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=inputs.dtype, device='cuda')`
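For readers following along, here is a minimal sketch of a drop connect function that avoids the CPU/GPU mismatch discussed above. It is not necessarily the exact code in the repo: the function signature is illustrative, and using `inputs.device` instead of a hard-coded `'cuda'` is an assumption made here so the same code runs on either device.

```python
import torch


def drop_connect(inputs: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Randomly zero whole samples of a branch and rescale the survivors."""
    if not training or p == 0.0:
        return inputs
    keep_prob = 1.0 - p
    batch_size = inputs.shape[0]
    # One random value per sample, created on the same device as the input
    # (assumption: device-agnostic variant of the one-liner quoted above).
    random_tensor = keep_prob + torch.rand(
        [batch_size, 1, 1, 1], dtype=inputs.dtype, device=inputs.device
    )
    binary_mask = torch.floor(random_tensor)  # 1 with probability keep_prob, else 0
    return inputs / keep_prob * binary_mask


x = torch.ones(8, 3, 4, 4)
y = drop_connect(x, p=0.2, training=True)
```

Each sample in `y` is either all zeros or scaled by `1 / keep_prob`, and in evaluation mode the input is returned unchanged.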

lukemelas commented 5 years ago

Yes, the drop_connect cpu/gpu issue is now fixed in the repo (as of a minute ago). How is object detection looking after a day of training?

BowieHsu commented 5 years ago

@lukemelas the results show that after adding drop_connect, EfficientNet-B3 FPN reached the same mAP as ResNet50 FPN. But I hit another problem when trying to convert the .pth file to ONNX; it looks like we may need to rewrite the TF-like conv in PyTorch style?

BowieHsu commented 5 years ago

[Image attachment]

bruceyang2012 commented 5 years ago

@BowieHsu How do the inference times of EfficientNet-B3 FPN and ResNet50 FPN compare? I am curious about it. Thanks.

lukemelas commented 5 years ago

Good to see those results! I'll rewrite the tf-like conv to a pytorch conv in the next version.
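As a rough illustration of what rewriting the TF-like conv could look like, the sketch below precomputes TensorFlow-style "SAME" padding for a fixed input size, so no padding arithmetic depends on the input shape at run time. The class name `StaticSamePadConv2d` and its arguments are hypothetical, and this is one possible approach rather than the change that was eventually made in the repo.

```python
import math

import torch
from torch import nn


class StaticSamePadConv2d(nn.Module):
    """Conv2d with 'SAME'-style padding fixed at construction (hypothetical helper)."""

    def __init__(self, in_ch, out_ch, kernel_size, stride, image_size):
        super().__init__()
        height, width = image_size
        pads = []
        for size in (width, height):  # ZeroPad2d expects (left, right, top, bottom)
            out = math.ceil(size / stride)
            total = max((out - 1) * stride + kernel_size - size, 0)
            pads += [total // 2, total - total // 2]
        self.pad = nn.ZeroPad2d(tuple(pads))
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride, bias=False)

    def forward(self, x):
        return self.conv(self.pad(x))


conv = StaticSamePadConv2d(3, 8, kernel_size=3, stride=2, image_size=(224, 224))
out = conv(torch.randn(1, 3, 224, 224))
```

The trade-off is that the module only produces "SAME" output sizes for the image size it was built with.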

FamousDirector commented 5 years ago

@BowieHsu I’ve also tried replacing ResNet backbone with EfficientNet within an FPN but get poor results. I will need to try the new changes. What layers are you choosing for the FPN? Did you have to modify the FPN design?

BowieHsu commented 5 years ago

@bruceyang2012 As the figure shows, the inference time of EfficientNet-B0 FPN should be 0.087 s for FP16. It's pure Python control flow; I will report the TensorRT result when the ONNX conversion function is ready.

bruceyang2012 commented 5 years ago

@BowieHsu How about the inference time of ResNet50 FPN in the same environment?

fmobrj commented 5 years ago

I tried EfficientNet-B3 on Stanford Cars instead of ResNet50 and DenseNet169 and saw a 4 p.p. increase in test accuracy, from 89% to more than 93%. I was also struggling to improve, but got better results using RMSprop instead of Adam. I also created a custom head with one more linear layer (two linear layers in total) after the last conv, with dropout (p = 0.2) before the last linear layer.
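A head like the one described might be sketched as follows. The input width of 1536, the hidden size of 512, and the ReLU activation are assumptions for illustration, since the comment does not state them; only the two linear layers, the dropout rate, and the class count (given later in the thread) come from the discussion.

```python
import torch
from torch import nn

IN_FEATURES = 1536  # assumed width of the pooled backbone features
HIDDEN = 512        # assumed; the comment gives no hidden size
NUM_CLASSES = 196   # Stanford Cars classes, as mentioned later in the thread

head = nn.Sequential(
    nn.Linear(IN_FEATURES, HIDDEN),
    nn.ReLU(inplace=True),   # assumed activation between the two linear layers
    nn.Dropout(p=0.2),       # dropout before the last linear layer
    nn.Linear(HIDDEN, NUM_CLASSES),
)

head.eval()
logits = head(torch.randn(4, IN_FEATURES))
```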

FamousDirector commented 5 years ago

@BowieHsu I tried again with EfficientNet-B0 and the same RetinaNet repo you used. The mAP would not go above 1%. How did you integrate the EfficientNet architecture with the FPN?

BowieHsu commented 5 years ago

@FamousDirector We rewrote the EfficientNet model loading and init code; the feature map indices of the backbone should now be 4, 10, and 15.
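To make the idea of feature map indices concrete, here is a toy sketch that collects the outputs of selected blocks for an FPN. `ToyBackbone` is a hypothetical stand-in with 16 identical conv blocks, not the EfficientNet code itself; only the indices 4, 10, and 15 come from the comment above.

```python
import torch
from torch import nn


class ToyBackbone(nn.Module):
    """Hypothetical backbone that returns the outputs of selected blocks."""

    def __init__(self, num_blocks=16, channels=8, out_indices=(4, 10, 15)):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(
            [
                nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
                for _ in range(num_blocks)
            ]
        )
        self.out_indices = set(out_indices)

    def forward(self, x):
        x = self.stem(x)
        features = []
        for idx, block in enumerate(self.blocks):
            x = block(x)
            if idx in self.out_indices:
                features.append(x)  # keep this block's output for the FPN
        return features


features = ToyBackbone()(torch.randn(1, 3, 32, 32))
```

In a real backbone the selected blocks would sit at different strides, so the FPN would receive feature maps of different resolutions and channel counts.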

BowieHsu commented 5 years ago

@FamousDirector Yep, with the pretrained model. Have you updated the EfficientNet code with drop connect? We hit the same problem at first, but fixed it as I said: "Add drop connect makes the loss reasonable".

dawnsparrow commented 5 years ago

@BowieHsu Is it necessary to use the same resolution as in the paper for detection or segmentation?

FamousDirector commented 5 years ago

@BowieHsu have you had any luck converting from ONNX to the .plan file?

RahulBhalley commented 5 years ago

@fmobrj I would like to ask you some questions regarding your training settings because my accuracy is not jumping above even 42%. 🙁 I used SGD with momentum of 0.999 and learning rate was set to 0.001. I tried a batch size of 32, 64, and then finally 256 (as suggested in paper: Do Better ImageNet Models Transfer Better?) and it did give around 42% whilst smaller batches stuck at around 30%+ only.

Questions

Q1. What batch size, optimizer, learning rate, and other hyper-parameters are you using?

Q2. Are you using fixed-feature extractor and only updating penultimate layer or fine-tuning the network?

Q3. On the basis of Q2, if you are fine-tuning EfficientNet then how many last layers are you updating the parameters of?

P.S.: I am stuck badly and want to use EfficientNet instead of ResNet. And your help would really matter.

fmobrj commented 5 years ago

Q1: BS=8, RMSprop with default parameters, differential learning rates (1e-3 for the outer layers and 1e-5 for the inner layers), using fastai as the learner framework.

Q2: I am just changing the last layer and fine-tuning the pretrained B3.

Q3: I kept the default architecture, changing only the last FC layer to match the 196 classes.

Here is my code: https://github.com/fmobrj/EfficientNet/blob/master/cars_stanford_kaggle_squish_efficientnet_b3_git.ipynb
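Separately from the linked notebook, which uses fastai, the differential learning rates can be approximated in plain PyTorch with optimizer parameter groups. This is a simplified sketch: the tiny `backbone` and `head` modules are placeholders, and mapping "inner layers" to the backbone and "outer layers" to the head is an interpretation of the comment rather than the notebook's exact setup.

```python
import torch
from torch import nn

# Placeholder model: a stand-in backbone plus a 196-class head.
model = nn.ModuleDict(
    {
        "backbone": nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()),
        "head": nn.Linear(8, 196),
    }
)

# One parameter group per learning rate; other RMSprop settings stay at defaults.
optimizer = torch.optim.RMSprop(
    [
        {"params": model["backbone"].parameters(), "lr": 1e-5},  # inner layers
        {"params": model["head"].parameters(), "lr": 1e-3},      # outer layers
    ]
)

learning_rates = [group["lr"] for group in optimizer.param_groups]
```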

RahulBhalley commented 5 years ago

Thanks! I'll test this configuration tonight.

dhananjaisharma10 commented 4 years ago

@BowieHsu Hi! Can you shed some light on your detector? Were you using EfficientNet along with FPN? If yes, which feature maps from EfficientNet did you pass to the FPN? Also, is the inference slower or faster than ResNet50? Please let me know. Thanks!

Soroorsh commented 4 years ago

Hi! Is it possible to use the FPN network with an EfficientNet backbone purely for classification?