zylo117 / Yet-Another-EfficientDet-Pytorch

The PyTorch re-implementation of the official EfficientDet with SOTA performance in real time and pretrained weights.
GNU Lesser General Public License v3.0

Train on COCO from scratch (all AP=0) #361

Open CuiZhiying opened 4 years ago

CuiZhiying commented 4 years ago

First, thank you very much for providing this beautiful code for us!

Everything works when I use the pretrained model from your repository. But when I try to train an EfficientDet-D2 from scratch, I run into trouble.

  1. I could not load the ImageNet-pretrained model for EfficientNet. I'm in mainland China, and I cannot open the URL of the EfficientNet pretrained model even with a VPN enabled. Is there another way for me to load the ImageNet-pretrained model? I notice that the default setting is not to use the pretrained model.

  2. When I train on COCO from scratch without any pretrained model, all of my saved checkpoints are unusable. All the mAP values are zero (checkpoint at epoch 60), although the same evaluation code gives correct results with your trained parameters:

    Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

And the TensorBoard curves are: [4 images]

I know it's really hard to train from scratch, but I think this result is quite strange. Could you please give me some advice?
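If the download URL is unreachable, one possible workaround is to fetch the .pth file by another route (a mirror, or another machine) and load it from disk. The sketch below assumes a hypothetical local folder named weights; local_weights_path and load_local_state_dict are illustrative helpers, not functions from this repository.

```python
import os
from urllib.parse import urlparse


def local_weights_path(url, weights_dir="weights"):
    """Map a checkpoint URL to the path a manually downloaded copy would have.

    `weights_dir` is a hypothetical local folder; use wherever the file was saved.
    """
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(weights_dir, filename)


def load_local_state_dict(url, weights_dir="weights"):
    """Load a manually downloaded checkpoint from disk instead of the network."""
    import torch  # imported lazily so the path helper works without PyTorch

    path = local_weights_path(url, weights_dir)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Download {url} manually and save it as {path}")
    return torch.load(path, map_location="cpu")
```

The returned dictionary plays the same role as the one a download would produce, so any later key handling applies to it unchanged.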

zylo117 commented 4 years ago

Effnet is quite deep and so is effdet. Training from scratch without pretrained backbone weights is hardly realistic. Since I modified some of the backbone's variable names, it is not possible to load the backbone weights directly. But with a small fix, it is quite easy to do. https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/issues/318#issuecomment-635075616
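To see which names differ before attempting any fix, it can help to compare the two sets of state-dict keys. A minimal sketch follows; diff_state_dict_keys is a hypothetical helper, and the sample key names are made up for illustration rather than copied from either codebase.

```python
def diff_state_dict_keys(model_keys, pretrained_keys):
    """Return (missing, unexpected) key lists, both sorted.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names in the checkpoint that the model does not use
    """
    model_keys, pretrained_keys = set(model_keys), set(pretrained_keys)
    missing = sorted(model_keys - pretrained_keys)
    unexpected = sorted(pretrained_keys - model_keys)
    return missing, unexpected


# Illustrative names only: the model wraps a layer in an extra ".conv" module.
model_keys = ["stem.conv.weight", "stem_bn.weight"]
pretrained_keys = ["stem.weight", "stem_bn.weight"]
missing, unexpected = diff_state_dict_keys(model_keys, pretrained_keys)
print(missing)     # ['stem.conv.weight']
print(unexpected)  # ['stem.weight']
```

With real models, the inputs would be the keys of model.state_dict() and the keys of the loaded checkpoint dictionary.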

CuiZhiying commented 4 years ago

Thank you very much for your quick reply! I have now changed the pretrained model loading as follows:

  1. Change the URLs in efficientnet/utils.py at line 296:
    296 url_map_advprop = {        
    297     #'efficientnet-b0': 'https://publicmodels.blob.core.windows.net/container/advprop/efficientnet-b0-b64d5a18.pth',
    298     #'efficientnet-b1': 'https://publicmodels.blob.core.windows.net/container/advprop/efficientnet-b1-0f3ce85a.pth',                                                              
    299     #'efficientnet-b2': 'https://publicmodels.blob.core.windows.net/container/advprop/efficientnet-b2-6e9d97e5.pth',
    300     #'efficientnet-b3': 'https://publicmodels.blob.core.windows.net/container/advprop/efficientnet-b3-cdd7c0f4.pth',
    301     #'efficientnet-b4': 'https://publicmodels.blob.core.windows.net/container/advprop/efficientnet-b4-44fb3a87.pth',
    302     #'efficientnet-b5': 'https://publicmodels.blob.core.windows.net/container/advprop/efficientnet-b5-86493f6b.pth',
    303     #'efficientnet-b6': 'https://publicmodels.blob.core.windows.net/container/advprop/efficientnet-b6-ac80338e.pth',
    304     #'efficientnet-b7': 'https://publicmodels.blob.core.windows.net/container/advprop/efficientnet-b7-4652b6dd.pth',
    305     #'efficientnet-b8': 'https://publicmodels.blob.core.windows.net/container/advprop/efficientnet-b8-22a8fe65.pth',
    306     'efficientnet-b0': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b0-b64d5a18.pth',
    307     'efficientnet-b1': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b1-0f3ce85a.pth',
    308     'efficientnet-b2': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b2-6e9d97e5.pth',
    309     'efficientnet-b3': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b3-cdd7c0f4.pth',
    310     'efficientnet-b4': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b4-44fb3a87.pth',
    311     'efficientnet-b5': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b5-86493f6b.pth',
    312     'efficientnet-b6': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b6-ac80338e.pth',
    313     'efficientnet-b7': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b7-4652b6dd.pth',
    314     'efficientnet-b8': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b8-22a8fe65.pth',
    315 }
  2. Load the pretrained model with the correct names in efficientnet/utils.py:
    import copy
    import torch
    from torch.utils import model_zoo

    def load_pretrained_weights(model, model_name, load_fc=True, advprop=False):
        """ Loads pretrained weights, and downloads if loading for the first time. """
        # AutoAugment or Advprop (different preprocessing)
        url_map_ = url_map_advprop if advprop else url_map
        pretrained_dict = model_zoo.load_url(url_map_[model_name], map_location=torch.device('cpu'))

        model_dict = model.state_dict()

        # Iterate over a copy of the keys because model_dict is updated in the loop.
        for name in copy.deepcopy(model_dict).keys():
            if name not in pretrained_dict.keys():
                # This model's name has one extra component in the
                # second-to-last position; drop it to get the pretrained name.
                name_list = name.split('.')
                name_list.pop(-2)
                pretrained_name = '.'.join(name_list)
            else:
                pretrained_name = name
            model_dict[name] = pretrained_dict[pretrained_name]

        ret = model.load_state_dict(model_dict, strict=False)

        print('Loaded pretrained weights for {}'.format(model_name))

Now I'm retraining the model, but I think there are still some other problems. As other issues mention, the results are still really bad after loading the ImageNet-pretrained weights. In my case, it's quite strange to get mAP=0 across the board. Anyway, thank you very much, and I'll come back soon...
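The renaming loop above raises a KeyError on the first name that matches neither form. A more defensive variant, sketched below under the same assumption (the model's names carry one extra component in the second-to-last position), collects unmatched names instead so they can be inspected. remap_pretrained_keys is a hypothetical helper, and the demo values are placeholder strings standing in for tensors.

```python
def remap_pretrained_keys(model_keys, pretrained):
    """Match each model key to a checkpoint entry.

    Tries the exact name first, then the name with its second-to-last
    component removed (e.g. "a.conv.weight" -> "a.weight").
    Returns (matched, unmatched): a dict of model key -> checkpoint value,
    and a list of model keys with no counterpart.
    """
    matched, unmatched = {}, []
    for name in model_keys:
        parts = name.split(".")
        candidates = [name]
        if len(parts) >= 2:
            candidates.append(".".join(parts[:-2] + parts[-1:]))
        for candidate in candidates:
            if candidate in pretrained:
                matched[name] = pretrained[candidate]
                break
        else:
            # Neither form exists in the checkpoint.
            unmatched.append(name)
    return matched, unmatched


# Placeholder values stand in for tensors; names are illustrative.
pretrained = {"stem.weight": "W_stem", "bn.weight": "W_bn"}
model_keys = ["stem.conv.weight", "bn.weight", "head.cls.bias"]
matched, unmatched = remap_pretrained_keys(model_keys, pretrained)
print(matched)    # {'stem.conv.weight': 'W_stem', 'bn.weight': 'W_bn'}
print(unmatched)  # ['head.cls.bias']
```

Only the matched entries would then be copied into the model's state dict; a non-empty unmatched list points at layers that would keep their random initialization.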

zylo117 commented 4 years ago

Just a heads-up. I'm not sure about the line "if name not in pretrained_dict.keys():". Are you sure that every name missing from the pretrained weights should be renamed this way?

CuiZhiying commented 4 years ago

Yes, I checked the names just now. A screenshot is shown below, and my code is now running without any warnings. [image]

CuiZhiying commented 4 years ago

What's really strange is that when I save all the image results and check them by eye, they are obviously not that bad. Many predictions are reasonable, with correct labels (I edited the code in efficientdet_test.py).

zylo117 commented 4 years ago

Then maybe the category IDs are mismatched in your annotations. You should try enabling debug mode in training to see how training performs.
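As background on what a category-ID mismatch can look like: COCO category IDs are not contiguous (80 classes spread over IDs 1 to 90), so a model's class index has to be translated to the annotation's category ID and back. A minimal sketch, assuming the standard COCO annotation layout with a top-level "categories" list; build_label_maps is a hypothetical helper, and the three entries are a small excerpt of the COCO categories.

```python
def build_label_maps(categories):
    """Build class-index <-> category-ID maps from a COCO "categories" list.

    Class indices are assigned in ascending category-ID order (an assumption;
    a training pipeline may order them differently).
    """
    cat_ids = sorted(cat["id"] for cat in categories)
    index_to_cat_id = dict(enumerate(cat_ids))
    cat_id_to_index = {cat_id: i for i, cat_id in index_to_cat_id.items()}
    return index_to_cat_id, cat_id_to_index


# Excerpt of COCO categories: ID 12 is unused, so IDs and indices diverge.
categories = [
    {"id": 13, "name": "stop sign"},
    {"id": 10, "name": "traffic light"},
    {"id": 11, "name": "fire hydrant"},
]
index_to_cat_id, cat_id_to_index = build_label_maps(categories)
print(index_to_cat_id)  # {0: 10, 1: 11, 2: 13}
print(cat_id_to_index)  # {10: 0, 11: 1, 13: 2}
```

If training and evaluation disagree on this mapping, predictions can look right when drawn on images yet score zero, because every box is filed under the wrong category ID.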

CuiZhiying commented 4 years ago

I think the category IDs match, since I can test your pretrained model and get exactly the same mAP. Everything is also fine on the Shape dataset. Would you mind sharing your training configuration with me, such as the model name, learning rate, optimizer, and number of epochs to convergence? Thank you very much~

ryusaeba commented 4 years ago

@CuiZhiying Were you able to get this working? If so, could you help me?