irfanICMLL / structure_knowledge_distillation

The official code for the paper 'Structured Knowledge Distillation for Semantic Segmentation'. (CVPR 2019 ORAL) and extension to other tasks.
BSD 2-Clause "Simplified" License
708 stars 103 forks

ESPNet training using 'master' & 'cvpr2019' branch #48

Open betogulliver opened 4 years ago

betogulliver commented 4 years ago

Thanks for your great work.

I'm trying to train ESPNet using the 'master' branch, without success.

I tried to "recycle" some of the models/code from the 'cvpr2019' branch (networks/ESPNet.py) and use it as the 'student'.

However, since ESPNet has a different architecture from the 'teacher' (ResNet101), I get a mismatch when comparing features in the 'student_backward()' function: the student (ESPNet) returns 3 features, while the teacher (ResNet101) returns 7. That much is expected, but even the feature shapes don't match at all (for details, see the end of this message).

Question 01. How can I align/transform the 'student' features into the 'teacher' features?

         I noticed the original code returns 'teacher' features of shape (...,
         65, 65), while my ESPNet returns them as (..., 64, 64) (see below for
         details). I then found that commenting out the following line at
         'pspnet_combine.py:130'

            self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=True) # change

         makes my teacher's features become (..., 64, 64).
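To see why ceil_mode flips the sizes between 65 and 64, here is a small sketch (plain Python, reproducing the output-size formula from the PyTorch MaxPool2d documentation; the 256-pixel input is a hypothetical stand-in for the feature map after a stride-2 stem conv on a 512x512 crop):

```python
import math

def pool_out_size(l_in, kernel=3, stride=2, padding=1, ceil_mode=False):
    """Spatial output size of MaxPool2d, per the formula in the PyTorch docs."""
    span = l_in + 2 * padding - kernel
    steps = math.ceil(span / stride) if ceil_mode else math.floor(span / stride)
    return steps + 1

# 512x512 crop -> 256x256 after a stride-2 stem conv, then the maxpool:
print(pool_out_size(256, ceil_mode=True))   # 129 -> a later stride-2 conv gives 65
print(pool_out_size(256, ceil_mode=False))  # 128 -> a later stride-2 conv gives 64
```

The odd size produced by ceil_mode=True (255/2 rounds up to 128, plus 1) is what propagates through the remaining stride-2 stage to yield 65x65 instead of 64x64.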

Question 02. Do I need to change anything else after commenting out the line above?
             (the args seem to default to args.imsize_for_adv = 65)
             (also, 'sagan_models.py:131' has the check 'if self.imsize == 65')

Question 03. Which student/teacher features should I try to match for distillation?

         a) pixelwise loss : ???

         b) pairwise loss : feat_ind = -5 (equivalently index 2) points to 'x_feat_after_psp',
            whose shape in the teacher is :

              T : 2, torch.Size([2, 512, 64, 64])

            the student (ESPNet) features are :
              S : 0, torch.Size([2, 19, 512, 512])
              S : 1, torch.Size([2, 256, 64, 64])
              S : 2, torch.Size([2, 19, 64, 64])
pspnet_combine.py/ResNet.forward(...)
        ...
        return [x, x_dsn, x_feat_after_psp, x4, x3, x2, x1] 

x_feat_after_psp <-- feat_ind = -5 # (see self.criterion_pair_wise_for_interfeat = CriterionPairWiseforWholeFeatAfterPool(...)
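As a sanity check on that indexing (feature names copied from the return statement above), feat_ind = -5 and index 2 really do address the same element of the 7-element list:

```python
# Order copied from ResNet.forward in pspnet_combine.py
teacher_feats = ["x", "x_dsn", "x_feat_after_psp", "x4", "x3", "x2", "x1"]

# In a 7-element list, -5 + 7 == 2, so both indices hit the same slot
assert teacher_feats[-5] is teacher_feats[2]
print(teacher_feats[-5])  # x_feat_after_psp
```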

# student_esp_d : taken from ../c01.cvpr2019/eval_esp.py
S : 0, torch.Size([2, 19, 512, 512])
S : 1, torch.Size([2, 256, 64, 64])
S : 2, torch.Size([2, 19, 64, 64])

# teacher
T : 0, torch.Size([2, 19, 64, 64])
T : 1, torch.Size([2, 19, 64, 64])
T : 2, torch.Size([2, 512, 64, 64]) <-- feat_ind = -5 # (see self.criterion_pair_wise_for_interfeat = CriterionPairWiseforWholeFeatAfterPool(...)
T : 3, torch.Size([2, 2048, 64, 64])
T : 4, torch.Size([2, 1024, 64, 64])
T : 5, torch.Size([2, 512, 64, 64])
T : 6, torch.Size([2, 256, 128, 128])
irfanICMLL commented 4 years ago

Q1. For ESPNet, I change self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=False) and retrain the teacher net.

Q2. The input of the D is 1/8 of the RGB.

Q3. a. The pixel-wise loss is applied on the logits. b. For the student, I use the 1/8-scale feature ([2, 256, 64, 64]); for the teacher, I use the feature after the PSP module. To be honest, for the distillation of ESPNet, as described in the paper, I used the original training code of the ESP project and added the distillation module onto that project.
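A note on Q3b: the student's 1/8-scale feature has 256 channels while the teacher's post-PSP feature has 512, and the pair-wise loss tolerates that because it compares (HW x HW) similarity maps, which are channel-agnostic. Below is a minimal sketch of that idea; it is my own simplification, not the repo's CriterionPairWiseforWholeFeatAfterPool (which additionally average-pools the features before building the similarity map), and the 8x8 spatial size is a toy stand-in for 64x64:

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(feat_s, feat_t):
    """Match student/teacher pairwise-similarity maps (channel-agnostic sketch)."""
    def sim_map(f):
        b, c, h, w = f.shape
        f = f.view(b, c, h * w)
        f = F.normalize(f, p=2, dim=1)          # unit-norm each spatial column
        return torch.bmm(f.transpose(1, 2), f)  # (B, HW, HW) cosine similarities
    # MSE between the two similarity maps; channel counts never have to agree
    return (sim_map(feat_s) - sim_map(feat_t)).pow(2).mean()

# Toy shapes standing in for [2, 256, 64, 64] (student) vs [2, 512, 64, 64] (teacher)
s = torch.randn(2, 256, 8, 8)
t = torch.randn(2, 512, 8, 8)
loss = pairwise_similarity_loss(s, t)
print(loss.item())  # a non-negative scalar
```

Because only the spatial-by-spatial similarity structure is compared, no extra projection layer is needed to reconcile the 256- vs 512-channel mismatch.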