klauscc / TALLFormer


ChunkVideoSwin has no gradients #8

Open SimoLoca opened 1 year ago

SimoLoca commented 1 year ago

Hi, I have a question related to the training of TALLFormer. In particular, I noticed that when training the backbone, ChunkVideoSwin has its gradients set to None (I think this may cause problems during the backward computation). Is this normal behaviour, or is something wrong?

To test, I inserted the following lines at https://github.com/klauscc/TALLFormer/blob/5519140e39095cd87d9b50420bde912975cae9fb/vedatad/models/detectors/mem_single_stage_detector.py#L67:

for name, param in self.backbone.named_parameters():
    print("name: ", name, "grad: ", param.grad)
klauscc commented 1 year ago

Hi SimoLoca, the gradients should be None during the forward pass. You can only see gradients after loss.backward() and before optimizer.zero_grad(). If you want to inspect the gradients, you can insert your code right after L23: https://github.com/klauscc/TALLFormer/blob/5519140e39095cd87d9b50420bde912975cae9fb/vedacore/hooks/optimizer.py#L23
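
For illustration, here is a minimal generic PyTorch sketch of when .grad is populated; this is not TALLFormer's actual training loop:

import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 2)

loss = torch.nn.functional.mse_loss(model(x), y)
print(model.weight.grad)              # None: backward has not run yet
loss.backward()
print(model.weight.grad is not None)  # True: gradients exist here
opt.step()
opt.zero_grad()                       # clears .grad (sets it to None by default in recent PyTorch)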

SimoLoca commented 1 year ago

Hi @klauscc, thanks for the fast reply. I tried what you suggested; right after L23 I inserted:

for name, param in looper.train_engine.model.backbone.named_parameters():
    print("name: ", name, "grad: ", param.grad)
for name, param in looper.train_engine.model.neck.named_parameters():
    print("name: ", name, "grad: ", param.grad)
for name, param in looper.train_engine.model.head.named_parameters():
    print("name: ", name, "grad: ", param.grad)

Interestingly, the backbone has all gradients set to None, while the neck and head do have gradients. Is this behavior correct, then? And lastly, does this mean that the backbone is frozen during training, and if so, how do I "unfreeze" it? Thanks so much!

klauscc commented 1 year ago

Hi @SimoLoca, I did a quick check and the backbone is indeed updated during training:

>>> import torch
>>> s1 = torch.load('epoch_600_weights.pth', map_location="cpu")
>>> s2 = torch.load('epoch_1000_weights.pth', map_location="cpu")
>>> w1 = s1['backbone.layers.2.blocks.16.mlp.fc2.bias']
>>> w2 = s2['backbone.layers.2.blocks.16.mlp.fc2.bias']
>>> torch.allclose(w1, w2)
False
>>> w1[:10]
tensor([ 0.0496,  0.0174,  0.0173, -0.1023,  0.0316,  0.8908, -0.1456, -0.1831,
        -0.3061, -0.3634])
>>> w2[:10]
tensor([ 0.0492,  0.0165,  0.0165, -0.1018,  0.0315,  0.8822, -0.1449, -0.1810,
        -0.3043, -0.3599])
>>>

In the config file https://github.com/klauscc/TALLFormer/blob/main/configs/trainval/thumos/1.0.0-vswin_b_256x256-12GB.py#L99, the first two stages of the backbone are frozen. Swin-B has 24 layers in total; we only tune the last 20 (the last two stages). Did you only check the first several parameters?
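
A quick way to see which backbone parameters are actually trainable (a sketch assuming the frozen stages are implemented by setting requires_grad=False, as is typical for Swin's frozen_stages option):

model = looper.train_engine.model
frozen = [n for n, p in model.backbone.named_parameters() if not p.requires_grad]
trainable = [n for n, p in model.backbone.named_parameters() if p.requires_grad]
print(len(frozen), "frozen;", len(trainable), "trainable")
# Frozen parameters always keep grad=None; only trainable ones
# receive gradients after loss.backward().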

SimoLoca commented 1 year ago

I checked that by following the README for training the model, without loading a checkpoint, and I did not change the config file. Am I checking whether the backbone's weights are updated in the wrong way?

SimoLoca commented 1 year ago

Hi @klauscc, I've resolved the issue. There is no problem with the code; there were some errors in my config file, so forgive me if I disturbed you too much. Just one last question: during the feature-extraction phase, might it make sense to use a stride? For example, with a stride of 16, processing frames 0-32, then 16-48, and so on?

Thank you so much!
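
For reference, the proposed overlapping windows could be generated like this (a hypothetical sketch, not code from the repo):

def window_starts(num_frames, chunk_len=32, stride=16):
    # Start indices for overlapping chunks: frames 0-32, 16-48, ...
    last = max(num_frames - chunk_len, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:  # make sure the tail of the video is covered
        starts.append(last)
    return starts

print(window_starts(100))  # [0, 16, 32, 48, 64, 68]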

klauscc commented 1 year ago

It's great that you figured it out! Yes, I believe extracting features with a stride may lead to higher performance. But the computational cost will increase, and you will need to make some changes to the backbone code to process frames in the same way.
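
One way to combine features from overlapping chunks would be to average the overlapping regions. A rough conceptual sketch (hypothetical helper, not part of the repo; it assumes one feature vector per frame, whereas the real backbone output is temporally downsampled):

import torch

def merge_overlapping(chunks, num_frames, dim):
    # chunks: list of (start, tensor of shape [chunk_len, dim]) pairs
    out = torch.zeros(num_frames, dim)
    count = torch.zeros(num_frames, 1)
    for start, f in chunks:
        out[start:start + f.shape[0]] += f
        count[start:start + f.shape[0]] += 1
    return out / count.clamp(min=1)  # average where windows overlap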

SimoLoca commented 1 year ago

OK, thank you. So do I need to make the changes in SwinTransformer3D or in ChunkVideoSwin?