taoyang1122 / adapt-image-models

[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
Apache License 2.0

It seems that the training memory is not reduced #29

Open casillas1111 opened 1 year ago

casillas1111 commented 1 year ago

Thank you for your excellent work.

After adding the Adapter, I passed only the Adapter parameters to the optimizer, but the training memory did not go down. I verified that the rest of the Transformer parameters were set to `requires_grad = False`. The code is as follows:

```python
for name, param in model.named_parameters():
    if "Adapter" in name:
        param.requires_grad = True
    else:
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    params=filter(lambda p: p.requires_grad, model.parameters()),
    lr=3e-4,
    weight_decay=0.05,
)
```
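For reference, a quick sanity check like the one below (not from the repo) can confirm that only the Adapter parameters end up trainable; `model` is assumed to be the already-built AIM model.

```python
# Count trainable vs. total parameters to verify the freezing logic above.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable / 1e6:.2f}M / total: {total / 1e6:.2f}M "
      f"({100 * trainable / total:.2f}%)")
```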

Looking forward to your reply.

taoyang1122 commented 1 year ago

Hi, you can unfreeze the other model parameters by commenting out https://github.com/taoyang1122/adapt-image-models/blob/main/tools/train.py#L187-L189, and then compare the memory cost between the two settings. The memory saving is not as large as the reduction in trainable parameters.
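For context: even with the backbone frozen, the forward activations of the frozen blocks still have to be cached so gradients can flow back to Adapters in earlier layers, so activation memory is largely unchanged; the saving comes mainly from not storing gradients and AdamW optimizer states for the frozen weights. A rough way to compare the two settings is to measure peak GPU memory around one training step. The sketch below is not from the repo and assumes `model`, `criterion`, `optimizer`, and a sample batch `videos` / `labels` already on the GPU.

```python
import torch

# Hypothetical memory comparison: run once with the backbone frozen
# (Adapter-only training) and once with all parameters unfrozen.
torch.cuda.reset_peak_memory_stats()

outputs = model(videos)            # forward pass on one sample batch
loss = criterion(outputs, labels)
loss.backward()                    # gradients are allocated only for requires_grad=True params
optimizer.step()                   # AdamW states are kept only for params passed to the optimizer
optimizer.zero_grad()

print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**2:.1f} MiB")
```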