taoyang1122 / adapt-image-models

[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
Apache License 2.0

It seems that the training memory is not reduced #29

Open casillas1111 opened 1 year ago

casillas1111 commented 1 year ago

Thank you for your excellent work.

After adding the Adapter, I passed only the Adapter parameters to the optimizer, but the training memory did not go down. I verified that the rest of the Transformer parameters were set to `requires_grad = False`. The code is as follows:

```python
for name, param in model.named_parameters():
    if "Adapter" in name:
        param.requires_grad = True
    else:
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    params=filter(lambda p: p.requires_grad, model.parameters()),
    lr=3e-4,
    weight_decay=0.05,
)
```
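For reference, a quick sanity check like the one below (not from the repo) can confirm that only the Adapter parameters end up trainable; `model` is assumed to be the already-built AIM model.

```python
# Count trainable vs. total parameters to verify the freezing logic above.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable / 1e6:.2f}M / total: {total / 1e6:.2f}M "
      f"({100 * trainable / total:.2f}%)")
```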

Looking forward to your reply.

taoyang1122 commented 1 year ago

Hi, you can unfreeze the other model parameters by commenting out https://github.com/taoyang1122/adapt-image-models/blob/main/tools/train.py#L187-L189, and then compare the memory cost between the two settings. The memory saving is not as large as the reduction in trainable parameters.
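For context: even with the backbone frozen, the forward activations of the frozen blocks still have to be cached so gradients can flow back to Adapters in earlier layers, so activation memory is largely unchanged; the saving comes mainly from not storing gradients and AdamW optimizer states for the frozen weights. A rough way to compare the two settings is to measure peak GPU memory around one training step. The sketch below is not from the repo and assumes `model`, `criterion`, `optimizer`, and a sample batch `videos` / `labels` already on the GPU.

```python
import torch

# Hypothetical memory comparison: run once with the backbone frozen
# (Adapter-only training) and once with all parameters unfrozen.
torch.cuda.reset_peak_memory_stats()

outputs = model(videos)            # forward pass on one sample batch
loss = criterion(outputs, labels)
loss.backward()                    # gradients are allocated only for requires_grad=True params
optimizer.step()                   # AdamW states are kept only for params passed to the optimizer
optimizer.zero_grad()

print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**2:.1f} MiB")
```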