casillas1111 opened this issue 1 year ago
Hi, you can unfreeze the other model parameters by commenting out https://github.com/taoyang1122/adapt-image-models/blob/main/tools/train.py#L187-L189 and then compare the memory cost. The memory saving is not as large as the reduction in trainable parameters would suggest.
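One way to make that comparison concrete is to run a single training step in each setting and read the peak allocation from `torch.cuda.max_memory_allocated()`. Below is a minimal, self-contained sketch using a toy backbone-plus-Adapter model (not the AIM code; the module and parameter names are only illustrative). It also hints at why the saving is limited: freezing removes the backbone's optimizer states and gradients, but intermediate activations are still cached, because gradients must flow through the frozen layers to reach the adapters.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Stand-in for a frozen transformer block plus a trainable adapter."""
    def __init__(self, dim=1024):
        super().__init__()
        self.linear = nn.Linear(dim, dim)   # "backbone" weights (to be frozen)
        self.Adapter = nn.Linear(dim, dim)  # adapter weights (kept trainable)

    def forward(self, x):
        return x + self.Adapter(torch.relu(self.linear(x)))

model = nn.Sequential(*[Block() for _ in range(12)]).cuda()

def peak_memory_one_step(model, adapters_only):
    """Run one forward/backward/step and return peak GPU memory in MiB."""
    for name, param in model.named_parameters():
        param.requires_grad = (not adapters_only) or ("Adapter" in name)
    optimizer = torch.optim.AdamW(
        filter(lambda p: p.requires_grad, model.parameters()),
        lr=3e-4, weight_decay=0.05,
    )
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(64, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return torch.cuda.max_memory_allocated() / 1024 ** 2

print(f"adapters only  : {peak_memory_one_step(model, True):.0f} MiB")
print(f"fully trainable: {peak_memory_one_step(model, False):.0f} MiB")
```

The gap between the two numbers comes mostly from the AdamW states and gradients of the backbone weights; the activation memory is roughly the same in both runs.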
Thank you for your excellent work.
After adding the Adapter, I passed only the Adapter parameters to the optimizer, but the training memory did not go down. I verified that the rest of the Transformer parameters have `requires_grad = False`. The code is as follows:
```python
for name, param in model.named_parameters():
    if "Adapter" in name:
        param.requires_grad = True
    else:
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    params=filter(lambda p: p.requires_grad, model.parameters()),
    lr=3e-4,
    weight_decay=0.05,
)
```
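As a sanity check (a minimal sketch, assuming the parameter naming above), the number of parameters actually passed to the optimizer can be printed like this:

```python
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")
```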
Looking forward to your reply.