learnables / learn2learn

A PyTorch Library for Meta-learning Research
http://learn2learn.net
MIT License

How to use `l2l.vision.models.ResNet12`? #389

Closed Jeong-Bin closed 1 year ago

Jeong-Bin commented 1 year ago

Hi, I'm using l2l to create a large MAML model. However, I have a question about how to use l2l.vision.models.ResNet12 or WRN28. I tried the following three methods.

# Method 1
class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func
    def forward(self, x):
        return self.func(x)

features = l2l.vision.models.ResNet12(output_size=256)
features = torch.nn.Sequential(features, Lambda(lambda x: x.view(-1, 84)))
features.to(device)
head = torch.nn.Linear(84, ways)
head = l2l.algorithms.MAML(head, lr=fast_lr)
head.to(device)
all_parameters = list(features.parameters()) + list(head.parameters())
optimizer = optim.Adam(all_parameters, meta_lr)

In Method 1, I got RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of size: : [5]. When I modified the code to lambda x: x.view(-1, 256) and torch.nn.Linear(256, ways), I got RuntimeError: mat1 and mat2 shapes cannot be multiplied (1260x84 and 256x5).


# Method 2
features = l2l.vision.models.ResNet12(output_size=256)
features = torch.nn.Sequential(features, Lambda(lambda x: x.view(-1, 256)))
features.to(device)
head = l2l.vision.models.MiniImagenetCNN(ways)
head = l2l.algorithms.MAML(head, lr=fast_lr)
head.to(device)
all_parameters = list(features.parameters()) + list(head.parameters())
optimizer = optim.Adam(all_parameters, meta_lr)

Method 2 worked well, but its test accuracy was lower than that of the basic MAML model. I used the following code for the basic MAML:

model = l2l.vision.models.MiniImagenetCNN(output_size=ways)
model.to(device)
maml = l2l.algorithms.MAML(model, lr=fast_lr, first_order=False)
optimizer = optim.Adam(maml.parameters(), meta_lr)

# Method 3
model = l2l.vision.models.ResNet12(output_size=ways)
model.to(device)
maml = l2l.algorithms.MAML(model, lr=fast_lr, first_order=False)
optimizer = optim.Adam(maml.parameters(), meta_lr)

Method 3 worked well during training, but I encountered an OutOfMemoryError during testing. (Also, training was very slow.)

What is the right way, and what should I modify? Or is there any other way to make a large MAML model?

I set the training and testing configurations as follows:

# train setting
ways=5
shot=1
adaptation_steps=5
batch_size=4
meta_lr=1e-3
fast_lr=0.01

# test setting
ways=5
shot=15
adaptation_steps=10
batch_size=4
fast_lr=0.01
seba-1511 commented 1 year ago

Hello @Jeong-Bin,

Method 3 is correct. Try using maml.clone(first_order=True) when testing, or reduce the number of adaptation steps (at the cost of some performance).

How much GPU memory do you have? If you have more than one GPU, you can use model.features = torch.nn.DataParallel(model.features) to distribute the activations across the GPUs.
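
For reference, a minimal sketch of what a test-time loop could look like with a first-order clone. The task sampling and loss handling are placeholders (sample_task, loss_fn, num_test_tasks are hypothetical names, not learn2learn APIs); only maml.clone(first_order=True), learner.adapt(...), and the optional DataParallel wrapping come from the suggestions above.

# Sketch of the suggested test-time setup (placeholder data handling).
model = l2l.vision.models.ResNet12(output_size=ways)
# Optional, with multiple GPUs: spread the feature activations across devices.
# model.features = torch.nn.DataParallel(model.features)
model.to(device)
maml = l2l.algorithms.MAML(model, lr=fast_lr, first_order=False)

for task in range(num_test_tasks):                    # placeholder number of test tasks
    learner = maml.clone(first_order=True)            # no second-order graph at test time
    adapt_X, adapt_y, eval_X, eval_y = sample_task()  # hypothetical task sampler
    for step in range(adaptation_steps):
        adapt_loss = loss_fn(learner(adapt_X), adapt_y)
        learner.adapt(adapt_loss)                     # inner-loop update on the clone
    with torch.no_grad():                             # evaluation needs no gradients
        eval_loss = loss_fn(learner(eval_X), eval_y)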

Jeong-Bin commented 1 year ago

@seba-1511 Thanks! My GPU is an RTX 3090 Ti with 20GB of memory. I'll try your solution.

Additionally, I looked up 'adaptation steps' in the MAML paper. In Section A.1 (Classification), the authors use 10 gradient steps at test time. Does 'gradient step' mean the same thing as 'adaptation step'?

seba-1511 commented 1 year ago

Yes, gradient steps are adaptation steps.

Jeong-Bin commented 1 year ago

All right, thank you for your help! Have a nice day😊