ybendou / easy

This repository is the official implementation of Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients.
MIT License

Some questions about the code. #3

Closed: whyygug closed this issue 2 years ago

whyygug commented 2 years ago

Hi, your work is wonderful! Thank you for the detailed and neat code! I have some questions about the code.

About the network:

  1. I'm a bit confused about the structure of ResNet12: it seems to have 3 convolutional layers in each BasicBlock, whereas the official ResNet18 provided by PyTorch has only two convolutional layers per BasicBlock.
  2. Why is the MaxPooling operation in ResNet12 performed outside the BasicBlock and not inside the BasicBlock as in ResNet18?
  3. ResNet18 has a total of 17 convolutional layers and one FC layer, so it contains 18 layers with trainable weights. But ResNet12 in this code has 12 convolutional layers and one FC layer, i.e. 13 layers with trainable weights, so shouldn't it be called ResNet13?
  4. The number of parameters in ResNet12 (12.4M) is even larger than that of ResNet18 (11.2M), which confuses me. Why design a network that sounds lighter from its name but is actually heavier? I replaced ResNet12 with ResNet18 for training and found that the performance dropped. (I counted parameters as in the snippet after this list.)
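
For reference, the parameter counts above come from a small helper along these lines; a minimal sketch using only torchvision's standard resnet18, since I did not copy the repo's ResNet12 import here.

```python
import torch.nn as nn
from torchvision.models import resnet18

def count_parameters(model: nn.Module) -> float:
    """Number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# The same helper applied to this repository's ResNet12 gives the ~12.4M figure quoted above.
print(f"ResNet18: {count_parameters(resnet18()):.1f}M parameters")
```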

About training and testing:

  1. When centering the feature vectors, each vector should have an average feature vector subtracted from it. The paper says that “\overline{z} is the average feature vector of the base dataset if in inductive setting or of the few-shot considered problem if in transductive setting.” But the code seems to show that \overline{z} is still the average vector of the base dataset in the transductive setting?
  2. If I want to reproduce a Y ResNet12(√2) model, should the script be `python main.py --dataset-path "" --dataset miniimagenet --model resnet12 --epochs 0 --manifold-mixup 500 --rotations --cosine --gamma 0.9 --milestones 100 --batch-size 128 --preprocessing ME --feature-maps 45`?
  3. Why set forceCPU=True for val_loader and novel_loader?
  4. What does the .pt55 suffix represent in the provided pre-trained weights and features? I tried to test the features in minifeatures1.pt55 and found their performance inferior to minifeatures1.pt11. By the way, can PyTorch read and write files in .pt11 or .pt55 format? I know the common formats include .pt, .pkl, and .pth, but what format is .pt11? (I load the files as in the sketch below.)
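
For reference, I load the provided feature files simply with torch.load; a minimal sketch, with map_location set so it also runs without a GPU.

```python
import torch

# Load the pre-extracted features shipped with the repository; the file name
# follows this thread (minifeatures1.pt55).
features = torch.load("minifeatures1.pt55", map_location="cpu")
print(type(features))
```
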
ybendou commented 2 years ago

Hi, thank you for your interest in our work and for your detailed feedback.

I'll try to give detailed answers to your questions:

About the network:

The ResNet12 architecture has been widely used in the few-shot literature in recent years, and it is hard to pinpoint which paper first introduced it. You can refer to papers that use a ResNet-12 backbone, such as MetaOptNet, FEAT, TADAM, SNAIL and R2D2.

  1. Exactly, ResNet12 is a different architecture from ResNet18; it is not officially provided by PyTorch.
  2. Having the MaxPooling outside or inside the BasicBlock makes no difference; it is simply easier to implement manifold mixup when the pooling is outside the block.
  3. & 4. You remark is pertinent, the naming of ResNet12 is debatable as you mentioned. The naming of the ResNet12 comes from the fact that we use 4 residual blocks of 3 convolutional layers. Our paper does not introduce this architecture nor name it, multiple papers have been referring to it as ResNet12. Regarding performance, we have also tested a ResNet18 and found similar results with poorer performance than ResNet12. The number of parameters in ResNets is not only linked to the depth of the network, it also strongly depends on the number of feature maps used in the architecture.

About training and testing:

  1. In the transductive setting, we use the average vector of the novel dataset. To do so, you need to run the script with the argument `--postprocessing ME` instead of `--preprocessing ME` as provided in the example in the README. In this case, preprocessing defaults to an empty string and does nothing; postprocessing is used instead and takes the average of the novel data (query + support). It is done in the following line (a rough sketch of this centering step is shown after this list): https://github.com/ybendou/easy/blob/1b57bca2ed405b83e6a3767b24c626dddd28bb3d/few_shot_eval.py#L108
  2. That's right, you just need to change `--feature-maps` to 45.
  3. It is not currently supported in our implementation, but force_cpu=False will be included in our future work.
  4. In order to keep the best epoch of the backbone during training, we use the validation dataset. The best-performing model on validation can be selected either by its 1-shot or by its 5-shot performance; the suffix .pt11 or .pt55 therefore refers to selection on 1-shot or 5-shot validation accuracy, respectively. One would expect .pt55 to perform worse on 1-shot and .pt11 to perform worse on 5-shot, but this is not necessarily the case, as the validation performance is only a proxy for the novel score. In the paper we report the results of .pt11 for 1-shot and of .pt55 for 5-shot. Also, torch can read and write files with any extension; you could even drop the .pt part entirely, write minifeatures155, and the file would still be read and written correctly.
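
To illustrate point 1, the centering differs between the two settings roughly as follows. This is a minimal sketch, not the code from few_shot_eval.py; the function and variable names are made up for the example.

```python
import torch

def center_features(features, base_mean=None):
    """Center feature vectors, then re-normalize them (the "ME" step).

    Inductive setting:    subtract the mean feature vector of the base dataset.
    Transductive setting: subtract the mean of the considered few-shot problem
                          (support + query features together).
    """
    if base_mean is not None:                 # inductive: mean precomputed on the base classes
        mean = base_mean
    else:                                     # transductive: mean of the episode itself
        mean = features.mean(dim=0, keepdim=True)
    centered = features - mean
    return centered / centered.norm(dim=1, keepdim=True)  # Euclidean normalization

# Usage sketch: `episode` stands for the support + query features of one few-shot task.
episode = torch.randn(80, 640)
transductive = center_features(episode)                               # mean of support + query
inductive = center_features(episode, base_mean=torch.randn(1, 640))   # hypothetical base mean
```
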
whyygug commented 2 years ago

I sincerely appreciate your prompt and detailed reply. Best wishes!