ybendou / easy

This repository is the official implementation of "Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients".

Questions about implementation. #11

Closed whyygug closed 2 years ago

whyygug commented 2 years ago

Hello, thank you very much for your work. I have some questions.

  1. I noticed that you added an argument "batch_fs_size" to args.py with a default value of 20. Does this argument affect the test performance? From my understanding, it seems to affect only the GPU memory usage and speed during testing.
  2. How to reproduce the results on CUB and tiered-ImageNet?
  3. According to the command you gave for mini-ImageNet, the computation and back-propagation of a batch's loss are performed twice: the first pass uses the classification loss with manifold mixup, and the second pass uses the original classification loss plus the rotation loss, i.e., a batch involves two gradient updates. Why does it need to be done this way? Does this mean that training for 500 epochs is approximately equivalent to training for 1000 epochs for methods that do a single gradient update per batch? I personally tried to compute the manifold mixup classification loss and the rotation loss together (following the code path if mixup and args.mm: ...) https://github.com/ybendou/easy/blob/85937e0d2d67a801dba7a96974a79c2d6cad86b7/main.py#L84-L96 but this resulted in a slight performance degradation. Is this because there is a potential conflict between manifold mixup and rotation?
  4. Have you studied the compatibility of the different augmentation methods with the self-supervised rotation loss? For example, does random horizontal flipping break the rotation loss? I'd like to hear your opinion.

Sorry for my poor English; I hope I expressed my questions clearly. Best,

ybendou commented 2 years ago

Hello,

  1. The argument batch_fs_size doesn't change the test performance. Its only purpose is to fit the data on the GPU: there is no gradient update during the few-shot evaluation, and the runs are entirely parallel. The larger its value, the faster the testing will be, as long as your GPU can fit the data.
  2. You can change the argument dataset to 'tieredimagenet' or 'cubfs'.
  3. We implemented the mixup and SSL according to our understanding of S2M2R, where the gradients are updated once with mixup and once with rotations, in order to get the best out of the regularisation of mixup while avoiding its downsides (a rough sketch of this two-pass update is given after this list). We also tried doing rotations and mixup at the same time, but the performance was worse. Interpolating images with different rotation angles could be troublesome.
  4. During the backbone training, the other augmentations such as horizontal flipping and color jittering are applied in the dataloader, before the rotation; the rotation is added in the training loop. So yes, they are compatible and already used together in the code.
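
For question 3, here is a minimal PyTorch-style sketch of what one batch with two separate gradient updates can look like. It is an illustration under simplified assumptions (input-level mixup stands in for the manifold mixup applied inside the backbone, and the rotation head is a plain classifier passed in as rot_clf), not the exact code in main.py:

```python
import torch
import torch.nn.functional as F

def rotate_batch(x):
    # Rotate the four quarters of the batch by 0/90/180/270 degrees and
    # return the rotated images together with their rotation labels.
    chunks, targets = [], []
    for k, chunk in enumerate(x.chunk(4)):
        chunks.append(torch.rot90(chunk, k, dims=(2, 3)))
        targets.append(torch.full((chunk.size(0),), k, dtype=torch.long, device=x.device))
    return torch.cat(chunks), torch.cat(targets)

def mixup_loss(backbone, clf, x, y, alpha=2.0):
    # Input-level mixup, a simplified stand-in for manifold mixup (which mixes
    # hidden representations inside the backbone rather than raw images).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    logits = clf(backbone(lam * x + (1 - lam) * x[perm]))
    return lam * F.cross_entropy(logits, y) + (1 - lam) * F.cross_entropy(logits, y[perm])

def train_step(backbone, clf, rot_clf, optim, x, y):
    # Pass 1: mixup classification loss, followed by its own optimizer step.
    optim.zero_grad()
    mixup_loss(backbone, clf, x, y).backward()
    optim.step()

    # Pass 2: plain classification loss plus rotation loss, second optimizer step.
    optim.zero_grad()
    x_rot, rot_y = rotate_batch(x)
    feats = backbone(x_rot)
    loss = F.cross_entropy(clf(feats), y) + F.cross_entropy(rot_clf(feats), rot_y)
    loss.backward()
    optim.step()
```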

I hope this answers your concerns. Feel free to ask if something isn't clear.

Best,

whyygug commented 2 years ago

Thanks for your reply. I have some other questions.

  1. I tried to replace the SGD optimizer with the Adam optimizer, but this caused almost no change in performance. Are my experimental results consistent with yours?

  2. In your paper, you reported the results of S2M2R with a WRN-28-10 backbone. If I want to use WRN-28-10 as the backbone, should I use "wideresnet" or "s2m2r"? Do I need to modify other arguments, such as changing "feature_maps" to 16? Also, what is the difference between "wideresnet" and "s2m2r"?

Best,

ybendou commented 2 years ago

Hello,

  1. I don't remember if the performance was different, but we chose to go with SGD as its behaviour is easier to understand.
  2. The "s2m2r" uses the same classifier used in the s2m2r paper which is a modified version of the standard logit classifier. So if you want to just a WRN-28-10backbone instead of a ResNet12 with the usual logit classifier, you should use "wideresnet" as an option. If your purpose is to reproduce s2m2r paper's results, you need to use the same hyperparmeters as theirs which I don't remember exactly, use adam with 400 epochs without mixup then use mixup after that for >400 epochs. Lastly, you should change "features_maps" to 16.

Let me know if you manage to make it work or if you have any questions.

Best,

whyygug commented 2 years ago

Hello,

I made a mistake in my previous statement about the Adam optimizer. At that time the training had not yet finished, and based on the model's initial performance I misjudged that there was not much difference between the two optimizers. Today, after training finished, I found that directly replacing SGD with the Adam optimizer leads to serious performance degradation. According to my re-implementation, the model trained with SGD reaches 67% on 1-shot and 83.7% on 5-shot, which is similar to the results in your paper, but the model trained with Adam ("--lr = -0.1") reaches 52% on 1-shot and 69% on 5-shot. Maybe replacing SGD directly with Adam is not an appropriate choice.
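
For reference, a minimal sketch of the swap being discussed; the model and hyperparameter values below are placeholders and assumptions, not the repository's defaults, and the fact that Adam is usually run with a much smaller learning rate than SGD is one possible factor in the gap:

```python
import torch

model = torch.nn.Linear(640, 64)  # placeholder, stands in for the real backbone

# SGD roughly at the scale used in the thread (momentum/weight decay are assumptions).
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# A drop-in Adam replacement usually needs a much smaller learning rate (e.g. 1e-3);
# reusing an SGD-scale learning rate with Adam often destabilises training.
adam = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
```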

I don't have any more questions at the moment. Thank you very much for your support. Best,

shepherdls commented 1 year ago

When running the experiment, how do I enter the 3 feature file paths, e.g. --test-features "[/minifeatures1.pt11, /minifeatures2.pt11, /minifeatures3.pt11]"? When I input --test-features "[~/lishuai/hhh/minifeatures1.pt11, ~/lishuai/hhh/minifeatures2.pt11, /lishuai/hhh/minifeatures3.pt11]" it always outputs:

FileNotFoundError: [Errno 2] No such file or directory: '[/lishuai/hhh/minifeatures1.pt11, ~/lishuai/hhh/minifeatures2.pt11, ~/lishuai/hhh/minifeatures3.pt11]'

What's wrong? How can I resolve it?
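
From the error message, the bracketed string seems to reach the file loader as a single path, with "~" left unexpanded. Purely as an illustration of the format, and not necessarily how the repository actually parses --test-features, one way to split such an argument into usable paths is:

```python
import os

def parse_feature_paths(arg):
    # Split a bracketed, comma-separated argument such as
    # "[~/a/minifeatures1.pt11, ~/a/minifeatures2.pt11]" into individual
    # paths, expanding "~" so each file can be found on disk.
    return [os.path.expanduser(p.strip()) for p in arg.strip().strip("[]").split(",")]

print(parse_feature_paths("[~/lishuai/hhh/minifeatures1.pt11, ~/lishuai/hhh/minifeatures2.pt11]"))
```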