Closed lucastononrodrigues closed 4 years ago
In order to reduce memory consumption, I recommend reducing --train-shot (for example, from 15 to 10) rather than --episodes-per-batch.
In my experience, batch size matters a lot in SGD.
Also, the Group Normalization paper (https://arxiv.org/abs/1803.08494) empirically verifies that instance normalization is suboptimal for image classification.
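The batch-size point above can be illustrated with a small NumPy sketch (not the repository's code, just an assumption-level demo): group/instance-style normalization computes statistics per sample, so its output is independent of the batch size, while batch normalization's statistics are computed across the batch and therefore change as the batch shrinks.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    # x: (N, C, H, W). Statistics are computed per sample and per group,
    # so the result does not depend on the batch size N.
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, h, w)

def batch_norm(x, eps=1e-5):
    # Statistics are computed across the batch dimension, so small
    # batches give noisy (and batch-size-dependent) estimates.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16, 4, 4))

# Group norm gives the same output for a sample whether it is
# normalized alone or inside a larger batch...
full = group_norm(x, num_groups=4)
single = group_norm(x[:1], num_groups=4)
print(np.allclose(full[0], single[0]))  # True

# ...while batch norm's output for that same sample changes with
# the batch it happens to be in.
print(np.allclose(batch_norm(x)[0], batch_norm(x[:1])[0]))  # False
```

This is why shrinking --episodes-per-batch degrades a batch-norm ResNet more than reducing --train-shot does: the former directly shrinks the population the normalization statistics are estimated from.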
Thank you very much for the explanation, I will do that and keep you posted. We are applying your work to other datasets at the moment and will share the results if possible.
Hello, I've been trying to reproduce your results for a bachelor's project in which I will apply your algorithm to different meta-datasets.
I obtained accuracies about 5 to 10% lower on each meta-dataset you used (except FC100, where my accuracy was relatively close). I made some changes to your code to fit my GPU memory (I lowered the episodes per batch to 4). My first guess was that this mainly affected the Batch Normalization in the ResNet, so I switched it to Instance Normalization; as a result, I obtained even lower accuracies.
Would you help me understand why? Also, could you suggest any other meta-datasets we could use?
Thank you in advance. Lucas Tonon Rodrigues.