Closed AwesomeLemon closed 3 years ago
Hello,
The supernet weights are updated during training.
Thank you for the clarifications!
But your explanation makes me wonder about the number of evaluations: if all 300 architectures from the archive are evaluated on each iteration, then by the end of the search 300 * 30 = 9000 evaluations will have been performed, which is much more than reported in Fig. 14. Could you help me understand the discrepancy? Perhaps you only count the number of unique architectures?
Fig. 14 corresponds to an ablative experiment we did to compare relative search efficiency in a bi-objective scenario. The compared methods originally use different means to gauge and select architectures during the search: for example, NSGANet uses proxy tasks constructed by down-scaling architectures and reducing training epochs, while NAT uses a supernet. To make a fair comparison on the same x-axis (the number of architectures required to reach a certain hypervolume, the y-axis), we provide the trained supernet (one per dataset, for all three datasets) to all three methods. Random search then simply samples architectures uniformly from the search space, while NSGANet uses genetic operations (crossover + mutation + EDA) to generate architectures; both methods query the supernet (i.e., incur one architecture evaluation) for every architecture created. NAT instead builds an accuracy predictor from the initial population, then only evaluates a small subset of the candidate pool returned by NSGA-III on the accuracy predictor. All methods start with an initial population of 100 random architectures, and we terminate the search when NSGANet or random search catches up with NAT. We are sorry about the confusion.
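The accounting above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it only counts supernet queries to show why NAT needs far fewer evaluations than random search or NSGANet when all three share a pre-trained supernet. `evaluate_on_supernet` and the predictor are hypothetical stand-ins.

```python
import random

def evaluate_on_supernet(arch):
    # stand-in for querying the trained supernet with one architecture
    return random.random()

def random_search_cost(n_archs):
    # random search / NSGANet: every generated arch costs one supernet query
    archs = [("arch", i) for i in range(n_archs)]
    accs = [evaluate_on_supernet(a) for a in archs]
    return len(accs)

def nat_cost(candidate_pool, predictor, subset_size):
    # NAT: rank the whole pool with a cheap accuracy predictor,
    # then query the supernet only for a small top subset
    ranked = sorted(candidate_pool, key=predictor, reverse=True)
    accs = [evaluate_on_supernet(a) for a in ranked[:subset_size]]
    return len(accs)

pool = [("arch", i) for i in range(100)]
print(random_search_cost(100))            # 100 supernet queries
print(nat_cost(pool, lambda a: a[1], 8))  # only 8 supernet queries
```

Under this accounting, the per-iteration supernet cost of NAT is the subset size, not the archive size, which is why the x-axis in Fig. 14 grows much more slowly for NAT.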
Ah, thanks for the explanation. Correct me if I'm wrong, but is it true that in this scenario the supernet weights are not trained continuously (when using NAT)? From your description it seems to me that this experiment is the same one as in the NSGANetV2 paper Fig. 5, where the supernet weights are fixed and are simply used for initializing the weights of the offspring in each generation.
This experiment is essentially the same as the one in the NSGANetV2 paper, except for the choice of accuracy predictor. The supernet is already trained for all three datasets and is used only to assist the validation of the search components.
Great, thanks again!
Hello,
I was wondering whether the weights of the supernetwork are continuously trained during the search?
I noticed in the code of your previous paper (NSGANetV2), which you reference in another issue, that the supernet is not actually trained during the search; instead, the supernet weights are used to initialize subnet weights, which are trained for 5 epochs, used for evaluation, and then discarded; the next subnet is again initialized with the original supernet weights.
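The workflow I'm describing could be sketched like this (a minimal toy, not your actual code; the weights and the fine-tuning/evaluation steps are hypothetical stand-ins):

```python
import copy

# stand-in for the fixed, pre-trained supernet weights
SUPERNET_WEIGHTS = {"conv1": 0.5, "conv2": 0.3}

def evaluate_subnet(subnet_arch, epochs=5):
    # initialize the subnet from a *copy* of the supernet weights,
    # so the supernet itself is never modified
    weights = copy.deepcopy(SUPERNET_WEIGHTS)
    for _ in range(epochs):
        # stand-in for one epoch of fine-tuning the subnet
        weights = {k: v + 0.01 for k, v in weights.items()}
    # stand-in evaluation; the fine-tuned weights are discarded
    # when this function returns
    return sum(weights.values())

acc = evaluate_subnet(subnet_arch="arch-0")
# the supernet weights are untouched between subnet evaluations
print(SUPERNET_WEIGHTS)  # {'conv1': 0.5, 'conv2': 0.3}
```

In other words, each subnet evaluation starts fresh from the same frozen supernet, and nothing learned during the 5 epochs is carried over.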
That's why I'd like to know whether NAT does the same.
An additional question: if NAT doesn't discard the trained weights, how do you deal with the fact that the performance values in the archive were obtained with older weights? Doesn't this negatively impact the predictor's accuracy?
Thanks in advance!