Dear @bio-mlhui
Indeed, in our original paper (NeurIPS '23), we directly implemented the generalization-based objective by training a linear model on a training split and assessing its generalization on a test split. That procedure requires backpropagating through the inner optimization (we used 300 inner iterations) to get reliable estimates of the objective and its gradients, and is therefore costly.
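For concreteness, here is a minimal PyTorch sketch of what such a generalization-based bilevel objective can look like; the function names, the soft cross-entropy loss, and the hyperparameters are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn.functional as F

def soft_ce(logits, targets):
    # Cross-entropy with soft targets; differentiable w.r.t. both arguments.
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

def generalization_objective(feats_tr, q_tr, feats_te, q_te,
                             inner_steps=300, inner_lr=0.1):
    # Inner problem: fit a linear head on the train split with an unrolled,
    # differentiable SGD loop (create_graph keeps the updates in the graph).
    W = torch.zeros(feats_tr.shape[1], q_tr.shape[1], requires_grad=True)
    for _ in range(inner_steps):
        (g,) = torch.autograd.grad(soft_ce(feats_tr @ W, q_tr), W,
                                   create_graph=True)
        W = W - inner_lr * g
    # Outer objective: loss of the trained head on the held-out test split.
    # Backpropagating this reaches the labeling through all inner steps,
    # which is what makes this formulation expensive.
    return soft_ce(feats_te @ W, q_te)
```

Here `q_tr` / `q_te` stand for soft pseudo-labels produced by the outer labeling model, so the outer gradient flows back into them through the unrolled inner loop.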
However, in our ICML '24 paper we found that the original optimization essentially corresponds to finding a labeling that induces the linear model with the highest margin (see Proposition 3.1 in the paper). This result also reveals that you don't really need the train/test splits, since maximizing the margin already implies better generalization (see Remark 3.3 in the paper). This, in turn, enables efficient optimization: in practice we don't need to backpropagate through the inner optimization process (see the Efficient optimization paragraph in the paper), and a small number of inner iterations suffices (we use 10 inner steps).
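And here is a correspondingly hedged sketch of the efficient pattern described above (same whole batch, a few inner steps, no backpropagation through the inner loop); again, the exact losses and hyperparameters are assumptions rather than the repo's implementation:

```python
import torch
import torch.nn.functional as F

def soft_ce(logits, targets):
    # Cross-entropy with soft targets; differentiable w.r.t. both arguments.
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

def margin_style_objective(feats, q, inner_steps=10, inner_lr=0.1):
    # Inner loop on detached tensors: the outer gradient never flows
    # through these updates, which is what keeps this variant cheap.
    W = torch.zeros(feats.shape[1], q.shape[1], requires_grad=True)
    opt = torch.optim.SGD([W], lr=inner_lr)
    feats_d, q_d = feats.detach(), q.detach()
    for _ in range(inner_steps):
        opt.zero_grad()
        soft_ce(feats_d @ W, q_d).backward()
        opt.step()
    # Outer loss on the same whole batch with the inner head frozen:
    # gradients reach only the labeling q (and the features), not the
    # unrolled inner optimization path.
    return soft_ce(feats @ W.detach(), q)
```

The key difference from the previous sketch is that the inner head is treated as a constant when computing the outer gradient, so no train/test split and no unrolled backpropagation are needed.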
Let me know if these clarifications helped.
Best, Artyom
Hi, your work is awesome! I have one question I'd like your help with: I saw that, for each batch, the whole batch is used to learn a pseudo-optimal (10 steps) inner classifier, and then the same whole batch is used to learn the outer classifier.
Could you explain why you don't split the batch into train and test parts and use the train/test splits to learn the inner/outer classifier, respectively? That seems more consistent with the "generalization"-based loss (Eq. 1 in your paper).
Thanks so much in advance!