wpeebles / G.pt

Official PyTorch Implementation of "Learning to Learn with Generative Models of Neural Network Checkpoints"
https://www.wpeebles.com/Gpt
BSD 2-Clause "Simplified" License

Can it generate weights for unseen losses? #2

Closed: sentialx closed this issue 2 years ago

sentialx commented 2 years ago

Let's say I generate millions of checkpoints of a model using the Adam optimizer, and the model reaches a minimum loss of, e.g., 0.8. Can G.pt generate weights that achieve a loss lower than 0.8?

sentialx commented 2 years ago

In the paper, under the limitations section, it is stated that "Second, our current G.pt models struggle to extrapolate to losses and errors not present in the pre-training data." But what does "struggle" mean here? Is the model completely unable to optimize further, or does it just not achieve the desired loss perfectly?

wpeebles commented 2 years ago

Hi @sentialx, thanks for the question. The models are usually unable to generate parameters for losses outside the training set's range, and the specific behavior seems to depend on how "extreme" the prompted loss is relative to those seen during pre-training.

If you ask for a loss only slightly below the best in the training set (e.g., prompting for 0 test loss when the best on MNIST is ~0.2), the model will typically output a network with ~0.2 loss (Figure 5 in the paper). On the other hand, if you ask for a drastically lower value (e.g., prompting for zero loss on CIFAR-10 when the lowest in the training set is around 1.1), the model will actually give you a worse-performing network than if you had prompted with a slightly higher loss closer to the boundary of the training set. Hope this helps!
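For anyone who wants to check this behavior on their own task, here is a minimal probe sketch. The `g_pt` and `eval_loss` callables below are hypothetical placeholders standing in for the repo's actual sampling and evaluation code, whose real signatures may differ:

```python
def probe_loss_extrapolation(g_pt, theta_start, eval_loss, prompts):
    """Prompt a checkpoint-generative model with several target losses and
    report the loss each sampled network actually achieves.

    g_pt        -- callable (params, target_loss) -> new params; a
                   hypothetical stand-in for the repo's sampling API
    theta_start -- flattened parameters of the starting network
    eval_loss   -- callable params -> measured test loss
    prompts     -- list of target losses spanning in- and out-of-range values
    """
    results = {}
    for target in prompts:
        theta_new = g_pt(theta_start, target)
        results[target] = eval_loss(theta_new)
    return results

# E.g., if the pre-training checkpoints bottom out around loss 1.1 (the
# CIFAR-10 case above), compare a near-boundary prompt against an extreme one:
#   probe_loss_extrapolation(g_pt, theta, eval_loss, prompts=[1.5, 1.1, 1.0, 0.0])
# Per the reply above, expect the 0.0 prompt to yield a worse network than
# the 1.0 prompt, rather than extrapolating past the training range.
```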