gzerveas / mvts_transformer

Multivariate Time Series Transformer, public version
MIT License

--change_output Option and Transfer Learning #42

Closed. rishanb closed this issue 1 year ago.

rishanb commented 1 year ago

Reposting a conversation with the author of the paper and this code regarding the --change_output option, for others who might have a similar question.

My original question to George: "Thanks, one more question I'm going to shoot at you (and post to GitHub), around the "--change_output" config option. I'm holding thumbs that this is an option that allows us to pre-train, possibly fine-tune a model on a specific task, then load that fine-tuned model, change the output layer, and fine-tune it for another task? And we might even want to freeze everything except the norm and output layers?"

Answer: "Yes, that's pretty much the envisioned use, but practically the way it is implemented is very simple: when you use it, all weights except for the output layer will be loaded from the specified checkpoint. The output layer weights (their names should start with "output_layer") will be initialized as defined in the model's code. So this indeed allows you to either fine-tune the exact same model for another task, or define another model (e.g. a subclass of the original) with a different output layer (e.g. different output dimensions).

The --freeze option will allow you to do the second thing you are asking. No gradients will be computed (and no parameter updates performed) for any layer except for the output layer. The norm layers suggestion is interesting; in my case, I was simply using this to evaluate pre-training and fine-tuning on the same exact input dataset, so the batchnorm statistics were the same. However, if you want to change the dataset, then yes, it makes sense to make the batchnorm parameters trainable. Here is where this can be added: https://github.com/gzerveas/mvts_transformer/blob/3f2e378bc77d02e82a44671f20cf15bc7761671a/src/main.py#L146"
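
To make the --change_output mechanics above concrete, here is a minimal PyTorch sketch of loading a checkpoint while skipping the output layer. The helper name `load_all_but_output` and the `'state_dict'` checkpoint key are illustrative assumptions, not the repository's exact code:

```python
import torch

def load_all_but_output(model, checkpoint_path):
    """Load every weight from the checkpoint except the output layer,
    which keeps its fresh initialization from the model's constructor."""
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    # Assumption: the checkpoint stores the weights under 'state_dict'
    state_dict = checkpoint['state_dict']
    filtered = {name: weights for name, weights in state_dict.items()
                if not name.startswith('output_layer')}
    # strict=False tolerates the deliberately missing output_layer entries
    model.load_state_dict(filtered, strict=False)
    return model
```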
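And a corresponding sketch of the --freeze behavior, including the optional norm-layer relaxation suggested in the answer. The function name and the `'norm'` substring check are assumptions about layer naming, not the repository's code:

```python
def freeze_all_but_output(model, train_norm_layers=False):
    """Disable gradient updates for every parameter except the output layer.
    Optionally keep norm-layer parameters trainable, e.g. when fine-tuning
    on a dataset whose statistics differ from the pre-training data."""
    for name, param in model.named_parameters():
        trainable = name.startswith('output_layer')
        # Assumption: norm-layer parameters can be identified by 'norm'
        # appearing in their name; adjust to the actual layer naming.
        if train_norm_layers and 'norm' in name:
            trainable = True
        param.requires_grad = trainable
```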