goodmansasha opened this issue 5 years ago (Open)
I don't know what problem you are trying to solve, but in general, to achieve good results with transfer learning in Tensor2Tensor:
```
--warm_start_from=/path/to/lm/checkpoint
```
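For concreteness, a finetuning run using that flag might look roughly like the following; the problem name and paths are placeholders, `--train_steps=25000` follows the "learning_rate_decay_steps or less" comment in the hparams below, and apart from `--warm_start_from` these are the usual t2t-trainer flags:

```sh
# Sketch of a finetuning run warm-started from an LM checkpoint.
# Problem name and paths are placeholders.
t2t-trainer \
  --data_dir=/path/to/data \
  --problem=my_text_problem \
  --model=transformer \
  --hparams_set=transformer_tall_finetune_textclass \
  --output_dir=/path/to/finetune_dir \
  --train_steps=25000 \
  --warm_start_from=/path/to/lm/checkpoint
```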
@JohannesTK do you have any detailed information about what actually causes layers to be frozen? Looking at the hparams, I can only see parameters connected to multi-problem training, but nothing for freezing parts of a model after pretraining:
```python
@registry.register_hparams
def transformer_tall_finetune_textclass():
  """Hparams for transformer on LM for finetuning on text class problems."""
  hparams = transformer_tall()
  hparams.learning_rate_constant = 6.25e-5
  hparams.learning_rate_schedule = ("linear_warmup*constant*linear_decay")
  hparams.multiproblem_schedule_max_examples = 0
  hparams.multiproblem_target_eval_only = True
  hparams.learning_rate_warmup_steps = 50
  # Set train steps to learning_rate_decay_steps or less
  hparams.learning_rate_decay_steps = 25000
  hparams.multiproblem_reweight_label_loss = True
  hparams.multiproblem_label_weight = 0.95
  return hparams
```
I am currently experimenting with multi-problem training myself (see https://github.com/tensorflow/tensor2tensor/issues/1687), but, like @goodmansasha, I would like to know how to freeze e.g. the entire encoder of the Transformer, or whether this is something that still has to be implemented in t2t.
Can you provide any information on that?
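For what it's worth, since T2T does not seem to expose a freezing hparam, the usual TF1 graph-mode workaround is to pass a restricted `var_list` to the optimizer. A minimal sketch, assuming encoder variables sit under a `body/encoder` name scope as in T2T's Transformer; the scope string, `loss`, and `learning_rate` here are illustrative:

```python
import tensorflow as tf  # TF 1.x graph-mode APIs, as used by T2T

def build_frozen_encoder_train_op(loss, learning_rate):
  """Builds a train op that updates all variables except the encoder's.

  Sketch only: assumes encoder variables sit under a 'body/encoder'
  name scope, as in T2T's Transformer.
  """
  # Drop encoder variables so the optimizer never updates them.
  train_vars = [v for v in tf.trainable_variables()
                if "body/encoder" not in v.name]
  optimizer = tf.train.AdamOptimizer(learning_rate)
  # Only variables in var_list get gradient updates; the rest stay frozen.
  return optimizer.minimize(loss, var_list=train_vars)
```

Wiring this into T2T would still mean overriding how the model builds its train op, so it is a workaround rather than a supported path.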
@stefan-falk were you able to figure out how to freeze layers in T2T? I have a similar scenario where I have a pretrained transformer, and I'd like to freeze the base layers during finetuning. It doesn't look like T2T supports this out of the box, but maybe you found a way to accomplish it manually?
@gabegrand Unfortunately no. I had to abandon this approach for now.
### Description
I'm attempting to do very basic transfer learning with a Transformer. Could someone point me towards an example of how to do that in tensor2tensor?
I've seen Radford et al.'s work (https://blog.openai.com/language-unsupervised/), which is inspiring. However, I see no hyperparameters in tensor2tensor to simply freeze certain Keras layers by setting `trainable=False`.

The basic idea is to pre-train a model first on a lower-quality dataset created by another computer program (i.e. a "silver" dataset), and then to finetune that model on a hand-curated "gold" dataset created by a human expert. The silver dataset is much larger than the gold one, but the gold one takes a long time to produce. After the "silver model" is trained on the silver dataset, all of its layers are frozen except the final layer, and it is then trained on the gold data. That way, the model does not forget what it learned from the silver dataset. I tried doing this without freezing layers, and unfortunately the model started to degrade and lose what it had learned from the silver data.
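For illustration, here is a minimal plain-Keras sketch of that silver/gold workflow, independent of T2T; the model, layer sizes, and dataset variables are all placeholders:

```python
import tensorflow as tf

# Toy stand-in for the real model; sizes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=32000, output_dim=512),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # final task layer
])

# 1) Pre-train on the large machine-generated "silver" dataset.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(silver_x, silver_y, ...)  # placeholder data

# 2) Freeze everything except the final layer, then finetune on "gold".
for layer in model.layers[:-1]:
    layer.trainable = False
# Recompile so the new trainable flags take effect.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(gold_x, gold_y, ...)  # placeholder data
```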
...
### Environment information
Python 3.6.7
### For bugs: reproduction and error logs