tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

*help* how to use transformer model without dropout #826

Open · doubler opened this issue 6 years ago

doubler commented 6 years ago

I use the transformer encoder code below to train and dump the model:

```python
from tensor2tensor.models import transformer
import tensorflow as tf

hparams = transformer.transformer_base()
encoder = transformer.TransformerEncoder(hparams, mode=tf.estimator.ModeKeys.TRAIN)
# x has shape [batch_size, timesteps, 1, hparams.hidden_dim]
x = ...
enc = encoder({"inputs": x, "targets": x})
# ...and then `enc` is used in the following network
```

Then I load the model with `saver.restore(sess, checkpoint_file)`. But I found that the model output is not stable. I think it may be because of dropout. How can I set the dropout to 1.0 in my feed_dict?

martinpopel commented 6 years ago

What do you mean by non-stable output? In inference (decoding), dropout is always turned off (i.e. set to 0). Setting dropout to 1.0 makes no sense; it would ignore all input during training.

If you want to turn off dropout during training, use `--hparams="layer_prepostprocess_dropout=0,attention_dropout=0,relu_dropout=0"`, depending on which types of dropout you want to turn off. However, this is not recommended for the best results unless you have really huge training data or some other technique to prevent overfitting.

doubler commented 6 years ago

@martinpopel I just use the encoder as a library; I don't use the decoder at all. I use the output of the encoder, and after training it still has dropout applied.

martinpopel commented 6 years ago

Dropout should be automatically turned off in both the encoder and the decoder during inference, because all hyperparameters ending in "dropout" are automatically set to 0.0 when not in training mode.
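The rule described above can be sketched in plain Python. This is a simplified illustration of the behavior, not tensor2tensor's actual code; the helper name `zero_dropout_unless_training` and the plain-dict hparams are made up for the example:

```python
def zero_dropout_unless_training(hparams, mode):
    """Return a copy of hparams with every *dropout hyperparameter
    forced to 0.0 when the model is not in training mode."""
    if mode == "train":
        return dict(hparams)
    return {k: (0.0 if k.endswith("dropout") else v)
            for k, v in hparams.items()}

hparams = {
    "hidden_dim": 512,
    "layer_prepostprocess_dropout": 0.1,
    "attention_dropout": 0.1,
    "relu_dropout": 0.1,
}

# In eval/predict mode all dropout rates become 0.0; other fields are untouched.
eval_hparams = zero_dropout_unless_training(hparams, "eval")
```

This is why constructing the encoder with `mode=tf.estimator.ModeKeys.TRAIN`, as in the original snippet, keeps dropout active.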

rsepassi commented 6 years ago

```python
encoder = transformer.TransformerEncoder(hparams, mode=tf.estimator.ModeKeys.PREDICT)
```

You have the mode set to TRAIN, which will have dropout enabled.

doubler commented 6 years ago

@rsepassi @martinpopel I think tensor2tensor does not provide a good way to be used as a library. After training, I just want to load the model and use it as a service. I don't want to copy the model-construction code only to change the mode parameter.

rsepassi commented 6 years ago

You may be interested in the SavedModel format that `export.py` outputs. That's the format we use to actually serve the models.
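For reference, a minimal SavedModel round trip looks roughly like this. This is a generic sketch (written against the `tf.compat.v1` API so it runs on current TensorFlow), not tensor2tensor's `export.py`; the tiny dense layer stands in for the trained encoder, and the `inputs`/`enc` signature names are placeholders:

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph/session API, as used at the time of this issue

export_dir = os.path.join(tempfile.mkdtemp(), "model")

# Build a trivial graph standing in for the trained encoder.
graph = tf1.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, shape=[None, 4], name="inputs")
    enc = tf1.layers.dense(x, 8, name="enc")
    with tf1.Session(graph=graph) as sess:
        sess.run(tf1.global_variables_initializer())
        # Export graph, weights, and input/output signatures in one call.
        tf1.saved_model.simple_save(
            sess, export_dir, inputs={"inputs": x}, outputs={"enc": enc})

# Serving side: reload without re-running any model-construction code.
with tf1.Session(graph=tf1.Graph()) as sess:
    meta = tf1.saved_model.loader.load(
        sess, [tf1.saved_model.tag_constants.SERVING], export_dir)
    sig = meta.signature_def["serving_default"]
    out = sess.run(sig.outputs["enc"].name,
                   feed_dict={sig.inputs["inputs"].name: [[1., 2., 3., 4.]]})
# out has shape (1, 8)
```

The key point for this thread is the serving side: the loaded graph already has the mode baked in at export time, so no tensor2tensor code needs to be duplicated at inference.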

doubler commented 6 years ago

@rsepassi I simply use

```python
saver = tf.train.Saver(tf.global_variables())
saver.save(sess, checkpoint_file)
```

to save the model, and

```python
saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
saver.restore(sess, checkpoint_file)
```

to load it for inference. It looks like the encoder does not support being used this way.