e-lectrix opened this issue 6 years ago
Try --hparams="max_length=128,eval_drop_long_sequences=True"
or just eval_drop_long_sequences alone, because the default max_length is the training-time batch size, which may be enough to prevent the OOM errors.
However, I am not sure you'll be able to identify which sentences were skipped. The log (stderr) also shows the source sentence, so maybe you can use that.
Martin, thank you very much for your quick answer. Unfortunately, I can't make it work using these two parameters (and decreasing max_length).
My call:
t2t-decoder --data_dir=$DATA_DIR --problems=$PROBLEM --model=$MODEL --hparams_set=$HPARAMS --output_dir=$TRAIN_DIR --hparams="max_length=60,eval_drop_long_sequences=True" --decode_hparams="beam_size=4,alpha=$ALPHA" --decode_from_file=$DECODE_FILE --decode_to_file=$DECODE_FILE.DEEN.mixed.out --batch_size=4 --t2t_usr_dir=$USER_DIR
The parameters seem to have been taken into account:
INFO:tensorflow:Importing user module usr_dir from path /myproject
[2018-04-20 14:30:22,169] Importing user module usr_dir from path /myproject
INFO:tensorflow:Overriding hparams in transformer_base with max_length=60,eval_drop_long_sequences=True
[2018-04-20 14:30:22,433] Overriding hparams in transformer_base with max_length=60,eval_drop_long_sequences=True
INFO:tensorflow:schedule=continuous_train_and_eval
[2018-04-20 14:30:22,433] schedule=continuous_train_and_eval
This is the final error output:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[347280,7235]
[[Node: transformer/body/parallel_0/body/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Softmax = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](transformer/body/parallel_0/body/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Reshape)]]
[[Node: transformer/body/parallel_0/body/encoder/layer_5/self_attention/multihead_attention/q/Tensordot/Shape/_1705 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1841_transformer/body/parallel_0/body/encoder/layer_5/self_attention/multihead_attention/q/Tensordot/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Any other ideas on what I could look at? Thank you!
[EDIT] I'm running version 1.5.5 of tensor2tensor, could this be a reason as well?
As I think about it now, I am not sure whether eval_drop_long_sequences affects t2t-decoder at all. There are three regimes (with different sessions): training, evaluation and decoding. During evaluation you also need to decode, but you have the reference translations, so you can cheat with the non-autoregressive fast mode, and it makes sense to allow skipping long sentences there if eval_drop_long_sequences=True. In real decoding, you don't have the reference translations (at least t2t does not see them), and most users want to translate all the sentences.
T2T 1.5.5 is OK.
You can filter the text to be translated for sentences shorter than a given number of words with a simple shell/perl/python script. To filter based on the number of subwords instead, you can use something like this (just from memory):
from tensor2tensor.data_generators import text_encoder
# FLAGS.vocab is the path to the subword vocabulary used for training.
vocab = text_encoder.SubwordTextEncoder(FLAGS.vocab)
# Number of subwords the sentence occupies after segmentation.
n_subwords = len(vocab.encode(string))
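Building on that, a minimal Python sketch of such a pre-filter (the file names and the threshold are hypothetical, and it counts whitespace tokens rather than subwords, so it is only an approximation):

```python
MAX_WORDS = 60  # hypothetical threshold

def keep_sentence(line, max_words=MAX_WORDS):
    """True if the line has at most max_words whitespace-separated tokens."""
    return len(line.split()) <= max_words

def filter_file(src_path, keep_path, skip_path, max_words=MAX_WORDS):
    """Split src_path into sentences to translate and sentences to set aside."""
    with open(src_path) as src, \
         open(keep_path, "w") as keep, \
         open(skip_path, "w") as skip:
        for line in src:
            (keep if keep_sentence(line, max_words) else skip).write(line)
```

To filter on subword counts instead, swap the `line.split()` length check for `len(vocab.encode(line))` with the SubwordTextEncoder above.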
Thank you. I will then go ahead and write some logic prior to the decoding process to prevent these cases from happening and crashing my workflow. That should be no problem to implement.
I do think, though, that this sentence-skipping option would be a nice feature to add to t2t-decoder.
Thanks for your help, much appreciated!
Skipping an eval sentence may be a bad idea: dropping a whole sentence will badly hurt the final test score. Also, splitting long sentences at the preprocessing stage is good, but it isn't going to solve this perfectly either, since we don't know the true length until after BPE/subword segmentation. In my case, some shorter sentences expanded into really long sequences after subword splitting.
At the least, we should be able to truncate the sentence to a certain maximum length instead of skipping it completely.
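A crude sketch of that truncation idea, using whitespace tokens as a stand-in for subwords (with the real SubwordTextEncoder you would instead truncate the encoded ID list and decode it back to a string):

```python
def truncate_tokens(sentence, max_tokens):
    """Keep only the first max_tokens whitespace tokens of a sentence."""
    tokens = sentence.split()
    return " ".join(tokens[:max_tokens])
```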
Looking at the code, https://github.com/tensorflow/tensor2tensor/blob/a0bf3b90b13f75e77fdacf5da025d09309165b92/tensor2tensor/utils/decoding.py#L679-L682 does what we want. It gets the value from https://github.com/tensorflow/tensor2tensor/blob/a0bf3b90b13f75e77fdacf5da025d09309165b92/tensor2tensor/utils/decoding.py#L449-L453
But that default value is set to -1 (meaning: don't truncate):
https://github.com/tensorflow/tensor2tensor/blob/a0bf3b90b13f75e77fdacf5da025d09309165b92/tensor2tensor/utils/decoding.py#L64
(Now, how I figured out the mapping of decode_hp to the --decode_hparams CLI argument via the FLAGS machinery, I still don't know 🤣)
So, this is how we can pass a value from CLI:
$ t2t-decoder --decode_hparams="max_input_size=190" <other args>
max_input_size=190 worked well for me. You may want to adjust this value depending on your RAM/GPU RAM and beam size.
Hi,
First of all, many thanks for making this awesome tool available! I managed to create a translation model using the transformer_base problem and my own data. My aim is to translate a set of documents. When I apply t2t-decoder to this set of documents, a few of them fail with an OOM error. I traced this problem back to some very long sentences (long lists of comma-separated terms). When I remove these instances, the translation runs smoothly.
My question: Is there a way to tell t2t-decoder not to include such sentences in the decoding process, and to just print them as-is in the resulting document? I had some difficulty identifying parameters in the source code that would allow this.
Obviously, I could remove these sentences beforehand and add them back in a later step, but it would be quite cumbersome to ensure they end up in the right position in the document.
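For the position-tracking part, one option is to replace each overlong sentence with a placeholder before decoding and swap the originals back in afterwards. A minimal sketch (the marker string and word threshold are made up, and this assumes the decoder passes the marker through unchanged):

```python
PLACEHOLDER = "__SKIPPED__"  # hypothetical marker

def extract_long(lines, max_words):
    """Replace overlong lines with a placeholder; remember originals by index."""
    kept, skipped = [], {}
    for i, line in enumerate(lines):
        if len(line.split()) > max_words:
            skipped[i] = line
            kept.append(PLACEHOLDER)
        else:
            kept.append(line)
    return kept, skipped

def reinsert(translated, skipped):
    """Restore the original long lines at their saved positions."""
    return [skipped.get(i, line) for i, line in enumerate(translated)]
```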
Many thanks, Matthias