cbockman opened this issue 5 years ago
Paging the helpful @MostafaDehghani :)
Hi @afrozenator, Hi @cbockman :) Sorry for my late reaction to this!
I actually never tried UT on TPU. I talked to @lukaszkaiser about this and there were two problems:

For one, TF did not correctly pass maximum_iterations in foldl, and Lukasz sent a CL to fix that; it's actually needed for any foldl on TPU. The good news is that this change is in: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/functional_ops.py#L147
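To make it concrete why that bound matters on TPU, here is a minimal, self-contained sketch (illustrative only, not the actual fix; the real change is in the linked functional_ops.py): XLA wants loops with a statically known iteration count, which is what maximum_iterations gives tf.while_loop, and what foldl previously failed to pass through.

```python
import tensorflow as tf

# Toy loop with the same shape as the UT recurrence: XLA/TPU can compile it
# because maximum_iterations gives the while_loop a static upper bound.
n = 6                      # number of recurrent steps, known at graph build time
i0 = tf.constant(0)
state0 = tf.zeros([4, 8])  # toy "hidden state"

_, state = tf.while_loop(
    cond=lambda i, s: i < n,
    body=lambda i, s: (i + 1, s + 1.0),
    loop_vars=(i0, state0),
    maximum_iterations=n)  # the bound that foldl previously did not pass on TPU
```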
After that, there's another problem: XLA compilation requires that operator arguments representing shapes or dimensions evaluate to concrete values at compile time. Currently we have add_step_timing_signal, which gets step, and step is non-static. To avoid this, it's pretty simple to change the UT base model (e.g. replacing the foldl with a for loop with shared parameters, as in the sketch below), but it would be more difficult for the UT with ACT. To run it on TPUs for now, we can just disable the step embedding, so with tf-nightly you can use the TPU config that @lukaszkaiser has added here: https://github.com/tensorflow/tensor2tensor/commit/7a2f3114a60a82a5f97e6a2660d9510689d2f061.
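To make the foldl-to-for-loop idea concrete, here is a minimal sketch (hypothetical helper names, not the actual T2T code) of unrolling the UT recurrence in Python, so that the step index is a plain Python int at graph-construction time while the step function reuses the same weights on every iteration:

```python
import tensorflow as tf

def unrolled_ut_layers(x, num_rec_steps, ut_step_fn):
  """Applies a shared UT step num_rec_steps times with a static step index.

  ut_step_fn(state, step) is assumed to build one UT step (self-attention,
  transition function, step/position timing signal). Because `step` is a
  Python int here, add_step_timing_signal sees a compile-time constant
  instead of a non-static loop tensor.
  """
  state = x
  for step in range(num_rec_steps):
    # Same variable scope on every iteration -> parameters shared across steps.
    with tf.variable_scope("universal_transformer", reuse=tf.AUTO_REUSE):
      state = ut_step_fn(state, step)
  return state
```

The price is a larger (unrolled) graph, and as noted it doesn't carry over directly to the ACT variant, where the number of steps is data-dependent. Disabling the step timing signal via the TPU config above avoids the issue entirely without unrolling.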
Thank you @MostafaDehghani! I figured it was something related to static steps... if it wasn't just us doing something dumb on our end.

Re: disabling the step embedding, should we expect that to have a significant performance impact? Or is this unknown?
Just trying to forecast performance, since when things drop, it can be very murky to figure out if we did something wrong, or if it is just inherent to the problem/data/model/...
Description
Should the Universal Transformer work on TPU? I took a spin at getting it to work and it isn't working.
Model + hparams below.
I do see that there are _tpu specific hparam sets for Transformer, and none for UT, which might be a sign that things are not functional on TPU; OTOH I see that greedy_infer (https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/research/universal_transformer.py#L223) does notionally have some TPU support.
Offhand, it doesn't look like any of the Transformer TPU hparam changes should be required to get this running (https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L1991) (at least not for the errors I see below).
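For reference, a UT-specific TPU set would presumably just mirror how transformer_tpu() wraps transformer_base() with update_hparams_for_tpu(); a purely hypothetical sketch (this set and its name don't exist in t2t 1.11, and it may well not be sufficient given the errors below):

```python
from tensor2tensor.models import transformer
from tensor2tensor.models.research import universal_transformer
from tensor2tensor.utils import registry

@registry.register_hparams
def universal_transformer_base_tpu_sketch():
  """Hypothetical: universal_transformer_base + the standard TPU tweaks."""
  hparams = universal_transformer.universal_transformer_base()
  transformer.update_hparams_for_tpu(hparams)
  return hparams
```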
No problem, obviously, if it isn't supported; I would like to know, however, whether we're doing something wrong here.
NOTE: I was able to get other models (e.g., the base Transformer) to apparently run fine on TPU, and UT works great on GPU.
Thanks!
Environment information
tf 1.12
latest pypi t2t (1.11)
TPUv2 on GKE
For bugs: reproduction and error logs
Error logs: