rllin-fathom opened 5 years ago
Really sorry for any inconvenience caused. Yes, it wasn't ideal! A lot of people expected `universal_transformer_base` to reproduce the MT results reported in the paper, and for that `transformer_ffn_type` needs to be the feed-forward type. We got a lot of questions/issues about it, so I decided to change the default.
Also curious to know: for what task/dataset are you getting better results using `sep_conv` as the transition function instead of a feed-forward layer? :)
@MostafaDehghani an internal NLP classification task -- for whatever reason, it seems to make a substantial difference (~2% F1, IIRC -- it's been a little while since we ran the less-good variant).
This change happened in https://github.com/tensorflow/tensor2tensor/commit/f7695e8d5c9d403fed763c0cc351c6835516e6f3#diff-750434e84a49123af08c4226eb4962f7R399
It seems less than ideal to switch defaults like this. In practice the change caused suboptimal model performance for us, and it took some experimentation to pinpoint. We've switched it back in our fork of tensor2tensor.
The larger question: what are the guidelines on when base hparams can be changed upstream in tensor2tensor?
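One way to insulate yourself from upstream default changes is to register (or just define) your own hparams set that pins the fields you depend on, instead of inheriting whatever the base default happens to be. Below is a minimal, dependency-free sketch of that pattern; the function names and the stand-in hparams object mirror tensor2tensor's style but are illustrative, not the real API.

```python
from types import SimpleNamespace

def universal_transformer_base():
    # Stand-in for the upstream base hparams set; the real one has many more
    # fields, and its defaults can change between releases.
    return SimpleNamespace(transformer_ffn_type="fc", hidden_size=512)

def my_universal_transformer():
    """Start from the upstream base, then pin the transition function
    explicitly so a silent upstream default change can't alter behavior."""
    hparams = universal_transformer_base()
    hparams.transformer_ffn_type = "sep_conv"  # the pre-change value
    return hparams

hp = my_universal_transformer()
print(hp.transformer_ffn_type)  # sep_conv, regardless of the upstream default
```

In real tensor2tensor code the same idea is expressed with an `@registry.register_hparams` function that calls the base hparams set and overrides the fields it cares about.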