tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

UT hparams default `transformer_ffn_type` changed from "sepconv" to "fn" #1204

Open rllin-fathom opened 5 years ago

rllin-fathom commented 5 years ago

This change happened in https://github.com/tensorflow/tensor2tensor/commit/f7695e8d5c9d403fed763c0cc351c6835516e6f3#diff-750434e84a49123af08c4226eb4962f7R399

Switching defaults like this seems less than ideal. In practice, the change caused suboptimal model performance for us, and it took some experimentation to pinpoint. We've switched it back in our fork of tensor2tensor.

The larger question is: what are the guidelines around whether base hparams can simply be changed upstream in tensor2tensor?
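
For anyone hitting the same thing, here is a minimal sketch of pinning the value via a registered hparams set instead of relying on the upstream default. It assumes tensor2tensor's standard registry API; the hparams-set name `universal_transformer_base_sepconv` is just an illustrative choice, not something that exists upstream.

```python
# A sketch of pinning transformer_ffn_type in a registered hparams set,
# rather than depending on the upstream default.
from tensor2tensor.models.research import universal_transformer
from tensor2tensor.utils import registry


@registry.register_hparams
def universal_transformer_base_sepconv():
  """universal_transformer_base with transformer_ffn_type pinned to sepconv."""
  hparams = universal_transformer.universal_transformer_base()
  hparams.transformer_ffn_type = "sepconv"  # pin the value our models rely on
  return hparams
```

Training against a pinned set like this (e.g. `--hparams_set=universal_transformer_base_sepconv`) keeps behavior stable even if the base default changes again.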

MostafaDehghani commented 5 years ago

Really sorry for any inconvenience caused. Yes, it wasn't ideal! A lot of people expected that using `universal_transformer_base` would reproduce the MT results reported in the paper, and for that `transformer_ffn_type` needs to be "fn". We got a lot of questions/issues about this, so I decided to change the default.
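
As a quick way to confirm which default you are actually running with (a sketch, assuming the standard hparams accessors):

```python
# Check the registered default directly; after the commit linked above,
# this should print "fn" rather than "sepconv".
from tensor2tensor.models.research import universal_transformer

hparams = universal_transformer.universal_transformer_base()
print(hparams.transformer_ffn_type)  # expected: "fn"
```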

MostafaDehghani commented 5 years ago

Also, I'm curious: on what task/dataset are you getting better results using "sepconv" as the transition function instead of a feed-forward layer? :)

cbockman commented 5 years ago

@MostafaDehghani An internal NLP classification task. For whatever reason, it seems to make a substantial difference (~2% F1, IIRC; it's been a little while since we ran the less-good variant).