penny9287 opened 5 years ago
Hi, currently the Transformer decoder only supports the multi-head scaled dot-product attention from the "Attention is All You Need" paper. If you provide multiple encoders, you can choose which attention combination strategy you want to use: one of `serial`, `parallel`, `hierarchical`, and `flat`.
I wonder how to specify the combination strategy for multiple encoders in the configuration file. Do you have any examples?
Just specify the `attention_combination_strategy` parameter in the Transformer decoder configuration. It can be one of `serial`, `parallel`, `hierarchical`, and `flat`.
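Here is a minimal sketch of what such a decoder section might look like, assuming an INI-style configuration in the spirit of Neural Monkey (which this issue appears to concern). Only `attention_combination_strategy` and its four values come from the answer above; the section name, `class` path, encoder references, and the remaining parameters are illustrative placeholders, not confirmed API.

```ini
; Minimal sketch of a multi-source Transformer decoder section.
; Only attention_combination_strategy (one of "serial", "parallel",
; "hierarchical", "flat") is taken from the answer above; every other
; name here is an illustrative placeholder.
[decoder]
class=decoders.transformer.TransformerDecoder
name="decoder"
; Two encoders defined in their own sections elsewhere in the file,
; e.g. [encoder_a] and [encoder_b].
encoders=[<encoder_a>, <encoder_b>]
; How attention over the multiple encoders is combined.
attention_combination_strategy="hierarchical"
vocabulary=<target_vocabulary>
data_id="target"
```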
I wonder how to modify the configuration file to train a multi-source Transformer model with different attention types.