marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

marian-server doesn't support Average Attention Network for transformer. #201

Closed beichao1314 closed 6 years ago

beichao1314 commented 6 years ago

Hi, I'm testing the speed of the Average Attention Network transformer in an online system using marian-server, but marian-server doesn't support the Average Attention Network. It reports:

[2018-08-06 11:43:17] [config] alignment: false
[2018-08-06 11:43:17] [config] allow-unk: false
[2018-08-06 11:43:17] [config] beam-size: 12
[2018-08-06 11:43:17] [config] best-deep: false
[2018-08-06 11:43:17] [config] clip-gemm: 0
[2018-08-06 11:43:17] [config] cpu-threads: 0
[2018-08-06 11:43:17] [config] dec-cell: gru
[2018-08-06 11:43:17] [config] dec-cell-base-depth: 2
[2018-08-06 11:43:17] [config] dec-cell-high-depth: 1
[2018-08-06 11:43:17] [config] dec-depth: 6
[2018-08-06 11:43:17] [config] devices:
[2018-08-06 11:43:17] [config]   - 0
[2018-08-06 11:43:17] [config]   - 1
[2018-08-06 11:43:17] [config]   - 2
[2018-08-06 11:43:17] [config]   - 3
[2018-08-06 11:43:17] [config] dim-emb: 512
[2018-08-06 11:43:17] [config] dim-rnn: 1024
[2018-08-06 11:43:17] [config] dim-vocabs:
[2018-08-06 11:43:17] [config]   - 60000
[2018-08-06 11:43:17] [config]   - 50500
[2018-08-06 11:43:17] [config] enc-cell: gru
[2018-08-06 11:43:17] [config] enc-cell-depth: 1
[2018-08-06 11:43:17] [config] enc-depth: 6
[2018-08-06 11:43:17] [config] enc-type: bidirectional
[2018-08-06 11:43:17] [config] ignore-model-config: false
[2018-08-06 11:43:17] [config] input:
[2018-08-06 11:43:17] [config]   - stdin
[2018-08-06 11:43:17] [config] interpolate-env-vars: false
[2018-08-06 11:43:17] [config] layer-normalization: false
[2018-08-06 11:43:17] [config] log-level: info
[2018-08-06 11:43:17] [config] max-length: 1000
[2018-08-06 11:43:17] [config] max-length-crop: false
[2018-08-06 11:43:17] [config] max-length-factor: 3
[2018-08-06 11:43:17] [config] maxi-batch: 64
[2018-08-06 11:43:17] [config] maxi-batch-sort: src
[2018-08-06 11:43:17] [config] mini-batch: 32
[2018-08-06 11:43:17] [config] mini-batch-words: 0
[2018-08-06 11:43:17] [config] models:
[2018-08-06 11:43:17] [config]   - /home/beichao/train/enzh_3k5_0306/aan/train_zhen/model/model.npz
[2018-08-06 11:43:17] [config] n-best: false
[2018-08-06 11:43:17] [config] normalize: 1
[2018-08-06 11:43:17] [config] optimize: false
[2018-08-06 11:43:17] [config] port: 8080
[2018-08-06 11:43:17] [config] quiet: false
[2018-08-06 11:43:17] [config] quiet-translation: false
[2018-08-06 11:43:17] [config] relative-paths: true
[2018-08-06 11:43:17] [config] right-left: false
[2018-08-06 11:43:17] [config] seed: 0
[2018-08-06 11:43:17] [config] skip: false
[2018-08-06 11:43:17] [config] skip-cost: false
[2018-08-06 11:43:17] [config] tied-embeddings: false
[2018-08-06 11:43:17] [config] tied-embeddings-all: false
[2018-08-06 11:43:17] [config] tied-embeddings-src: false
[2018-08-06 11:43:17] [config] transformer-aan-activation: swish
[2018-08-06 11:43:17] [config] transformer-aan-depth: 2
[2018-08-06 11:43:17] [config] transformer-aan-nogate: false
[2018-08-06 11:43:17] [config] transformer-decoder-autoreg: average-attention
[2018-08-06 11:43:17] [config] transformer-dim-aan: 2048
[2018-08-06 11:43:17] [config] transformer-dim-ffn: 2048
[2018-08-06 11:43:17] [config] transformer-ffn-activation: swish
[2018-08-06 11:43:17] [config] transformer-ffn-depth: 2
[2018-08-06 11:43:17] [config] transformer-heads: 8
[2018-08-06 11:43:17] [config] transformer-no-projection: false
[2018-08-06 11:43:17] [config] transformer-postprocess: dan
[2018-08-06 11:43:17] [config] transformer-postprocess-emb: d
[2018-08-06 11:43:17] [config] transformer-preprocess: ""
[2018-08-06 11:43:17] [config] type: transformer
[2018-08-06 11:43:17] [config] version: v1.5.0+8b0e2f9
[2018-08-06 11:43:17] [config] vocabs:
[2018-08-06 11:43:17] [config]   - /home/beichao/train/enzh_3k5_0306/aan/train_zhen/data/train.bpe.zh.yml
[2018-08-06 11:43:17] [config]   - /home/beichao/train/enzh_3k5_0306/aan/train_zhen/data/train.bpe.en.yml
[2018-08-06 11:43:17] [config] word-penalty: 0
[2018-08-06 11:43:17] [config] workspace: 512
[2018-08-06 11:43:17] [config] Model created with Marian v1.5.0+8b0e2f9
[2018-08-06 11:43:17] [data] Loading vocabulary from /home/beichao/train/enzh_3k5_0306/aan/train_zhen/data/train.bpe.zh.yml
[2018-08-06 11:43:18] [data] Loading vocabulary from /home/beichao/train/enzh_3k5_0306/aan/train_zhen/data/train.bpe.en.yml
[2018-08-06 11:43:19] [memory] Extending reserved space to 512 MB (device gpu0)
[2018-08-06 11:43:19] Loading scorer of type transformer as feature F0
[2018-08-06 11:43:19] Loading model from /home/beichao/train/enzh_3k5_0306/aan/train_zhen/model/model.npz
[2018-08-06 11:43:21] [memory] Extending reserved space to 512 MB (device gpu1)
[2018-08-06 11:43:21] Loading scorer of type transformer as feature F0
[2018-08-06 11:43:21] Loading model from /home/beichao/train/enzh_3k5_0306/aan/train_zhen/model/model.npz
[2018-08-06 11:43:22] [memory] Extending reserved space to 512 MB (device gpu2)
[2018-08-06 11:43:22] Loading scorer of type transformer as feature F0
[2018-08-06 11:43:22] Loading model from /home/beichao/train/enzh_3k5_0306/aan/train_zhen/model/model.npz
[2018-08-06 11:43:24] [memory] Extending reserved space to 512 MB (device gpu3)
[2018-08-06 11:43:24] Loading scorer of type transformer as feature F0
[2018-08-06 11:43:24] Loading model from /home/beichao/train/enzh_3k5_0306/aan/train_zhen/model/model.npz
[2018-08-06 11:43:25] Server is listening on port 8080
[2018-08-06 11:43:40] Message received: in fact , Amazon pays the same lower rate that the post office charges other bulk ship@@ pers , and it collects sales tax in every state that charges it .
[2018-08-06 11:43:40] Graph was reloaded and parameter 'F0::decoder_l1_self_Wq' is newly created
Aborted from marian::Expr marian::ExpressionGraph::param(const string&, const marian::Shape&, const NodeInitializer&, bool) in /home/beichao/train/marian/src/marian/src/graph/expression_graph.h: 230
Aborted (core dumped)

I then checked the configuration options in marian-server, and it indeed doesn't support the Average Attention Network.
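For context, marian-server is queried over a WebSocket connection. A minimal client along the lines of the example scripts shipped with Marian might look like the sketch below; the host, port, and `/translate` endpoint path are assumptions, and the third-party websocket-client package is required to actually call `translate`:

```python
# Minimal marian-server client sketch (hypothetical host/port; endpoint path
# taken from Marian's example client scripts).

def server_url(host="localhost", port=8080):
    # marian-server listens for WebSocket connections on the configured port
    return "ws://{}:{}/translate".format(host, port)

def translate(line, host="localhost", port=8080):
    # Imported lazily so the sketch stays importable without the package.
    from websocket import create_connection
    ws = create_connection(server_url(host, port))
    ws.send(line)       # one pre-processed (e.g. BPE-segmented) input line
    result = ws.recv()  # the translated line
    ws.close()
    return result
```

Note that the input is expected to be pre-processed the same way as the training data (tokenized and BPE-segmented, as in the `ship@@ pers` example above).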

emjotde commented 6 years ago

Hi, what's the output of the following command?

./marian-server --version
beichao1314 commented 6 years ago

I cloned the latest repository and it works well. However, updating the old repository with git pull origin master doesn't get the right version: ./marian --version outputs v1.4.0+3873cd4. Also, even in the latest version, ./marian-server --version doesn't output a version number.

emjotde commented 6 years ago

How about the following in the old repositories where it did not update?

git pull
git submodule update --recursive --remote
emjotde commented 6 years ago

Hm, you are right, somehow the current binaries in the marian repo are not giving version numbers. I will update that tomorrow.

You can always try the most recent version from http://github.com/marian-nmt/marian-dev. That repo usually gets small fixes earlier than the main repo, and version numbers work there.

beichao1314 commented 6 years ago

Thanks for your great work.