Closed — alvations closed this issue 5 years ago
After training as in https://github.com/marian-nmt/marian-examples/tree/master/training-basics-sentencepiece, `marian-decoder` throws an error when decoding:
```
$ ~/marian/build/marian-decoder -c model.npz.decoder.yml
[2019-03-04 09:22:36] [config] alignment: 0
[2019-03-04 09:22:36] [config] allow-unk: false
[2019-03-04 09:22:36] [config] beam-size: 6
[2019-03-04 09:22:36] [config] best-deep: false
[2019-03-04 09:22:36] [config] clip-gemm: 0
[2019-03-04 09:22:36] [config] cpu-threads: 0
[2019-03-04 09:22:36] [config] dec-cell: gru
[2019-03-04 09:22:36] [config] dec-cell-base-depth: 2
[2019-03-04 09:22:36] [config] dec-cell-high-depth: 1
[2019-03-04 09:22:36] [config] dec-depth: 6
[2019-03-04 09:22:36] [config] devices:
[2019-03-04 09:22:36] [config]   - 0
[2019-03-04 09:22:36] [config] dim-emb: 1024
[2019-03-04 09:22:36] [config] dim-rnn: 1024
[2019-03-04 09:22:36] [config] dim-vocabs:
[2019-03-04 09:22:36] [config]   - 32000
[2019-03-04 09:22:36] [config]   - 32000
[2019-03-04 09:22:36] [config] enc-cell: gru
[2019-03-04 09:22:36] [config] enc-cell-depth: 1
[2019-03-04 09:22:36] [config] enc-depth: 6
[2019-03-04 09:22:36] [config] enc-type: bidirectional
[2019-03-04 09:22:36] [config] ignore-model-config: false
[2019-03-04 09:22:36] [config] input:
[2019-03-04 09:22:36] [config]   - stdin
[2019-03-04 09:22:36] [config] interpolate-env-vars: false
[2019-03-04 09:22:36] [config] layer-normalization: false
[2019-03-04 09:22:36] [config] log-level: info
[2019-03-04 09:22:36] [config] max-length: 1000
[2019-03-04 09:22:36] [config] max-length-crop: false
[2019-03-04 09:22:36] [config] max-length-factor: 3
[2019-03-04 09:22:36] [config] maxi-batch: 100
[2019-03-04 09:22:36] [config] maxi-batch-sort: src
[2019-03-04 09:22:36] [config] mini-batch: 16
[2019-03-04 09:22:36] [config] mini-batch-words: 0
[2019-03-04 09:22:36] [config] models:
[2019-03-04 09:22:36] [config]   - /disk2/models/ja-en/model.npz
[2019-03-04 09:22:36] [config] n-best: false
[2019-03-04 09:22:36] [config] normalize: 0.6
[2019-03-04 09:22:36] [config] optimize: false
[2019-03-04 09:22:36] [config] port: 8080
[2019-03-04 09:22:36] [config] quiet: false
[2019-03-04 09:22:36] [config] quiet-translation: false
[2019-03-04 09:22:36] [config] relative-paths: false
[2019-03-04 09:22:36] [config] right-left: false
[2019-03-04 09:22:36] [config] seed: 0
[2019-03-04 09:22:36] [config] skip: false
[2019-03-04 09:22:36] [config] skip-cost: false
[2019-03-04 09:22:36] [config] tied-embeddings: false
[2019-03-04 09:22:36] [config] tied-embeddings-all: true
[2019-03-04 09:22:36] [config] tied-embeddings-src: false
[2019-03-04 09:22:36] [config] transformer-aan-activation: swish
[2019-03-04 09:22:36] [config] transformer-aan-depth: 2
[2019-03-04 09:22:36] [config] transformer-aan-nogate: false
[2019-03-04 09:22:36] [config] transformer-decoder-autoreg: self-attention
[2019-03-04 09:22:36] [config] transformer-dim-aan: 2048
[2019-03-04 09:22:36] [config] transformer-dim-ffn: 4096
[2019-03-04 09:22:36] [config] transformer-ffn-activation: swish
[2019-03-04 09:22:36] [config] transformer-ffn-depth: 2
[2019-03-04 09:22:36] [config] transformer-guided-alignment-layer: last
[2019-03-04 09:22:36] [config] transformer-heads: 8
[2019-03-04 09:22:36] [config] transformer-no-projection: false
[2019-03-04 09:22:36] [config] transformer-postprocess: da
[2019-03-04 09:22:36] [config] transformer-postprocess-emb: d
[2019-03-04 09:22:36] [config] transformer-preprocess: n
[2019-03-04 09:22:36] [config] transformer-tied-layers:
[2019-03-04 09:22:36] [config]   []
[2019-03-04 09:22:36] [config] type: transformer
[2019-03-04 09:22:36] [config] version: v1.7.6 9cc5b176 2018-12-14 15:11:34 -0800
[2019-03-04 09:22:36] [config] vocabs:
[2019-03-04 09:22:36] [config]   - /disk2/models/ja-en/vocab.src.spm
[2019-03-04 09:22:36] [config]   - /disk2/models/ja-en/vocab.trg.spm
[2019-03-04 09:22:36] [config] word-penalty: 0
[2019-03-04 09:22:36] [config] workspace: 512
[2019-03-04 09:22:36] [config] Model created with Marian v1.7.6 9cc5b176 2018-12-14 15:11:34 -0800
[2019-03-04 09:22:36] [data] Loading vocabulary from text file /disk2/models/ja-en/vocab.src.spm
[2019-03-04 09:22:36] Vocabulary file /disk2/models/ja-en/vocab.src.spm must not contain empty lines
Aborted from int marian::Vocab::load(const string&, int) in /home/ltan/marian/src/marian/src/data/vocab.cpp: 117
```
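Note the log line `Loading vocabulary from text file`: the `.spm` files are binary SentencePiece models, not line-based text vocabularies, so a `marian-decoder` build that reads them as plain text will trip exactly this "empty lines" check. The sketch below (the helper name and sample bytes are hypothetical, not from Marian's source) reproduces the kind of validation that fails when a binary model is read as text:

```python
# Hedged sketch: emulate a line-based text-vocab sanity check. A binary
# SentencePiece model often decodes as bytes but contains stray newlines,
# so it looks like a text vocab with "empty lines" and gets rejected.

def looks_like_text_vocab(data: bytes) -> bool:
    """Heuristic: a text vocab decodes as UTF-8 and has no empty lines."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # raw binary content, not a text vocab at all
    lines = text.splitlines()
    return bool(lines) and all(line.strip() for line in lines)

# A fake binary .spm payload: decodes, but contains a blank line.
spm_bytes = b"\x00\x12proto\n\nblob"
assert not looks_like_text_vocab(spm_bytes)

# A well-formed line-based text vocabulary passes.
assert looks_like_text_vocab(b"<unk>\n<s>\n</s>\nhello\n")
```

This is only a diagnostic illustration; if it matches your situation, the usual fix is a Marian binary compiled with SentencePiece support rather than any decoder flag.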
My config file looks like this:
```
$ cat model.npz.decoder.yml
models:
    - /disk2/models/ja-en/model.npz
vocabs:
    - /disk2/models/ja-en/vocab.src.spm
    - /disk2/models/ja-en/vocab.trg.spm
beam-size: 6
normalize: 0.6
word-penalty: 0
mini-batch: 16
maxi-batch: 100
maxi-batch-sort: src
relative-paths: false
```
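Before suspecting the vocabulary format itself, it can help to rule out path problems. A small stdlib-only sketch (`check_files` is a hypothetical helper; the paths are the ones from the config above and are specific to this setup):

```python
import os
from pathlib import Path

def check_files(paths):
    """Map each path to True if the file exists and is non-empty."""
    return {p: Path(p).is_file() and os.path.getsize(p) > 0 for p in paths}

# Paths copied from model.npz.decoder.yml above; adjust for your machine.
config_files = [
    "/disk2/models/ja-en/model.npz",
    "/disk2/models/ja-en/vocab.src.spm",
    "/disk2/models/ja-en/vocab.trg.spm",
]
for path, ok in check_files(config_files).items():
    print(path, "ok" if ok else "missing or empty")
```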
Is there a special argument that needs to be passed to the decoder when SentencePiece is used as the tokenizer?
It's strange: I recompiled the binary and now it works, even though the recompiled binary reports the same version, v1.7.6 9cc5b176. At least it works now =)