marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

marian-decoder failed to load multi-layer, bi-deep Nematus model #138

Closed pprzyby2 closed 6 years ago

pprzyby2 commented 6 years ago

We tried running the Marian decoder on bi-deep models trained with Nematus, using the following command:

./../../../marian-dev/build/marian-decoder \
    --type nematus \
    --models $preprocess/model.npz \
    --vocabs $preprocess/src.corpus.en-de.tc.nochars.bpe.en.json $preprocess/trg.corpus.en-de.tc.nochars.bpe.de.json \
    --mini-batch 128 --maxi-batch-sort src --maxi-batch 100 \
    --allow-unk false \
    -b 12 \
    -n \
    -d 2 \
    -w 2500 

For one-layer models Marian worked fine, but for the multi-layer model we got an error:

[2017-11-28 16:54:54] Loading scorer of type nematus as feature F0
[2017-11-28 16:54:54] Loading model nematus from /home/svc_dt_user/wmt17_systems/en-de/p/model.npz
[2017-11-28 16:55:09] Graph was reloaded and parameter 'F0::encoder_bi_l2_cell1_U' is newly created
Aborted from marian::Expr marian::ExpressionGraph::param(std::__cxx11::string, marian::Shape, Args ...) [with Args = {marian::keywords::Keyword<3329549428u, std::function<void(std::shared_ptr<marian::TensorBase>)> >}; marian::Expr = std::shared_ptr<marian::Chainable<std::shared_ptr<marian::TensorBase> > >; std::__cxx11::string = std::__cxx11::basic_string<char>] in /home/svc_dt_user/marian-dev/src/graph/expression_graph.h: 284

Here are some features of the model from the model.npz.json file:

  "anneal_decay": 0.5, 
  "anneal_restarts": 0, 
  "batch_size": 40, 
  "clip_c": 1, 
  "dec_base_recurrence_transition_depth": 4, 
  "dec_deep_context": true, 
  "dec_depth": 4, 
  "dec_high_recurrence_transition_depth": 2, 
  "decay_c": 0, 
  "decoder": "gru_cond", 
  "decoder_deep": "gru", 
  "decoder_truncate_gradient": -1, 
  "enc_depth": 4, 
  "enc_depth_bidirectional": 4, 
  "enc_recurrence_transition_depth": 2, 
  "encoder": "gru", 
  "encoder_truncate_gradient": -1,   
  "dim": 1024, 
  "dim_per_factor": [
    512
  ], 
  "dim_word": 512, 
  "dispFreq": 1000, 
  "domain_interpolation_inc": 0.1, 
  "domain_interpolation_indomain_datasets": null, 
  "domain_interpolation_max": 1.0, 
  "domain_interpolation_min": 0.1, 
  "dropout_embedding": 0.2, 
  "dropout_hidden": 0.2, 
  "dropout_source": 0, 
  "dropout_target": 0, 
  "dynamic_batching": true, 
  "factors": 1, 
  "finish_after": 10000000, 
  "grad_average": false, 
  "layer_normalisation": true, 
  "lrate": 0.0001, 
  "map_decay_c": 0, 
  "max_epochs": 5000, 
  "maxibatch_size": 20, 
  "maxlen": 50, 
  "model_version": 0.1, 

Actual file names from model.npz:

decoder_2_b.npy
decoder_2_bx.npy
decoder_2_bx_drt_1.npy
decoder_2_b_drt_1.npy
decoder_2_U.npy
decoder_2_Ux.npy
decoder_2_Ux_drt_1.npy
decoder_2_Ux_drt_1_lnb.npy
decoder_2_Ux_drt_1_lns.npy
decoder_2_Ux_lnb.npy
decoder_2_Ux_lns.npy
decoder_2_U_drt_1.npy
decoder_2_U_drt_1_lnb.npy
decoder_2_U_drt_1_lns.npy
decoder_2_U_lnb.npy
decoder_2_U_lns.npy
decoder_2_W.npy
decoder_2_Wx.npy
decoder_2_Wx_lnb.npy
decoder_2_Wx_lns.npy
decoder_2_W_lnb.npy
decoder_2_W_lns.npy
decoder_3_b.npy
decoder_3_bx.npy
decoder_3_bx_drt_1.npy
decoder_3_b_drt_1.npy
decoder_3_U.npy
decoder_3_Ux.npy
decoder_3_Ux_drt_1.npy
decoder_3_Ux_drt_1_lnb.npy
decoder_3_Ux_drt_1_lns.npy
decoder_3_Ux_lnb.npy
decoder_3_Ux_lns.npy
decoder_3_U_drt_1.npy
decoder_3_U_drt_1_lnb.npy
decoder_3_U_drt_1_lns.npy
decoder_3_U_lnb.npy
decoder_3_U_lns.npy
decoder_3_W.npy
decoder_3_Wx.npy
decoder_3_Wx_lnb.npy
decoder_3_Wx_lns.npy
decoder_3_W_lnb.npy
decoder_3_W_lns.npy
decoder_4_b.npy
decoder_4_bx.npy
decoder_4_bx_drt_1.npy
decoder_4_b_drt_1.npy
decoder_4_U.npy
decoder_4_Ux.npy
decoder_4_Ux_drt_1.npy
decoder_4_Ux_drt_1_lnb.npy
decoder_4_Ux_drt_1_lns.npy
decoder_4_Ux_lnb.npy
decoder_4_Ux_lns.npy
decoder_4_U_drt_1.npy
decoder_4_U_drt_1_lnb.npy
decoder_4_U_drt_1_lns.npy
decoder_4_U_lnb.npy
decoder_4_U_lns.npy
decoder_4_W.npy
decoder_4_Wx.npy
decoder_4_Wx_lnb.npy
decoder_4_Wx_lns.npy
decoder_4_W_lnb.npy
decoder_4_W_lns.npy
decoder_b.npy
decoder_bx.npy
decoder_bx_nl.npy
decoder_bx_nl_drt_1.npy
decoder_bx_nl_drt_2.npy
decoder_b_att.npy
decoder_b_nl.npy
decoder_b_nl_drt_1.npy
decoder_b_nl_drt_2.npy
decoder_c_tt.npy
decoder_U.npy
decoder_Ux.npy
decoder_Ux_lnb.npy
decoder_Ux_lns.npy
decoder_Ux_nl.npy
decoder_Ux_nl_drt_1.npy
decoder_Ux_nl_drt_1_lnb.npy
decoder_Ux_nl_drt_1_lns.npy
decoder_Ux_nl_drt_2.npy
decoder_Ux_nl_drt_2_lnb.npy
decoder_Ux_nl_drt_2_lns.npy
decoder_Ux_nl_lnb.npy
decoder_Ux_nl_lns.npy
decoder_U_att.npy
decoder_U_lnb.npy
decoder_U_lns.npy
decoder_U_nl.npy
decoder_U_nl_drt_1.npy
decoder_U_nl_drt_1_lnb.npy
decoder_U_nl_drt_1_lns.npy
decoder_U_nl_drt_2.npy
decoder_U_nl_drt_2_lnb.npy
decoder_U_nl_drt_2_lns.npy
decoder_U_nl_lnb.npy
decoder_U_nl_lns.npy
decoder_W.npy
decoder_Wc.npy
decoder_Wcx.npy
decoder_Wcx_lnb.npy
decoder_Wcx_lns.npy
decoder_Wc_att.npy
decoder_Wc_att_lnb.npy
decoder_Wc_att_lns.npy
decoder_Wc_lnb.npy
decoder_Wc_lns.npy
decoder_Wx.npy
decoder_Wx_lnb.npy
decoder_Wx_lns.npy
decoder_W_comb_att.npy
decoder_W_comb_att_lnb.npy
decoder_W_comb_att_lns.npy
decoder_W_lnb.npy
decoder_W_lns.npy
encoder_2_b.npy
encoder_2_bx.npy
encoder_2_bx_drt_1.npy
encoder_2_b_drt_1.npy
encoder_2_U.npy
encoder_2_Ux.npy
encoder_2_Ux_drt_1.npy
encoder_2_Ux_drt_1_lnb.npy
encoder_2_Ux_drt_1_lns.npy
encoder_2_Ux_lnb.npy
encoder_2_Ux_lns.npy
encoder_2_U_drt_1.npy
encoder_2_U_drt_1_lnb.npy
encoder_2_U_drt_1_lns.npy
encoder_2_U_lnb.npy
encoder_2_U_lns.npy
encoder_2_W.npy
encoder_2_Wx.npy
encoder_2_Wx_lnb.npy
encoder_2_Wx_lns.npy
encoder_2_W_lnb.npy
encoder_2_W_lns.npy
encoder_3_b.npy
encoder_3_bx.npy
encoder_3_bx_drt_1.npy
encoder_3_b_drt_1.npy
encoder_3_U.npy
encoder_3_Ux.npy
encoder_3_Ux_drt_1.npy
encoder_3_Ux_drt_1_lnb.npy
encoder_3_Ux_drt_1_lns.npy
encoder_3_Ux_lnb.npy
encoder_3_Ux_lns.npy
encoder_3_U_drt_1.npy
encoder_3_U_drt_1_lnb.npy
encoder_3_U_drt_1_lns.npy
encoder_3_U_lnb.npy
encoder_3_U_lns.npy
encoder_3_W.npy
encoder_3_Wx.npy
encoder_3_Wx_lnb.npy
encoder_3_Wx_lns.npy
encoder_3_W_lnb.npy
encoder_3_W_lns.npy
encoder_4_b.npy
encoder_4_bx.npy
encoder_4_bx_drt_1.npy
encoder_4_b_drt_1.npy
encoder_4_U.npy
encoder_4_Ux.npy
encoder_4_Ux_drt_1.npy
encoder_4_Ux_drt_1_lnb.npy
encoder_4_Ux_drt_1_lns.npy
encoder_4_Ux_lnb.npy
encoder_4_Ux_lns.npy
encoder_4_U_drt_1.npy
encoder_4_U_drt_1_lnb.npy
encoder_4_U_drt_1_lns.npy
encoder_4_U_lnb.npy
encoder_4_U_lns.npy
encoder_4_W.npy
encoder_4_Wx.npy
encoder_4_Wx_lnb.npy
encoder_4_Wx_lns.npy
encoder_4_W_lnb.npy
encoder_4_W_lns.npy
encoder_b.npy
encoder_bx.npy
encoder_bx_drt_1.npy
encoder_b_drt_1.npy
encoder_r_2_b.npy
encoder_r_2_bx.npy
encoder_r_2_bx_drt_1.npy
encoder_r_2_b_drt_1.npy
encoder_r_2_U.npy
encoder_r_2_Ux.npy
encoder_r_2_Ux_drt_1.npy
encoder_r_2_Ux_drt_1_lnb.npy
encoder_r_2_Ux_drt_1_lns.npy
encoder_r_2_Ux_lnb.npy
encoder_r_2_Ux_lns.npy
encoder_r_2_U_drt_1.npy
encoder_r_2_U_drt_1_lnb.npy
encoder_r_2_U_drt_1_lns.npy
encoder_r_2_U_lnb.npy
encoder_r_2_U_lns.npy
encoder_r_2_W.npy
encoder_r_2_Wx.npy
encoder_r_2_Wx_lnb.npy
encoder_r_2_Wx_lns.npy
encoder_r_2_W_lnb.npy
encoder_r_2_W_lns.npy
encoder_r_3_b.npy
encoder_r_3_bx.npy
encoder_r_3_bx_drt_1.npy
encoder_r_3_b_drt_1.npy
encoder_r_3_U.npy
encoder_r_3_Ux.npy
encoder_r_3_Ux_drt_1.npy
encoder_r_3_Ux_drt_1_lnb.npy
encoder_r_3_Ux_drt_1_lns.npy
encoder_r_3_Ux_lnb.npy
encoder_r_3_Ux_lns.npy
encoder_r_3_U_drt_1.npy
encoder_r_3_U_drt_1_lnb.npy
encoder_r_3_U_drt_1_lns.npy
encoder_r_3_U_lnb.npy
encoder_r_3_U_lns.npy
encoder_r_3_W.npy
encoder_r_3_Wx.npy
encoder_r_3_Wx_lnb.npy
encoder_r_3_Wx_lns.npy
encoder_r_3_W_lnb.npy
encoder_r_3_W_lns.npy
encoder_r_4_b.npy
encoder_r_4_bx.npy
encoder_r_4_bx_drt_1.npy
encoder_r_4_b_drt_1.npy
encoder_r_4_U.npy
encoder_r_4_Ux.npy
encoder_r_4_Ux_drt_1.npy
encoder_r_4_Ux_drt_1_lnb.npy
encoder_r_4_Ux_drt_1_lns.npy
encoder_r_4_Ux_lnb.npy
encoder_r_4_Ux_lns.npy
encoder_r_4_U_drt_1.npy
encoder_r_4_U_drt_1_lnb.npy
encoder_r_4_U_drt_1_lns.npy
encoder_r_4_U_lnb.npy
encoder_r_4_U_lns.npy
encoder_r_4_W.npy
encoder_r_4_Wx.npy
encoder_r_4_Wx_lnb.npy
encoder_r_4_Wx_lns.npy
encoder_r_4_W_lnb.npy
encoder_r_4_W_lns.npy
encoder_r_b.npy
encoder_r_bx.npy
encoder_r_bx_drt_1.npy
encoder_r_b_drt_1.npy
encoder_r_U.npy
encoder_r_Ux.npy
encoder_r_Ux_drt_1.npy
encoder_r_Ux_drt_1_lnb.npy
encoder_r_Ux_drt_1_lns.npy
encoder_r_Ux_lnb.npy
encoder_r_Ux_lns.npy
encoder_r_U_drt_1.npy
encoder_r_U_drt_1_lnb.npy
encoder_r_U_drt_1_lns.npy
encoder_r_U_lnb.npy
encoder_r_U_lns.npy
encoder_r_W.npy
encoder_r_Wx.npy
encoder_r_Wx_lnb.npy
encoder_r_Wx_lns.npy
encoder_r_W_lnb.npy
encoder_r_W_lns.npy
encoder_U.npy
encoder_Ux.npy
encoder_Ux_drt_1.npy
encoder_Ux_drt_1_lnb.npy
encoder_Ux_drt_1_lns.npy
encoder_Ux_lnb.npy
encoder_Ux_lns.npy
encoder_U_drt_1.npy
encoder_U_drt_1_lnb.npy
encoder_U_drt_1_lns.npy
encoder_U_lnb.npy
encoder_U_lns.npy
encoder_W.npy
encoder_Wx.npy
encoder_Wx_lnb.npy
encoder_Wx_lns.npy
encoder_W_lnb.npy
encoder_W_lns.npy
ff_logit_b.npy
ff_logit_ctx_b.npy
ff_logit_ctx_ln_b.npy
ff_logit_ctx_ln_s.npy
ff_logit_ctx_W.npy
ff_logit_lstm_b.npy
ff_logit_lstm_ln_b.npy
ff_logit_lstm_ln_s.npy
ff_logit_lstm_W.npy
ff_logit_prev_b.npy
ff_logit_prev_ln_b.npy
ff_logit_prev_ln_s.npy
ff_logit_prev_W.npy
ff_state_b.npy
ff_state_ln_b.npy
ff_state_ln_s.npy
ff_state_W.npy
special:model.yml.npy
Wemb.npy
Wemb_dec.npy
emjotde commented 6 years ago

Is this a model that has been trained with Nematus? You would need to provide the network structure on the command line. I am confused, though, as I see there is a "special:model.yml.npy" in the model. Did you try to convert the model with a script?
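
For reference, here is a minimal sketch (not part of Marian or this thread) for checking what, if anything, is stored under the embedded-config key; the key name follows the `special:model.yml.npy` entry in the file listing above, and the file name `model.npz` is taken from the original command.

```python
# Minimal sketch: inspect a .npz model for an embedded Marian config.
# The key "special:model.yml" is assumed from the special:model.yml.npy
# entry listed above (np.load drops the .npy extension from key names).
import numpy as np

model = np.load("model.npz")

key = "special:model.yml"
if key in model.files:
    # The config is stored as a character array; decode and strip padding.
    print(model[key].tobytes().decode("utf-8", errors="replace").rstrip("\x00"))
else:
    print("No embedded config found; first parameter names:", sorted(model.files)[:10])
```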

emjotde commented 6 years ago

So, these would be the options that Marian supports:

--dec-cell-base-depth 4 --dec-cell-high-depth 2 --dec-depth 4 --enc-depth 4 --enc-cell-depth 2

Not sure about these two:

  "dec_deep_context": true, 
  "enc_depth_bidirectional": 4, 

I do not think we have that at the moment. Any idea what they do?

Also, is there any particular reason for this architecture? It does not correspond to any of the recent Edinburgh WMT papers.

chochowski commented 6 years ago

The model was trained with Nematus. The architecture is as follows:

--dim_word 512 \
--dim 1024 \
--tie_decoder_embeddings \
--layer_normalisation \
--enc_depth 4 \
--dec_depth 4 \
--dec_deep_context \
--enc_recurrence_transition_depth 2 \
--dec_base_recurrence_transition_depth 4 \
--dec_high_recurrence_transition_depth 2

From a quick look at the Marian code I didn't notice name mappings being generated for the encoder/decoder layers that in Nematus are named encoderN... I'll be digging further.

emjotde commented 6 years ago

I am not sure about --dec_deep_context: does that put an attention mechanism into each decoder layer? If so, we do not have that.

emjotde commented 6 years ago

Apart from the options above, you would need to add

--layer-normalization --tied-embeddings

So the complete set of supported options would be:

--ignore-model-config
--layer-normalization
--tied-embeddings
--dec-cell-base-depth 4 
--dec-cell-high-depth 2 
--dec-depth 4 
--enc-depth 4 
--enc-cell-depth 2

The option --ignore-model-config is a bit risky, as it will fill missing parameters with random weights, but it will at least ignore a potentially incorrect config that was inserted into the npz file.
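
For illustration only, here is a small Python sketch (not from this thread) that assembles the flag set above from the Nematus model.npz.json shown earlier. The config key names are taken from that file, except tie_decoder_embeddings, which is an assumption based on the training options listed above.

```python
# Illustrative sketch: derive the Marian decoder flags listed above from
# the Nematus model.npz.json config. Key names follow the config posted
# earlier in this issue; tie_decoder_embeddings is an assumed key.
import json

with open("model.npz.json") as f:
    cfg = json.load(f)

flags = [
    "--ignore-model-config",
    "--type", "nematus",
    "--dec-cell-base-depth", str(cfg["dec_base_recurrence_transition_depth"]),
    "--dec-cell-high-depth", str(cfg["dec_high_recurrence_transition_depth"]),
    "--dec-depth",           str(cfg["dec_depth"]),
    "--enc-depth",           str(cfg["enc_depth"]),
    "--enc-cell-depth",      str(cfg["enc_recurrence_transition_depth"]),
]
if cfg.get("layer_normalisation"):
    flags.append("--layer-normalization")
if cfg.get("tie_decoder_embeddings"):
    flags.append("--tied-embeddings")

print(" ".join(flags))
```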

chochowski commented 6 years ago

--dec_deep_context makes the context be concatenated with the input of successive layers in the decoder. Still, reading nematus.h where the mapping is generated (e.g. for the encoder, lines 163-172), I can only see looping over enc-cell-depth. What about enc-depth? It looks like there will be no mapping for encoder_2_b.npy, or am I missing something?
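
To make the concern concrete, here is a rough Python sketch (not Marian's actual nematus.h code) of the parameter-name prefixes a bi-deep Nematus encoder produces. The names and depths follow the file listing and config above; layer-norm parameters (_lnb/_lns) are left out for brevity. Without the outer loop over enc_depth, names such as encoder_2_b would never be generated.

```python
# Rough illustration only: enumerate Nematus encoder parameter names by
# looping over both the stack depth (enc_depth) and the deep-transition
# depth (enc_recurrence_transition_depth / --enc-cell-depth).
enc_depth = 4        # "enc_depth" in model.npz.json
enc_cell_depth = 2   # "enc_recurrence_transition_depth"

names = []
for direction in ("encoder", "encoder_r"):            # forward / backward RNN
    for layer in range(1, enc_depth + 1):
        base = direction if layer == 1 else f"{direction}_{layer}"
        names += [f"{base}_{s}" for s in ("W", "Wx", "U", "Ux", "b", "bx")]
        for depth in range(1, enc_cell_depth):         # deep-transition cells
            names += [f"{base}_{s}_drt_{depth}" for s in ("U", "Ux", "b", "bx")]

print(len(names), names[:8])
```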

emjotde commented 6 years ago

OK, we do not have that. I suppose this is a repetition of the aligned context from the first attention mechanism. So this model is currently not compatible with Marian.

emjotde commented 6 years ago

If you can make that model available together with some testing data we can take a look. No guarantee about the time frame though.

emjotde commented 6 years ago

Closing this for now.