nyu-mll / GLUE-baselines

[DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations
https://gluebenchmark.com

Unable to run with ELMo embeddings #4

Open YuehWu1994 opened 5 years ago

YuehWu1994 commented 5 years ago

Hello, I am unable to use the ELMo implementation even though I follow the arguments provided in the README. I am using Python 3.6.2 (Anaconda) with AllenNLP installed in a virtual environment.

Here are the relevant arguments I use:

GPUID=-1
SEED=19

SHOULD_TRAIN=1
WORD_EMBS_FILE="../glove/glove.6B/glove.6B.50d.txt"

d_word=50
d_hid=512
glove=0
ELMO=1
deep_elmo=0
elmo_no_glove=1
COVE=0

PAIR_ENC="simple"
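
For context, run_stuff.sh passes these values through to main.py. A minimal sketch of that invocation, with the option names inferred from the Namespace dump in the log below (the exact flag spellings are assumptions):

python main.py --cuda $GPUID --random_seed $SEED --should_train $SHOULD_TRAIN \
    --word_embs_file "$WORD_EMBS_FILE" --d_word $d_word --d_hid $d_hid \
    --glove $glove --elmo $ELMO --deep_elmo $deep_elmo \
    --elmo_no_glove $elmo_no_glove --cove $COVE --pair_enc $PAIR_ENC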

Here is my error log:

(allennlp) ➜   bash run_stuff.sh
12/01 04:00:19 PM: Namespace(batch_size=64, bpp_base=10, bpp_method='percent_tr', classifier='mlp', classifier_dropout=0.0, classifier_hid_dim=512, cove=0, cuda=-1, d_hid=512, d_word=50, deep_elmo=0, dropout=0.2, dropout_embs=0.2, elmo=1, elmo_no_glove=1, eval_tasks='none', exp_dir='EXP_DIR', glove=0, load_epoch=-1, load_model=0, load_preproc=1, load_tasks=1, log_file='log.log', lr=0.1, lr_decay_factor=0.5, max_grad_norm=5.0, max_seq_len=40, max_vals=100, max_word_v_size=30000, min_lr=1e-05, n_epochs=10, n_layers_enc=1, n_layers_highway=0, no_tqdm=0, optimizer='sgd', pair_enc='simple', patience=5, preproc_file='preproc.pkl', random_seed=19, run_dir='RUN_DIR', scaling_method='none', scheduler_threshold=0.0, shared_optimizer=1, should_train=1, task_ordering='random', task_patience=0, train_tasks='cola', train_words=0, trainer_type='sampling', val_interval=10, weight_decay=0.0, weighting_method='uniform', word_embs_file='../glove/glove.6B/glove.6B.50d.txt')
12/01 04:00:19 PM: Using random seed 19
12/01 04:00:19 PM: Loading tasks...
12/01 04:00:19 PM:  Loaded existing task cola
12/01 04:00:19 PM:  Loaded existing task sst
12/01 04:00:19 PM:  Loaded existing task mrpc
12/01 04:00:19 PM:  Finished loading tasks: cola sst mrpc.
12/01 04:00:22 PM: Loading token dictionary from EXP_DIR/vocab.
12/01 04:00:22 PM:  Finished building vocab. Using 30002 words
12/01 04:00:22 PM:  Loaded data from EXP_DIR/preproc.pkl
12/01 04:00:22 PM:    Training on cola, sst, mrpc
12/01 04:00:22 PM:    Evaluating on 
12/01 04:00:22 PM:  Finished loading tasks in 3.215s
12/01 04:00:22 PM: Building model...
12/01 04:00:22 PM:  Learning embeddings from scratch!
12/01 04:00:22 PM:  Using ELMo embeddings!
12/01 04:00:22 PM:  NOT using GLoVe embeddings!
12/01 04:00:22 PM: Initializing ELMo
12/01 04:00:43 PM: instantiating registered subclass lstm of <class 'allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder'>
12/01 04:00:43 PM: batch_first = True
12/01 04:00:43 PM: stateful = False
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: input_size = 1024
12/01 04:00:43 PM: hidden_size = 512
12/01 04:00:43 PM: num_layers = 1
12/01 04:00:43 PM: bidirectional = True
12/01 04:00:43 PM: batch_first = True
12/01 04:00:43 PM: Initializing parameters
12/01 04:00:43 PM: Done initializing parameters; the following parameters are using their default initialization from their code
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._char_embedding_weights
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.weight
12/01 04:00:43 PM:    _elmo.scalar_mix_0.gamma
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.0
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.1
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.2
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0_reverse
12/01 04:00:43 PM: Initializing parameters
12/01 04:00:43 PM: Done initializing parameters; the following parameters are using their default initialization from their code
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._char_embedding_weights
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.weight
12/01 04:00:43 PM:    _elmo.scalar_mix_0.gamma
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.0
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.1
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.2
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0_reverse
12/01 04:00:43 PM:  Finished building model in 20.876s
12/01 04:00:43 PM: patience = 5
12/01 04:00:43 PM: num_epochs = 10
12/01 04:00:43 PM: max_vals = 50
12/01 04:00:43 PM: cuda_device = -1
12/01 04:00:43 PM: grad_norm = 5.0
12/01 04:00:43 PM: grad_clipping = None
12/01 04:00:43 PM: lr_decay = 0.99
12/01 04:00:43 PM: min_lr = 1e-05
12/01 04:00:43 PM: no_tqdm = 0
12/01 04:00:43 PM: Sampling tasks uniformly
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: Beginning training.
Traceback (most recent call last):
  File "main.py", line 280, in <module>
    sys.exit(main(sys.argv[1:]))
  File "main.py", line 177, in main
    args.load_model)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/trainer.py", line 776, in train
    output_dict = self._forward(batch, task=task, for_training=True)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/trainer.py", line 1003, in _forward
    return self._model.forward(task, **tensor_batch)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/models.py", line 216, in forward
    pair_emb = self.pair_encoder(input1, input2)
  File "/Users/apple/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/models.py", line 289, in forward
    s1_elmo_embs = self._elmo(s1['elmo'])
KeyError: 'elmo'

sleepinyourhat commented 5 years ago

I don't immediately see what's wrong—@W4ngatang would know better. That said, what are you trying to do?

If you don't need to match the exact setup of the GLUE paper down to the last hyperparameter, you'll have a much easier time reproducing our experiments with the newer jiant toolkit, which has more people working on it and more documentation: https://github.com/jsalt18-sentence-repl/jiant
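
For reference, a minimal sketch of switching over (the install step is an assumption about a standard Python setup; see jiant's own README for the actual instructions):

# clone the newer jiant toolkit
git clone https://github.com/jsalt18-sentence-repl/jiant
cd jiant
# hypothetical setup step; follow jiant's README for the real one
pip install -r requirements.txt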

W4ngatang commented 5 years ago

My bet is that you previously ran without ELMo, and the script cached the preprocessed data without ELMo indexing. Try deleting those cached files and rerunning.
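
A minimal sketch of that cleanup, assuming the cache is exactly the EXP_DIR/preproc.pkl file and EXP_DIR/vocab directory named in the log above:

# remove the cached preprocessed data and vocab so the next run
# rebuilds them with ELMo character indexing
rm EXP_DIR/preproc.pkl
rm -r EXP_DIR/vocab
# rerun with ELMO=1 so preprocessing adds the 'elmo' key the model expects
bash run_stuff.sh

Once preprocessing regenerates preproc.pkl with the ELMo token indexer, s1['elmo'] should exist and the KeyError should go away. Setting load_preproc=0 for one run may achieve the same thing, judging from the load_preproc flag in the Namespace dump, though that is an assumption about the script's behavior.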