rkadlec / asreader

This is an implementation of the Attention Sum Reader model as presented in "Text Comprehension with the Attention Sum Reader Network" available at http://arxiv.org/abs/1603.01547.

fusion error: no models to be fused. #12

Open · DaehanKim opened this issue 7 years ago

DaehanKim commented 7 years ago
->  . ./quick-start-cbt-ne.sh 
Using gpu device 0: TITAN X (Pascal) (CNMeM is disabled, cuDNN 5105)
/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
/home/daehan/asreader/asreader/text_comprehension/monitoring.py:132: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return ints_to_words(word_ixs[:true_length])
Epoch 0, step 1 |#                                          | Elapsed Time: 0:00:00WARNING:blocks.algorithms:

Blocks tried to match the sources (['candidates_mask', 'question_mask', 'question', 'context_mask', 'candidates', 'context', 'answer']) of the training dataset to the names of the Theano variables (['answer', 'context_mask', 'context', 'question_mask', 'question', 'candidates']), but failed to do so. If you want to train on a subset of the sources that your dataset provides, pass the `sources` keyword argument to its constructor. Or pass on_unused_sources='warn' or on_unused_sources='ignore' to the GradientDescent algorithm.
Epoch 0, step 3398 |                 #                      | Elapsed Time: 0:27:13
Output will be stored in test_output_cbt
Computing new vocabulary for file ../data/CBTest/data/cbtest_NE_train.txt.
Processed line 100000
Processed line 200000
Processed line 300000
Processed line 400000
Processed line 500000
Processed line 600000
Processed line 700000
Processed line 800000
Processed line 900000
Processed line 1000000
Processed line 1100000
Processed line 1200000
Processed line 1300000
Processed line 1400000
Processed line 1500000
Processed line 1600000
Processed line 1700000
Processed line 1800000
Processed line 1900000
Processed line 2000000
Processed line 2100000
Processed line 2200000
Processed line 2300000
STATISTICS
Total words: 53825761
Total distinct words: 60278
STATISTICS
Total words: 942535
Total distinct words: 10720
Added 920 new words from file ../data/CBTest/data/cbtest_NE_valid_2000ex.txt to previous vocabulary.
STATISTICS
Total words: 1213447
Total distinct words: 13277
Added 1311 new words from file ../data/CBTest/data/cbtest_NE_test_2500ex.txt to previous vocabulary.
Parameters: 
    backward.state_to_state (128, 128).size=16384
    backward.state_to_gates (128, 256).size=32768
    backward.initial_state (128,).size=128
    fork_gate_inputs.b (256,).size=256
    fork_gate_inputs.W (256, 256).size=65536
    lookuptable.W (62513, 256).size=16003328
    fork_inputs.b (128,).size=128
    fork_inputs.W (256, 128).size=32768
    forward.state_to_state (128, 128).size=16384
    forward.state_to_gates (128, 256).size=32768
    forward.initial_state (128,).size=128
    fork_gate_inputs.b (256,).size=256
    fork_gate_inputs.W (256, 256).size=65536
    fork_inputs.b (128,).size=128
    fork_inputs.W (256, 128).size=32768
    backward.state_to_state (128, 128).size=16384
    backward.state_to_gates (128, 256).size=32768
    backward.initial_state (128,).size=128
    fork_gate_inputs.b (256,).size=256
    fork_gate_inputs.W (256, 256).size=65536
    fork_inputs.b (128,).size=128
    fork_inputs.W (256, 128).size=32768
    forward.state_to_state (128, 128).size=16384
    forward.state_to_gates (128, 256).size=32768
    forward.initial_state (128,).size=128
    fork_gate_inputs.b (256,).size=256
    fork_gate_inputs.W (256, 256).size=65536
    fork_inputs.b (128,).size=128
    fork_inputs.W (256, 128).size=32768
Trained parameters count: 16595200
Accuracy 0.3285
Accuracy 0.3524

-------------------------------------------------------------------------------
BEFORE FIRST EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     epoch_interrupt_received: False
     epoch_started: True
     epochs_done: 0
     iterations_done: 0
     received_first_batch: False
     resumed_from: None
     training_started: True
Log records from the iteration 0:
     cbtest_NE_test_2500ex.txt_accuracy: 0.3524
     time_initialization: 294.001670837
     valid_accuracy: 0.3285

Accuracy 0.7085
Accuracy 0.6656

-------------------------------------------------------------------------------
AFTER ANOTHER EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.7085
     bestsave_valid_accuracy: 0.7085
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 1
     iterations_done: 3398
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 3398:
     cbtest_NE_test_2500ex.txt_accuracy: 0.6656
     saved_to: ('test_output_cbt/model.blocks.pkl.best.accuracy.b=32_qice=False_ehd=128_sed=256_lr=0.001_gc=10.0',)
     time_read_data_this_epoch: 0.0
     time_read_data_total: 10.9458982944
     time_train_this_epoch: 0.0
     time_train_total: 1619.63922453
     train_accuracy: 0.578182018835
     train_cost: 1.45635521412
     valid_accuracy: 0.7085
     valid_accuracy_best_so_far: True
     valid_accuracy_best_so_far_patience_epochs: 2

Epoch 1, step 3398 |                #                       | Elapsed Time: 0:26:22
Accuracy 0.7185
Accuracy 0.678

-------------------------------------------------------------------------------
AFTER ANOTHER EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.7185
     bestsave_valid_accuracy: 0.7185
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 2
     iterations_done: 6796
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 6796:
     cbtest_NE_test_2500ex.txt_accuracy: 0.678
     saved_to: ('test_output_cbt/model.blocks.pkl.best.accuracy.b=32_qice=False_ehd=128_sed=256_lr=0.001_gc=10.0',)
     time_read_data_this_epoch: 0.0
     time_read_data_total: 21.8862273693
     time_train_this_epoch: 0.0
     time_train_total: 3187.89451265
     train_accuracy: 0.745801451834
     train_cost: 0.800129711628
     valid_accuracy: 0.7185
     valid_accuracy_best_so_far: True
     valid_accuracy_best_so_far_patience_epochs: 2

Epoch 2, step 3398 |                     #                  | Elapsed Time: 0:25:06
Accuracy 0.709
Accuracy 0.6708

-------------------------------------------------------------------------------
AFTER ANOTHER EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.7185
     bestsave_valid_accuracy: 0.7185
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 3
     iterations_done: 10194
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 10194:
     cbtest_NE_test_2500ex.txt_accuracy: 0.6708
     time_read_data_this_epoch: 0.0
     time_read_data_total: 32.1182944775
     time_train_this_epoch: 0.0
     time_train_total: 4680.80644464
     train_accuracy: 0.864274328036
     train_cost: 0.451196849346
     valid_accuracy: 0.709
     valid_accuracy_best_so_far_patience_epochs: 1

Epoch 3, step 3398 |#                                       | Elapsed Time: 0:25:27
Accuracy 0.699
Accuracy 0.674

-------------------------------------------------------------------------------
AFTER ANOTHER EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.7185
     bestsave_valid_accuracy: 0.7185
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 4
     iterations_done: 13592
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 13592:
     cbtest_NE_test_2500ex.txt_accuracy: 0.674
     time_read_data_this_epoch: 0.0
     time_read_data_total: 42.7062847614
     time_train_this_epoch: 0.0
     time_train_total: 6195.07133412
     train_accuracy: 0.94677690308
     train_cost: 0.189720034599
     training_finish_requested: True
     valid_accuracy: 0.699
     valid_accuracy_best_so_far_patience_epochs: 0

-------------------------------------------------------------------------------
TRAINING HAS BEEN FINISHED:
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.7185
     bestsave_valid_accuracy: 0.7185
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 4
     iterations_done: 13592
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 13592:
     cbtest_NE_test_2500ex.txt_accuracy: 0.674
     time_read_data_this_epoch: 0.0
     time_read_data_total: 42.7062847614
     time_train_this_epoch: 0.0
     time_train_total: 6195.07133412
     train_accuracy: 0.94677690308
     train_cost: 0.189720034599
     training_finish_requested: True
     training_finished: True
     valid_accuracy: 0.699
     valid_accuracy_best_so_far_patience_epochs: 0

Using gpu device 0: TITAN X (Pascal) (CNMeM is disabled, cuDNN 5105)
/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
/home/daehan/asreader/asreader/text_comprehension/monitoring.py:132: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return ints_to_words(word_ixs[:true_length])
Epoch 0, step 1 |#                                          | Elapsed Time: 0:00:00WARNING:blocks.algorithms:

Blocks tried to match the sources (['candidates_mask', 'question_mask', 'question', 'context_mask', 'candidates', 'context', 'answer']) of the training dataset to the names of the Theano variables (['answer', 'context_mask', 'context', 'question_mask', 'question', 'candidates']), but failed to do so. If you want to train on a subset of the sources that your dataset provides, pass the `sources` keyword argument to its constructor. Or pass on_unused_sources='warn' or on_unused_sources='ignore' to the GradientDescent algorithm.
Epoch 0, step 3398 |                      #                 | Elapsed Time: 0:44:34
Output will be stored in test_output_cbt
Computing new vocabulary for file ../data/CBTest/data/cbtest_NE_train.txt.
Processed line 100000
Processed line 200000
Processed line 300000
Processed line 400000
Processed line 500000
Processed line 600000
Processed line 700000
Processed line 800000
Processed line 900000
Processed line 1000000
Processed line 1100000
Processed line 1200000
Processed line 1300000
Processed line 1400000
Processed line 1500000
Processed line 1600000
Processed line 1700000
Processed line 1800000
Processed line 1900000
Processed line 2000000
Processed line 2100000
Processed line 2200000
Processed line 2300000
STATISTICS
Total words: 53825761
Total distinct words: 60278
STATISTICS
Total words: 942535
Total distinct words: 10720
Added 920 new words from file ../data/CBTest/data/cbtest_NE_valid_2000ex.txt to previous vocabulary.
STATISTICS
Total words: 1213447
Total distinct words: 13277
Added 1311 new words from file ../data/CBTest/data/cbtest_NE_test_2500ex.txt to previous vocabulary.
Parameters: 
    backward.state_to_state (384, 384).size=147456
    backward.state_to_gates (384, 768).size=294912
    backward.initial_state (384,).size=384
    fork_gate_inputs.b (768,).size=768
    fork_gate_inputs.W (384, 768).size=294912
    lookuptable.W (62513, 384).size=24004992
    fork_inputs.b (384,).size=384
    fork_inputs.W (384, 384).size=147456
    forward.state_to_state (384, 384).size=147456
    forward.state_to_gates (384, 768).size=294912
    forward.initial_state (384,).size=384
    fork_gate_inputs.b (768,).size=768
    fork_gate_inputs.W (384, 768).size=294912
    fork_inputs.b (384,).size=384
    fork_inputs.W (384, 384).size=147456
    backward.state_to_state (384, 384).size=147456
    backward.state_to_gates (384, 768).size=294912
    backward.initial_state (384,).size=384
    fork_gate_inputs.b (768,).size=768
    fork_gate_inputs.W (384, 768).size=294912
    fork_inputs.b (384,).size=384
    fork_inputs.W (384, 384).size=147456
    forward.state_to_state (384, 384).size=147456
    forward.state_to_gates (384, 768).size=294912
    forward.initial_state (384,).size=384
    fork_gate_inputs.b (768,).size=768
    fork_gate_inputs.W (384, 768).size=294912
    fork_inputs.b (384,).size=384
    fork_inputs.W (384, 384).size=147456
Trained parameters count: 27550080
Accuracy 0.3295
Accuracy 0.3484

-------------------------------------------------------------------------------
BEFORE FIRST EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     epoch_interrupt_received: False
     epoch_started: True
     epochs_done: 0
     iterations_done: 0
     received_first_batch: False
     resumed_from: None
     training_started: True
Log records from the iteration 0:
     cbtest_NE_test_2500ex.txt_accuracy: 0.3484
     time_initialization: 62.9041521549
     valid_accuracy: 0.3295

Accuracy 0.723
Accuracy 0.674

-------------------------------------------------------------------------------
AFTER ANOTHER EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.723
     bestsave_valid_accuracy: 0.723
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 1
     iterations_done: 3398
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 3398:
     cbtest_NE_test_2500ex.txt_accuracy: 0.674
     saved_to: ('test_output_cbt/model.blocks.pkl.best.accuracy.b=32_qice=False_ehd=384_sed=384_lr=0.001_gc=10.0',)
     time_read_data_this_epoch: 0.0
     time_read_data_total: 10.751932621
     time_train_this_epoch: 0.0
     time_train_total: 2660.37185073
     train_accuracy: 0.600051500883
     train_cost: 1.41588568687
     valid_accuracy: 0.723
     valid_accuracy_best_so_far: True
     valid_accuracy_best_so_far_patience_epochs: 2

Epoch 1, step 3398 |    #                                   | Elapsed Time: 0:44:07
Accuracy 0.724
Accuracy 0.6904

-------------------------------------------------------------------------------
AFTER ANOTHER EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.724
     bestsave_valid_accuracy: 0.724
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 2
     iterations_done: 6796
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 6796:
     cbtest_NE_test_2500ex.txt_accuracy: 0.6904
     saved_to: ('test_output_cbt/model.blocks.pkl.best.accuracy.b=32_qice=False_ehd=384_sed=384_lr=0.001_gc=10.0',)
     time_read_data_this_epoch: 0.0
     time_read_data_total: 21.5938019753
     time_train_this_epoch: 0.0
     time_train_total: 5293.97357774
     train_accuracy: 0.770364307436
     train_cost: 0.747857391834
     valid_accuracy: 0.724
     valid_accuracy_best_so_far: True
     valid_accuracy_best_so_far_patience_epochs: 2

Epoch 2, step 3398 |                                       #| Elapsed Time: 0:44:03
Accuracy 0.705
Accuracy 0.6688

-------------------------------------------------------------------------------
AFTER ANOTHER EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.724
     bestsave_valid_accuracy: 0.724
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 3
     iterations_done: 10194
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 10194:
     cbtest_NE_test_2500ex.txt_accuracy: 0.6688
     time_read_data_this_epoch: 0.0
     time_read_data_total: 32.2682154179
     time_train_this_epoch: 0.0
     time_train_total: 7924.13252592
     train_accuracy: 0.89576466549
     train_cost: 0.361049056053
     valid_accuracy: 0.705
     valid_accuracy_best_so_far_patience_epochs: 1

Epoch 3, step 3398 |                            #           | Elapsed Time: 0:43:57
Accuracy 0.7
Accuracy 0.676

-------------------------------------------------------------------------------
AFTER ANOTHER EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.724
     bestsave_valid_accuracy: 0.724
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 4
     iterations_done: 13592
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 13592:
     cbtest_NE_test_2500ex.txt_accuracy: 0.676
     time_read_data_this_epoch: 0.0
     time_read_data_total: 42.9270167351
     time_train_this_epoch: 0.0
     time_train_total: 10547.6050875
     train_accuracy: 0.963760545419
     train_cost: 0.138483583927
     training_finish_requested: True
     valid_accuracy: 0.7
     valid_accuracy_best_so_far_patience_epochs: 0

-------------------------------------------------------------------------------
TRAINING HAS BEEN FINISHED:
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     best_valid_accuracy: 0.724
     bestsave_valid_accuracy: 0.724
     epoch_interrupt_received: False
     epoch_started: False
     epochs_done: 4
     iterations_done: 13592
     received_first_batch: True
     resumed_from: None
     training_started: True
Log records from the iteration 13592:
     cbtest_NE_test_2500ex.txt_accuracy: 0.676
     time_read_data_this_epoch: 0.0
     time_read_data_total: 42.9270167351
     time_train_this_epoch: 0.0
     time_train_total: 10547.6050875
     train_accuracy: 0.963760545419
     train_cost: 0.138483583927
     training_finish_requested: True
     training_finished: True
     valid_accuracy: 0.7
     valid_accuracy_best_so_far_patience_epochs: 0

Validation files:

Best predictions to be copied:

Best validation model:
Traceback (most recent call last):
  File "text_comprehension/eval/copyBestPredictions.py", line 124, in <module>
    print bestValModel['params']
KeyError: 'params'

***** RUNNING THE FUSION SCRIPT *****

Models to be fused:

Ensemble (equal weights): 
/home/daehan/.local/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/home/daehan/.local/lib/python2.7/site-packages/numpy/core/_methods.py:70: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "text_comprehension/eval/fusion.py", line 394, in <module>
    result = fuse_predictions(prediction_files)
  File "text_comprehension/eval/fusion.py", line 158, in fuse_predictions
    ensemble_accuracy = accuracy(numpy.mean(all_preds, 0))
  File "text_comprehension/eval/fusion.py", line 57, in accuracy
    for row in probas:
TypeError: 'numpy.float64' object is not iterable

I think no model was found to fuse, but I am not sure what exactly the problem is... Can you help me solve this? Any help would be much appreciated!
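
For context, the traceback is consistent with fuse_predictions receiving an empty list of prediction files: numpy.mean over an empty sequence emits exactly the "Mean of empty slice" warning seen above and returns a single NaN numpy.float64, which the accuracy loop then cannot iterate. A minimal sketch (my own reproduction, not code from this repository) of the symptom:

    import numpy

    all_preds = []                       # no *.prediction files were collected
    ensemble = numpy.mean(all_preds, 0)  # RuntimeWarning: Mean of empty slice -> nan (numpy.float64)
    for row in ensemble:                 # TypeError: 'numpy.float64' object is not iterable
        pass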

mrcocytus commented 7 years ago

Have you resolved this problem? I get the same issue. @DaehanKim

Seayoung277 commented 7 years ago

This problem is caused by an inconsistency in the file prefixes used by quick-start-generic.sh. The .prediction files generated during the training step are prefixed with the dataset names you pass in via quick-start-cbt-ne.sh, i.e. cbtest_NE_valid_2000ex.txt. and cbtest_NE_test_2500ex.txt., rather than the validation.txt. and test.txt. prefixes that the quick-start-generic.sh script looks for.

SOLUTION: You don't need to rerun the whole training. You can comment out the training lines in quick-start-generic.sh and keep the remaining two lines. The first of these should be python text_comprehension/eval/copyBestPredictions.py -vp $3. -tp $4. -i $OUT_DIR -o $OUT_DIR/best_predictions and the second line does not need modification.
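
For reference, a rough sketch of what the tail of quick-start-generic.sh would then look like, assuming the $3/$4 positional arguments and the $OUT_DIR variable from the command quoted above; the exact training and fusion commands depend on your copy of the script, so they are left as placeholders here rather than spelled out:

    # --- training step: comment out whatever training command your copy of
    # --- quick-start-generic.sh contains
    # <original training invocation, commented out>

    # copy the best .prediction files, using the real dataset-name prefixes
    # ($3 = validation file name, $4 = test file name, as passed by quick-start-cbt-ne.sh)
    python text_comprehension/eval/copyBestPredictions.py -vp $3. -tp $4. \
        -i $OUT_DIR -o $OUT_DIR/best_predictions

    # fusion step: keep the existing fusion.py line exactly as it is
    # <original fusion.py invocation, unmodified>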