uclnlp / jack

Jack the Reader

Training Script Bug #386

Closed: JohannesMaxWel closed this issue 6 years ago

JohannesMaxWel commented 6 years ago

Training ESIM or DAM currently does not work.

Input (without further modification to the default config):

python3 bin/jack-train.py with config='./conf/nli/snli/esim.yaml'

yields this error:

Traceback (most recent calls WITHOUT Sacred internals):
  File "bin/jack-train.py", line 112, in run
    vocab = Vocab(vocab=embeddings.vocabulary if vocab_from_embeddings and embeddings is not None else None)
TypeError: __init__() got an unexpected keyword argument 'vocab'

Same for DAM.
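For context, one plausible explanation for this kind of TypeError is that the Vocab class actually being imported is older than what the caller in bin/jack-train.py expects, e.g. because a previously installed copy of the package shadows the current checkout. A minimal, self-contained sketch of how the same message arises (illustrative only, these stand-in classes are not jack's real Vocab):

# Illustration only: two stand-in classes, one predating the 'vocab' keyword.

class OldVocab:
    # Stands in for an outdated Vocab whose constructor has no 'vocab' argument.
    def __init__(self, emb=None):
        self.emb = emb

class NewVocab:
    # Stands in for a current Vocab that does accept 'vocab'.
    def __init__(self, vocab=None, emb=None):
        self.vocab = vocab
        self.emb = emb

NewVocab(vocab=None)      # fine: the keyword exists
try:
    OldVocab(vocab=None)  # reproduces the reported failure mode
except TypeError as e:
    print(e)              # -> ... got an unexpected keyword argument 'vocab'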

dirkweissenborn commented 6 years ago

Weird. It works normally for me on a fresh version of master. Are you sure you're on the latest master without changes?
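If it helps, a quick way to confirm the checkout really matches the latest origin/master with no local edits (plain git, nothing jack-specific):

$ git fetch origin
$ git status                            # should report a clean tree, up to date with 'origin/master'
$ git log -1 --oneline origin/master    # latest upstream commit for comparison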

pminervini commented 6 years ago

@JohannesMaxWel it works perfectly for me as well; maybe you forgot the PYTHONPATH=. prefix:

$ PYTHONPATH=. python3 bin/jack-train.py with config='./conf/nli/snli/esim.yaml'
WARNING - jack - No observers have been added to this run
INFO - jack - Running command 'run'
INFO - jack - Started
INFO - jack-train.py - TRAINING
WARNING - root - Changed type of config entry "parent_config" from str to DogmaticList
INFO - jack - Running command 'print_config'
INFO - jack - Started
Configuration (modified, added, typechanged, doc):
  batch_size = 32
  clip_value = 0.0
  config = './conf/nli/snli/esim.yaml'
  [..]
  model:
    encoder_layer = [{'activation': 'tanh',
  'dropout': True,
  'input': 'hypothesis',
  'module': 'lstm',
  'name': 'encoder',
  'with_projection': True},
 {'activation': 'tanh',
  'dropout': True,
  'input': 'premise',
  'module': 'lstm',
  'name': 'encoder',
  'with_projection': True},
 {'attn_type': 'dot',
  'concat': False,
  'dependent': 'hypothesis',
  'input': 'premise',
  'module': 'attention_matching',
  'output': 'hypothesis_attn'},
 {'attn_type': 'dot',
  'concat': False,
  'dependent': 'premise',
  'input': 'hypothesis',
  'module': 'attention_matching',
  'output': 'premise_attn'},
 {'input': ['premise', 'hypothesis_attn'],
  'module': 'mul',
  'output': 'premise_mul'},
 {'input': ['premise', 'hypothesis_attn'],
  'module': 'sub',
  'output': 'premise_sub'},
 {'input': ['premise', 'hypothesis_attn', 'premise_mul', 'premise_sub'],
  'module': 'concat',
  'output': 'premise'},
 {'activation': 'relu',
  'dropout': True,
  'input': 'premise',
  'module': 'dense',
  'name': 'projection'},
 {'input': ['hypothesis', 'premise_attn'],
  'module': 'mul',
  'output': 'hypothesis_mul'},
 {'input': ['hypothesis', 'premise_attn'],
  'module': 'sub',
  'output': 'hypothesis_sub'},
 {'input': ['hypothesis', 'premise_attn', 'hypothesis_mul', 'hypothesis_sub'],
  'module': 'concat',
  'output': 'hypothesis'},
 {'activation': 'relu',
  'dropout': True,
  'input': 'hypothesis',
  'module': 'dense',
  'name': 'projection'},
 {'input': 'hypothesis', 'module': 'lstm', 'name': 'composition'},
 {'input': 'premise', 'module': 'lstm', 'name': 'composition'}]
    prediction_layer:
      dropout = True
      module = 'max_avg_mlp'
INFO - jack - Completed after 0:00:00
INFO - jack-train.py - JACK_TEMP not set, setting it to /tmp/jack/bd806ec2-0831-4cc0-b644-7f8118156118. Might be used for caching.
INFO - jack-train.py - loaded train/dev/test data
INFO - jack-train.py - loaded pre-trained embeddings (data/GloVe/glove.840B.300d.memory_map_dir)
2018-07-15 16:21:59.968366: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-15 16:22:00.048260: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-15 16:22:00.048649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:02:00.0
totalMemory: 11.91GiB freeMemory: 11.53GiB
2018-07-15 16:22:00.048661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-07-15 16:22:00.189067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-15 16:22:00.189092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2018-07-15 16:22:00.189097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2018-07-15 16:22:00.189301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11163 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:02:00.0, compute capability: 6.1)
INFO - jack.core.reader - Setting up model...
INFO - jack.core.reader - Preparing training data...
INFO - jack.core.input_module - OnlineInputModule pre-processes data on-the-fly in first epoch and caches results for subsequent epochs! That means, first epoch might be slower.
INFO - jack.core.reader - Number of parameters: 2704203
INFO - jack.core.reader - Start training...
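As a side note, one way to double-check that the PYTHONPATH=. prefix makes the run pick up the checkout rather than some previously installed copy of the package is to print which module file Python resolves (standard introspection, nothing jack-specific):

$ PYTHONPATH=. python3 -c "import jack; print(jack.__file__)"

Without the prefix, the repository root is not on sys.path when running python3 bin/jack-train.py, so the script falls back to whatever copy of jack is installed system-wide; if that copy is out of date, it could explain a stale Vocab signature like the one in the traceback above.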