problem when using bert encoder

guhuawuli commented 4 years ago

Describe the bug

I want to do classification with the BERT encoder. My YAML file is:

input_features:
    -
        name: review
        type: sequence
        encoder: bert
        config_path:
        checkpoint_path:
        do_lower_case: True
        preprocessing:
            tokenizer: bert
            vocab_file:
            padding_symbol: '[PAD]'
            unknown_symbol: '[UNK]'

output_features:
    -
        name: label
        type: category

### The error message is:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[128,12,256,256] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu

Environment:

- GPU: 1 K80, 13G memory
- OS: Ubuntu 18.04
- Python version 3.6
- Ludwig version 0.2.1

The complete error message is:

ludwig_version: '0.2.1'
command: ('/usr/local/bin/ludwig experiment --data_csv ChnSentiCorp_htl_all.csv '
 '--model_definition_file model_definition_bert.yaml')
random_seed: 42
input_data: 'ChnSentiCorp_htl_all.csv'
model_definition: {
    'combiner': {'type': 'concat'},
    'input_features': [
        {
            'checkpoint_path': 'uncased_L-12_H-768_A-12/bert_model.ckpt',
            'config_path': 'uncased_L-12_H-768_A-12/bert_config.json',
            'do_lower_case': True,
            'encoder': 'bert',
            'name': 'review',
            'preprocessing': {
                'padding_symbol': '[PAD]',
                'tokenizer': 'bert',
                'unknown_symbol': '[UNK]',
                'vocab_file': 'uncased_L-12_H-768_A-12/vocab.txt'},
            'tied_weights': None,
            'type': 'sequence'}],
    'output_features': [
        {
            'dependencies': [],
            'loss': {
                'class_similarities_temperature': 0,
                'class_weights': 1,
                'confidence_penalty': 0,
                'distortion': 1,
                'labels_smoothing': 0,
                'negative_samples': 0,
                'robust_lambda': 0,
                'sampler': None,
                'type': 'softmax_cross_entropy',
                'unique': False,
                'weight': 1},
            'name': 'label',
            'reduce_dependencies': 'sum',
            'reduce_input': 'sum',
            'top_k': 3,
            'type': 'category'}],
    'preprocessing': {
        'audio': { 'audio_feature': {'type': 'raw'}, 'audio_file_length_limit_in_s': 7.5, 'in_memory': True, 'missing_value_strategy': 'backfill', 'norm': None, 'padding_value': 0},
        'bag': { 'fill_value': '', 'lowercase': False, 'missing_value_strategy': 'fill_with_const', 'most_common': 10000, 'tokenizer': 'space'},
        'binary': { 'fill_value': 0, 'missing_value_strategy': 'fill_with_const'},
        'category': { 'fill_value': '', 'lowercase': False, 'missing_value_strategy': 'fill_with_const', 'most_common': 10000},
        'date': { 'datetime_format': None, 'fill_value': '', 'missing_value_strategy': 'fill_with_const'},
        'force_split': False,
        'h3': { 'fill_value': 576495936675512319, 'missing_value_strategy': 'fill_with_const'},
        'image': { 'in_memory': True, 'missing_value_strategy': 'backfill', 'num_processes': 1, 'resize_method': 'interpolate', 'scaling': 'pixel_normalization'},
        'numerical': { 'fill_value': 0, 'missing_value_strategy': 'fill_with_const', 'normalization': None},
        'sequence': { 'fill_value': '', 'lowercase': False, 'missing_value_strategy': 'fill_with_const', 'most_common': 20000, 'padding': 'right', 'padding_symbol': '', 'sequence_length_limit': 256, 'tokenizer': 'space', 'unknown_symbol': '', 'vocab_file': None},
        'set': { 'fill_value': '', 'lowercase': False, 'missing_value_strategy': 'fill_with_const', 'most_common': 10000, 'tokenizer': 'space'},
        'split_probabilities': (0.7, 0.1, 0.2),
        'stratify': None,
        'text': { 'char_most_common': 70, 'char_sequence_length_limit': 1024, 'char_tokenizer': 'characters', 'char_vocab_file': None, 'fill_value': '', 'lowercase': True, 'missing_value_strategy': 'fill_with_const', 'padding': 'right', 'padding_symbol': '', 'unknown_symbol': '', 'word_most_common': 20000, 'word_sequence_length_limit': 256, 'word_tokenizer': 'space_punct', 'word_vocab_file': None},
        'timeseries': { 'fill_value': '', 'missing_value_strategy': 'fill_with_const', 'padding': 'right', 'padding_value': 0, 'timeseries_length_limit': 256, 'tokenizer': 'space'},
        'vector': { 'fill_value': '', 'missing_value_strategy': 'fill_with_const'}},
    'training': {
        'batch_size': 128,
        'bucketing_field': None,
        'decay': False,
        'decay_rate': 0.96,
        'decay_steps': 10000,
        'dropout_rate': 0.0,
        'early_stop': 5,
        'epochs': 100,
        'eval_batch_size': 0,
        'gradient_clipping': None,
        'increase_batch_size_on_plateau': 0,
        'increase_batch_size_on_plateau_max': 512,
        'increase_batch_size_on_plateau_patience': 5,
        'increase_batch_size_on_plateau_rate': 2,
        'learning_rate': 0.001,
        'learning_rate_warmup_epochs': 1,
        'optimizer': { 'beta1': 0.9, 'beta2': 0.999, 'epsilon': 1e-08, 'type': 'adam'},
        'reduce_learning_rate_on_plateau': 0,
        'reduce_learning_rate_on_plateau_patience': 5,
        'reduce_learning_rate_on_plateau_rate': 0.5,
        'regularization_lambda': 0,
        'regularizer': 'l2',
        'staircase': False,
        'validation_field': 'combined',
        'validation_measure': 'loss'}}

Found hdf5 and json with the same filename of the csv, using them instead
Using full hdf5 and json
Loading data from: ChnSentiCorp_htl_all.hdf5
Loading metadata from: ChnSentiCorp_htl_all.json
Training set: 5502
Validation set: 719
Test set: 1545

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/bert/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/ludwig/models/modules/sequence_encoders.py:1731: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/ludwig/models/modules/sequence_encoders.py:1742: The name tf.train.init_from_checkpoint is deprecated. Please use tf.compat.v1.train.init_from_checkpoint instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/ludwig/models/modules/sequence_encoders.py:1749: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:1205: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

╒══════════╕
│ TRAINING │
╘══════════╛

2019-12-22 14:37:18.067377: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-22 14:37:18.075731: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-12-22 14:37:18.076009: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7282d90 executing computations on platform Host. Devices:
2019-12-22 14:37:18.076121: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2019-12-22 14:37:21.772229: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.

Epoch 1
Training: 0%| | 0/43 [00:00<?, ?it/s]
2019-12-22 14:38:11.186676: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[128,12,256,256] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[128,12,256,256] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  [[{{node review/bert/encoder/layer_2/attention/self/dropout/mul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/ludwig", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/ludwig/cli.py", line 108, in main
    CLI()
  File "/usr/local/lib/python3.6/dist-packages/ludwig/cli.py", line 64, in __init__
    getattr(self, args.command)()
  File "/usr/local/lib/python3.6/dist-packages/ludwig/cli.py", line 69, in experiment
    experiment.cli(sys.argv[2:])
  File "/usr/local/lib/python3.6/dist-packages/ludwig/experiment.py", line 529, in cli
    experiment(**vars(args))
  File "/usr/local/lib/python3.6/dist-packages/ludwig/experiment.py", line 219, in experiment
    **kwargs
  File "/usr/local/lib/python3.6/dist-packages/ludwig/train.py", line 336, in full_train
    debug=debug
  File "/usr/local/lib/python3.6/dist-packages/ludwig/train.py", line 502, in train
    **model_definition['training']
  File "/usr/local/lib/python3.6/dist-packages/ludwig/models/model.py", line 538, in train
    is_training=True
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[128,12,256,256] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  [[node review/bert/encoder/layer_2/attention/self/dropout/mul (defined at /lib/python3.6/dist-packages/bert/modeling.py:358) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Errors may have originated from an input operation. Input Source operations connected to node review/bert/encoder/layer_2/attention/self/dropout/mul: review/bert/encoder/layer_2/attention/self/Softmax (defined at /lib/python3.6/dist-packages/bert/modeling.py:720)

Original stack trace for 'review/bert/encoder/layer_2/attention/self/dropout/mul':
  File "/bin/ludwig", line 10, in <module>
    sys.exit(main())
  File "/lib/python3.6/dist-packages/ludwig/cli.py", line 108, in main
    CLI()
  File "/lib/python3.6/dist-packages/ludwig/cli.py", line 64, in __init__
    getattr(self, args.command)()
  File "/lib/python3.6/dist-packages/ludwig/cli.py", line 69, in experiment
    experiment.cli(sys.argv[2:])
  File "/lib/python3.6/dist-packages/ludwig/experiment.py", line 529, in cli
    experiment(**vars(args))
  File "/lib/python3.6/dist-packages/ludwig/experiment.py", line 219, in experiment
    **kwargs
  File "/lib/python3.6/dist-packages/ludwig/train.py", line 336, in full_train
    debug=debug
  File "/lib/python3.6/dist-packages/ludwig/train.py", line 483, in train
    debug=debug
  File "/lib/python3.6/dist-packages/ludwig/models/model.py", line 113, in __init__
    **kwargs
  File "/lib/python3.6/dist-packages/ludwig/models/model.py", line 163, in __build
    is_training=self.is_training
  File "/lib/python3.6/dist-packages/ludwig/models/inputs.py", line 42, in build_inputs
    **kwargs)
  File "/lib/python3.6/dist-packages/ludwig/models/inputs.py", line 69, in build_single_input
    **kwargs)
  File "/lib/python3.6/dist-packages/ludwig/features/sequence_feature.py", line 167, in build_input
    is_training=is_training
  File "/lib/python3.6/dist-packages/ludwig/features/sequence_feature.py", line 182, in build_sequence_input
    is_training=is_training
  File "/lib/python3.6/dist-packages/ludwig/models/modules/sequence_encoders.py", line 1721, in __call__
    token_type_ids=tf.zeros_like(input_sequence),
  File "/lib/python3.6/dist-packages/bert/modeling.py", line 216, in __init__
    do_return_all_layers=True)
  File "/lib/python3.6/dist-packages/bert/modeling.py", line 844, in transformer_model
    to_seq_length=seq_length)
  File "/lib/python3.6/dist-packages/bert/modeling.py", line 724, in attention_layer
    attention_probs = dropout(attention_probs, attention_probs_dropout_prob)
  File "/lib/python3.6/dist-packages/bert/modeling.py", line 358, in dropout
    output = tf.nn.dropout(input_tensor, 1.0 - dropout_prob)
  File "/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 4170, in dropout
    return dropout_v2(x, rate, noise_shape=noise_shape, seed=seed, name=name)
  File "/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 4255, in dropout_v2
    ret = x * scale * math_ops.cast(keep_mask, x.dtype)
  File "/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 884, in binary_op_wrapper
    return func(x, y, name=name)
  File "/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 1180, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 6490, in mul
    "Mul", x=x, y=y, name=name)
  File "/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Training: 0%| | 0/43 [00:45<?, ?it/s]

ifokeev commented 4 years ago

You are getting an out-of-memory error. You don't have enough RAM to process this dataset:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[128,12,256,256] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
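For a sense of scale, the size of the tensor named in that error can be worked out from its shape. This is a rough sketch: it assumes "type float" means 32-bit floats (4 bytes per element), and `tensor_bytes` is just an illustrative helper, not part of Ludwig or TensorFlow.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (4 bytes per element for float32)."""
    return reduce(mul, shape, 1) * bytes_per_element


# Shape copied from the error message: [128, 12, 256, 256].
shape = (128, 12, 256, 256)
size = tensor_bytes(shape)
print(f"{size} bytes = {size / 2**20:.0f} MiB")
```

That is a single allocation; the node in the trace (`layer_2/attention/self/dropout/mul`) is only one of many tensors of this shape created during a training step.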

guhuawuli commented 4 years ago

How much RAM is needed, at minimum, to run the BERT encoder? I have 13G of memory and I can run a BERT model with other code (https://github.com/guanlinchao/bert-dst). Does Ludwig need more memory than ordinary BERT fine-tuning?

ifokeev commented 4 years ago

@guhuawuli did you run it with the same dataset? Try chunking it.

guhuawuli commented 4 years ago

I found the solution here (https://medium.com/gowombat/first-impressions-about-ubers-ludwig-a-simple-machine-learning-tool-or-not-714962bbbedc): I had to reduce the batch size from 128 to 16.
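In the log above, `batch_size` sits under the `training` section of the model definition, so the change amounts to overriding that one key (in the YAML file, a `training:` section with `batch_size: 16`). A minimal sketch of the same override on a model definition held as a Python dict; `with_batch_size` is a hypothetical helper, and the BERT paths are left out of the example dict for brevity:

```python
import copy


def with_batch_size(model_definition, batch_size):
    """Return a copy of the model definition with training.batch_size overridden."""
    updated = copy.deepcopy(model_definition)
    updated.setdefault("training", {})["batch_size"] = batch_size
    return updated


# Trimmed-down definition; a real one also needs the BERT config,
# checkpoint and vocab paths shown in the log.
model_definition = {
    "input_features": [{"name": "review", "type": "sequence", "encoder": "bert"}],
    "output_features": [{"name": "label", "type": "category"}],
    "training": {"batch_size": 128},
}

smaller = with_batch_size(model_definition, 16)
print(smaller["training"])
```

The original dict is left untouched, so it is easy to try a few batch sizes until one fits in memory.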

w4nderlust commented 4 years ago

Sorry for the late answer. Yes, the BERT encoder is pretty big and its activations are big too, so depending on your available VRAM / RAM you may need to decrease the batch size to make it run on your system. Thank you for posting the solution to this problem, closing the thread.
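To make the batch-size effect concrete, here is a rough lower-bound estimate of the memory taken by the attention-probability tensors alone. It assumes 12 layers and 12 heads (read off the checkpoint name `uncased_L-12_H-768_A-12`), sequence length 256 (from the tensor shape in the error), float32 values, and only one such tensor kept per layer. Real training holds several tensors of this shape per layer plus weights, gradients and other activations, so actual usage is higher. `attention_probs_bytes` is an illustrative helper, not a Ludwig or TensorFlow function.

```python
def attention_probs_bytes(batch_size, num_heads=12, seq_len=256,
                          num_layers=12, bytes_per_element=4):
    """Lower bound: one [batch, heads, seq, seq] float32 tensor per layer."""
    per_layer = batch_size * num_heads * seq_len * seq_len * bytes_per_element
    return per_layer * num_layers


# Memory scales linearly with the batch size.
for batch_size in (128, 64, 32, 16):
    gib = attention_probs_bytes(batch_size) / 2**30
    print(f"batch_size={batch_size:>3}: {gib:.2f} GiB")
```

Under these assumptions a batch of 128 needs about 4.5 GiB for this one kind of tensor, while a batch of 16 needs one eighth of that, which is consistent with the smaller batch fitting in 13G.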

ludwig-ai / ludwig

problem when using bert encoder #605