kamalkraj / ALBERT-TF2.0

ALBERT model Pretraining and Fine Tuning using TF2.0
Apache License 2.0

AssertionError #17

Closed: chiragsanghvi10 closed this issue 4 years ago

chiragsanghvi10 commented 4 years ago

Hi @kamalkraj, thank you for the previous fix.
I am working on the STS-B dataset and running the following commands on Ubuntu:

export GLUE_DIR=glue_data
export ALBERT_DIR=model_configs/large
export TASK_NAME=STS
export OUTPUT_DIR=stsb_processed
mkdir $OUTPUT_DIR
export MODEL_DIR=output_stsb

python run_classifer.py \
--train_data_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
--eval_data_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
--input_meta_data_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
--albert_config_file=${ALBERT_DIR}/config.json \
--task_name=${TASK_NAME} \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
--output_dir=${MODEL_DIR} \
--init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
--do_train \
--do_eval \
--train_batch_size=16 \
--learning_rate=1e-5 \
--custom_training_loop

Error message:

I1209 13:14:37.739436 140685254485824 run_classifer.py:306] Running training
I1209 13:14:37.739539 140685254485824 run_classifer.py:307] Num examples = 5749
I1209 13:14:37.739591 140685254485824 run_classifer.py:308] Batch size = 16
I1209 13:14:37.739633 140685254485824 run_classifer.py:309] Num steps = 1077
Traceback (most recent call last):
  File "run_classifer.py", line 452, in <module>
    app.run(main)
  File "/home/vv/venvv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/vv/venvv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run_classifer.py", line 355, in main
    custom_callbacks = custom_callbacks)
  File "/home/vv/git/ALBERT-TF2.0/model_training_utils.py", line 155, in run_customized_training_loop
    assert tf.executing_eagerly()
AssertionError

Any idea what is causing this?

kamalkraj commented 4 years ago

@chiragsanghvi10 Which TensorFlow version are you using?

chiragsanghvi10 commented 4 years ago

@kamalkraj TensorFlow version 2.0

kamalkraj commented 4 years ago

Try installing TensorFlow in a virtual environment and run it again.
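
Before rerunning, a quick check along the following lines can confirm that the interpreter in use belongs to the virtual environment and that the TensorFlow it imports runs eagerly, which is what the failing assertion in model_training_utils.py tests. This is only a sketch: the function name check_environment and the keys it returns are made up for illustration and are not part of ALBERT-TF2.0.

```python
import importlib
import sys


def check_environment():
    """Report whether we are in a virtualenv and whether TensorFlow runs eagerly.

    Hypothetical helper, not part of the ALBERT-TF2.0 repository.
    """
    # venv sets sys.base_prefix; older virtualenv releases set sys.real_prefix.
    in_venv = hasattr(sys, "real_prefix") or (
        sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    info = {"in_virtualenv": in_venv, "tf_version": None, "eager": None}
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # TensorFlow is not importable from this interpreter at all.
        return info
    info["tf_version"] = tf.__version__
    # TF 2.x is eager by default; TF 1.x is not.
    info["eager"] = tf.executing_eagerly()
    return info


if __name__ == "__main__":
    print(check_environment())
```

If tf_version does not start with 2, or eager is False, the assertion above would be expected to fail again.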

chiragsanghvi10 commented 4 years ago

Yes, I tried this. Does it support GPU?

chiragsanghvi10 commented 4 years ago

I am getting a RuntimeError now:

2019-12-09 13:55:34.229624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]
Traceback (most recent call last):
  File "run_classifer.py", line 452, in <module>
    app.run(main)
  File "/home/vv/git/ALBERT-TF2.0/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/vv/git/ALBERT-TF2.0/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run_classifer.py", line 302, in main
    loss_multiplier=loss_multiplier)
  File "run_classifer.py", line 190, in get_model
    pooled_output, _ = albert_layer(input_word_ids, input_mask, input_type_ids)
  File "/home/vv/git/ALBERT-TF2.0/albert.py", line 212, in __call__
    return super(AlbertModel, self).__call__(inputs, **kwargs)
  File "/home/vv/git/ALBERT-TF2.0/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 842, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/home/vv/git/ALBERT-TF2.0/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
RuntimeError: in converted code:
relative to /home/vv/git/ALBERT-TF2.0:

albert.py:229 call  *
    word_embeddings = self.embedding_lookup(input_word_ids)
venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py:817 __call__
    self._maybe_build(inputs)
venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py:2141 _maybe_build
    self.build(input_shapes)
albert.py:273 build
    dtype=self.dtype)
venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py:522 add_weight
    aggregation=aggregation)
venv/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/base.py:744 _add_variable_with_custom_getter
    **kwargs_for_getter)
venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py:139 make_variable
    shape=variable_shape if variable_shape else None)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/variables.py:258 __call__
    return cls._variable_v1_call(*args, **kwargs)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/variables.py:219 _variable_v1_call
    shape=shape)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/variables.py:65 getter
    return captured_getter(captured_previous, **kwargs)
venv/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:1322 creator_with_resource_vars
    return self._create_variable(*args, **kwargs)
venv/lib/python3.6/site-packages/tensorflow_core/python/distribute/one_device_strategy.py:262 _create_variable
    return next_creator(*args, **kwargs)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/variables.py:197 <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py:2507 default_variable_creator
    shape=shape)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/variables.py:262 __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1406 __init__
    distribute_strategy=distribute_strategy)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1537 _init_from_args
    initial_value() if init_from_fn else initial_value,
venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py:119 <lambda>
    init_val = lambda: initializer(shape, dtype=dtype)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/init_ops_v2.py:343 __call__
    self.stddev, dtype)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/init_ops_v2.py:809 truncated_normal
    shape=shape, mean=mean, stddev=stddev, dtype=dtype, seed=self.seed)
venv/lib/python3.6/site-packages/tensorflow_core/python/ops/random_ops.py:171 truncated_normal
    mean_tensor = ops.convert_to_tensor(mean, dtype=dtype, name="mean")
venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1184 convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1242 convert_to_tensor_v2
    as_ref=False)
venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1296 internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
venv/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py:52 _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
venv/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py:227 constant
    allow_broadcast=True)
venv/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py:235 _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
venv/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py:96 convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)

RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.

chiragsanghvi10 commented 4 years ago

I have also tried pip install tensorflow-gpu==2.0, with the same error message:

RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.

kamalkraj commented 4 years ago

@chiragsanghvi10 The above error is due to a CUDA version mismatch, so the GPUs are not initialized properly. Try running the sample programs from the TF 2.0 examples and make sure the GPUs are working properly.
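
Short of running a full sample program, a small script can show whether the installed TensorFlow build has CUDA support and which GPUs it can see. The sketch below is illustrative only: gpu_report is a hypothetical helper name, and it falls back to tf.config.experimental.list_physical_devices on the assumption that TF 2.0 does not expose the call under tf.config directly.

```python
import importlib


def gpu_report():
    """Summarize CUDA support and visible GPUs for the installed TensorFlow.

    Hypothetical helper, not part of the ALBERT-TF2.0 repository.
    """
    report = {"built_with_cuda": None, "gpus": None}
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # Nothing to report if TensorFlow cannot be imported.
        return report
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    # Newer releases expose list_physical_devices on tf.config itself;
    # TF 2.0 keeps it under tf.config.experimental.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices
    report["gpus"] = [device.name for device in list_devices("GPU")]
    return report


if __name__ == "__main__":
    print(gpu_report())
```

An empty gpus list on a machine that does have a GPU would be consistent with the driver or CUDA libraries not loading, in which case the installed CUDA and cuDNN versions are worth comparing against those the TensorFlow release was built for.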

chiragsanghvi10 commented 4 years ago

@kamalkraj Thank you so much, I was able to get the Pearson correlation. :+1: