ValueError: Tensor's shape (300, 800) is not compatible with supplied shape (768, 800)

LiAI-tech commented 2 years ago

root@0213952a61fb:/home/entity_extractor_by_ner# python main.py
2021-12-29 03:24:28.233606: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-12-29 03:24:29 ++++++++++++++++++++++++++++++++++++++++CONFIGURATION SUMMARY++++++++++++++++++++++++++++++++++++++++
Status:
    mode : train
++++++++++++++++++++++++++++++++++++++++
Datasets:
    datasets fold: data/example_datasets
    train file: train.csv
    validation file: dev.csv
    vocab dir: data/example_datasets/vocabs
    delimiter : b
    use bert: True
    use bilstm: True
    finetune : True
    checkpoints dir: checkpoints/finetune-bert-bilstm-crf
    log dir: data/example_datasets/logs
++++++++++++++++++++++++++++++++++++++++
Labeling Scheme:
    label scheme: BIO
    label level: 2
    suffixes : ['ORG', 'PER', 'LOC']
    measuring metrics: ['precision', 'recall', 'f1', 'accuracy']
++++++++++++++++++++++++++++++++++++++++
Model Configuration:
    embedding dim: 768
    max sequence length: 300
    hidden dim: 200
    CUDA VISIBLE DEVICE: 0
    seed : 42
++++++++++++++++++++++++++++++++++++++++
Training Settings:
    epoch : 300
    batch size: 32
    dropout : 0.5
    learning rate: 0.001
    optimizer : Adam
    checkpoint name: model
    max checkpoints: 3
    print per_batch: 20
    is early stop: True
    patient : 5
++++++++++++++++++++++++++++++++++++++++CONFIGURATION SUMMARY END++++++++++++++++++++++++++++++++++++++++
loading vocab...
dataManager initialed...
mode: train
loading data...
1112231it [00:48, 22721.77it/s]
loading data...
223833it [00:09, 22922.39it/s]
training set size: 23181, validating set size: 4636
2021-12-29 03:25:29.113422: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-29 03:25:29.114056: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-12-29 03:25:29.147051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-29 03:25:29.147614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-12-29 03:25:29.147632: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-12-29 03:25:29.149236: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-12-29 03:25:29.149262: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-12-29 03:25:29.149942: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-12-29 03:25:29.150171: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-12-29 03:25:29.150271: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-12-29 03:25:29.150685: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-12-29 03:25:29.150785: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-12-29 03:25:29.150797: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-12-29 03:25:29.151450: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-29 03:25:29.151470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-12-29 03:25:29.151477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
Some layers from the model checkpoint at bert-base-chinese were not used when initializing TFBertModel: ['mlmcls', 'nspcls']

This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-chinese.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
Restored from checkpoints/bilstm-crf/model-14
++++++++++++++++++++training starting++++++++++++++++++++
epoch:1/300
0%| | 0/725 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 72, in
    train(configs, dataManager, logger)
  File "/home/entity_extractor_by_ner/engines/train.py", line 82, in train
    logits, log_likelihood, transition_params = ner_model(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in call
    outputs = call_fn(inputs, *args, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 828, in call
    result = self._call(*args, *kwds)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 725, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graphfunction, = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 3196, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(func_args, func_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().wrapped(*args, *kwds)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 3887, in bound_method_wrapper
    return wrapped_fn(args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/entity_extractor_by_ner/engines/model.py:43 call
        outputs = self.bilstm(outputs)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/layers/wrappers.py:539 call
        return super(Bidirectional, self).call(inputs, kwargs)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer.py:1008 call
        self._maybe_build(inputs)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer.py:2710 _maybe_build
        self.build(input_shapes) # pylint:disable=not-callable
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/layers/wrappers.py:694 build
        self.forward_layer.build(input_shape)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/layers/recurrent.py:578 build
        self.cell.build(step_input_shape)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/utils/tf_utils.py:272 wrapper
        output_shape = fn(instance, input_shape)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/layers/recurrent.py:2344 build
        self.kernel = self.add_weight(
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer.py:623 add_weight
        variable = self._add_variable_with_custom_getter(
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/base.py:805 _add_variable_with_custom_getter
        new_variable = getter(
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py:130 make_variable
        return tf_variables.VariableV1(
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/variables.py:260 call
        return cls._variable_v1_call(args, kwargs)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/variables.py:206 _variable_v1_call
        return previous_getter(
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, kwargs)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py:712 variable_capturing_scope
        v = UnliftedInitializerVariable(
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/variables.py:264 call
        return super(VariableMetaclass, cls).call(*args, **kwargs)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py:227 init
        initial_value = initial_value()
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/base.py:81 call
        return CheckpointInitialValue(
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/base.py:117 init
        self.wrapped_value.set_shape(shape)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py:1215 set_shape
        raise ValueError(

ValueError: Tensor's shape (300, 800) is not compatible with supplied shape (768, 800)

WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.embedding.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.dense.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.dense.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.forward_layer.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.forward_layer.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.forward_layer.cell.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.backward_layer.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.backward_layer.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.backward_layer.cell.bias
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.

hjg-ok commented 2 years ago

Hi, did you manage to solve this?

LiAI-tech commented 2 years ago

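It appears to be a configuration file problem. After I copied over the config file from the folder where the trained model is saved, the error went away. That said, I had originally edited the outer config file according to the instructions as well.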
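
The shapes in the error message are consistent with a config mismatch. A Keras LSTM cell's input kernel has shape (input_dim, 4 * units), because the four gates are stacked along the last axis. With hidden dim 200 that gives 800 columns in both cases, so only the input dimension differs: the log shows the weights being restored from checkpoints/bilstm-crf/model-14, which most likely was trained with an embedding dim of 300, while the current run uses BERT with an embedding dim of 768. A minimal sketch of that arithmetic (the helper name is hypothetical and not part of the repository):

```python
from typing import Tuple


def lstm_kernel_shape(input_dim: int, units: int) -> Tuple[int, int]:
    """Shape of a Keras LSTM cell's input kernel: (input_dim, 4 * units)."""
    return (input_dim, 4 * units)


# Assumed settings of the run that wrote the checkpoint: embedding dim 300, hidden dim 200.
checkpoint_shape = lstm_kernel_shape(input_dim=300, units=200)
# Settings of the failing run, taken from the configuration summary in the log.
current_shape = lstm_kernel_shape(input_dim=768, units=200)

print("checkpoint kernel:", checkpoint_shape)
print("current kernel:   ", current_shape)
print("compatible:", checkpoint_shape == current_shape)
```

If the two shapes differ, the checkpoint cannot be restored into the model as configured, so either the config or the checkpoints directory has to change.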

DengDengXu commented 2 years ago

Hi, where can I find the config file in the saved-model folder?

LiAI-tech commented 2 years ago

/checkpoints/finetune-bert-crf/system.config
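
One way to spot this kind of mismatch is to compare the config saved next to the checkpoint with the one being edited. The following is a minimal sketch, assuming both files consist of simple `key=value` lines with `#` comments; that format is an assumption about system.config and is not confirmed in this thread, and the function names and sample keys are hypothetical:

```python
def parse_config(text):
    """Parse `key=value` lines, skipping blank lines and `#` comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_configs(saved_text, current_text):
    """Return {key: (saved_value, current_value)} for every key that differs."""
    saved = parse_config(saved_text)
    current = parse_config(current_text)
    return {
        key: (saved.get(key), current.get(key))
        for key in sorted(set(saved) | set(current))
        if saved.get(key) != current.get(key)
    }


# Hypothetical file contents, used here in place of reading the real files.
saved = "embedding_dim=300\nhidden_dim=200\nuse_bert=False\n"
current = "# edited copy\nembedding_dim=768\nhidden_dim=200\nuse_bert=True\n"

for key, (old, new) in diff_configs(saved, current).items():
    print(f"{key}: checkpoint={old} current={new}")
```

In practice the two strings would come from reading the config in the checkpoint folder and the config in the project root.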

stanleylsx commented 2 years ago

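People ask about these issues often. The original documentation was unclear, so I have expanded it. For training, follow [step1] to [step4] in the documentation.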

stanleylsx / entity_extractor_by_ner

ValueError: Tensor's shape (300, 800) is not compatible with supplied shape (768, 800) #35