stanleylsx / entity_extractor_by_ner

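NER models built on TensorFlow 2.3, all following the CRF paradigm: Bilstm(IDCNN)-CRF, Bert-Bilstm(IDCNN)-CRF, and Bert-CRF. Supports fine-tuning the pretrained model and adversarial training. Intended for named entity recognition and runnable directly once configured.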

ValueError: Tensor's shape (300, 800) is not compatible with supplied shape (768, 800) #35

Closed. LiAI-tech closed this issue 2 years ago

LiAI-tech commented 2 years ago

root@0213952a61fb:/home/entity_extractor_by_ner# python main.py
2021-12-29 03:24:28.233606: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-12-29 03:24:29
++++++++++++++++++++++++++++++++++++++++CONFIGURATION SUMMARY++++++++++++++++++++++++++++++++++++++++
Status:
mode : train
++++++++++++++++++++++++++++++++++++++++
Datasets:
datasets fold: data/example_datasets
train file: train.csv
validation file: dev.csv
vocab dir: data/example_datasets/vocabs
delimiter : b
use bert: True
use bilstm: True
finetune : True
checkpoints dir: checkpoints/finetune-bert-bilstm-crf
log dir: data/example_datasets/logs
++++++++++++++++++++++++++++++++++++++++
Labeling Scheme:
label scheme: BIO
label level: 2
suffixes : ['ORG', 'PER', 'LOC']
measuring metrics: ['precision', 'recall', 'f1', 'accuracy']
++++++++++++++++++++++++++++++++++++++++
Model Configuration:
embedding dim: 768
max sequence length: 300
hidden dim: 200
CUDA VISIBLE DEVICE: 0
seed : 42
++++++++++++++++++++++++++++++++++++++++
Training Settings:
epoch : 300
batch size: 32
dropout : 0.5
learning rate: 0.001
optimizer : Adam
checkpoint name: model
max checkpoints: 3
print per_batch: 20
is early stop: True
patient : 5
++++++++++++++++++++++++++++++++++++++++CONFIGURATION SUMMARY END++++++++++++++++++++++++++++++++++++++++
loading vocab...
dataManager initialed...
mode: train
loading data...
1112231it [00:48, 22721.77it/s]
loading data...
223833it [00:09, 22922.39it/s]
training set size: 23181, validating set size: 4636
2021-12-29 03:25:29.113422: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-29 03:25:29.114056: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-12-29 03:25:29.147051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-29 03:25:29.147614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-12-29 03:25:29.147632: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-12-29 03:25:29.149236: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-12-29 03:25:29.149262: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-12-29 03:25:29.149942: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-12-29 03:25:29.150171: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-12-29 03:25:29.150271: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-12-29 03:25:29.150685: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-12-29 03:25:29.150785: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-12-29 03:25:29.150797: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-12-29 03:25:29.151450: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-29 03:25:29.151470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-12-29 03:25:29.151477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
Some layers from the model checkpoint at bert-base-chinese were not used when initializing TFBertModel: ['mlm___cls', 'nsp___cls']

WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.embedding.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.dense.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.dense.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.forward_layer.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.forward_layer.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.forward_layer.cell.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.backward_layer.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.backward_layer.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).ner_model.bilstm.backward_layer.cell.bias
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
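Editor's note on the shapes in the issue title. A Keras LSTM input kernel has shape (input_dim, 4 * units), so with `hidden dim: 200` from the configuration summary the second dimension comes out as 800. The two shapes then differ only in the input dimension (300 versus 768), which would be consistent with the model and the checkpoint having been built with different embedding dims. This reading is an assumption; the thread does not confirm it. The helper names below are hypothetical, and the sketch is plain Python arithmetic rather than code from the repository.

```python
from typing import Tuple


def lstm_kernel_shape(input_dim: int, units: int) -> Tuple[int, int]:
    """Shape of a Keras LSTM input kernel: four gate blocks stacked along the last axis."""
    return (input_dim, 4 * units)


def shapes_match(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """True only when both shapes have the same rank and identical dimensions."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


hidden_dim = 200  # "hidden dim" from the configuration summary in the log

# Assumption: one side used a 768-dim (BERT-sized) input, the other a 300-dim input.
shape_with_768 = lstm_kernel_shape(768, hidden_dim)
shape_with_300 = lstm_kernel_shape(300, hidden_dim)

print(shape_with_768, shape_with_300, shapes_match(shape_with_768, shape_with_300))
```

If this reading is right, making the embedding dim in the active config agree with the one used when the checkpoint was written should make the two shapes line up.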

hjg-ok commented 2 years ago

Hi, did you manage to solve this?

LiAI-tech commented 2 years ago

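It was most likely a configuration-file problem. After I copied over the config file from the folder where the trained model is saved, the error went away. That said, I had originally edited the outer config file as the instructions describe.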

DengDengXu commented 2 years ago

Hi, where can I find the config file in the saved-model folder?

LiAI-tech commented 2 years ago

/checkpoints/finetune-bert-crf/system.config
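Editor's note: before copying the saved config over the outer one, it can help to see which settings actually differ. The sketch below assumes the config files are plain `key=value` lines with `#` comments, which the thread does not confirm. The keys `embedding_dim` and `hidden_dim` and both sample texts are hypothetical stand-ins, not the real file contents.

```python
from typing import Dict, Tuple


def parse_config(text: str) -> Dict[str, str]:
    """Parse `key=value` lines, skipping blanks and `#` comments (assumed format)."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_configs(outer: Dict[str, str], saved: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {key: (outer_value, saved_value)} for every key whose values differ."""
    missing = "<missing>"
    result: Dict[str, Tuple[str, str]] = {}
    for key in sorted(set(outer) | set(saved)):
        outer_value = outer.get(key, missing)
        saved_value = saved.get(key, missing)
        if outer_value != saved_value:
            result[key] = (outer_value, saved_value)
    return result


# Hypothetical contents standing in for the outer config and the one saved with the checkpoint.
outer_text = """
# outer config, edited by hand
embedding_dim=768
hidden_dim=200
"""
saved_text = """
embedding_dim=300
hidden_dim=200
"""

differences = diff_configs(parse_config(outer_text), parse_config(saved_text))
for key, (outer_value, saved_value) in differences.items():
    print(f"{key}: outer={outer_value} saved={saved_value}")
```

To compare real files, the two texts could be read with `pathlib.Path(path).read_text(encoding="utf-8")`, using the path given above for the saved copy.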

stanleylsx commented 2 years ago

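People ask about these issues often. The original documentation was unclear, so I have expanded it. For training, follow [step1] through [step4] in the documentation.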