xmxoxo / BERT-train2deploy

BERT model from training to deployment

Error when starting the NER service after converting my own trained model to a .pb file #17

Closed: neveroma closed this issue 4 years ago

neveroma commented 4 years ago

Converting the model to a .pb file completed successfully, but starting the service fails. This is my startup script:

bert-base-serving-start -model_dir $TRAINED_CLASSIFIER/$EXP_NAME -bert_model_dir $BERT_BASE_DIR -model_pb_dir $TRAINED_CLASSIFIER/$EXP_NAME -mode NER -max_seq_len 128 -http_port 8091 -port 5575 -port_out 5576 -device_map 1

The .pb file is named classification_model.pb. The error output is as follows:

E:NER_MODEL, Lodding...:[gra:opt:306]:fail to optimize the graph! float division by zero
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.7/site-packages/bert_base/server/graph.py", line 289, in optimize_ner_model
    labels=None, num_labels=num_labels, use_one_hot_embeddings=False, dropout_rate=1.0)
  File "/root/anaconda3/lib/python3.7/site-packages/bert_base/train/models.py", line 101, in create_model
    rst = blstm_crf.add_blstm_crf_layer(crf_only=True)
  File "/root/anaconda3/lib/python3.7/site-packages/bert_base/train/lstm_crf_layer.py", line 60, in add_blstm_crf_layer
    loss, trans = self.crf_layer(logits)
  File "/root/anaconda3/lib/python3.7/site-packages/bert_base/train/lstm_crf_layer.py", line 160, in crf_layer
    initializer=self.initializers.xavier_initializer())
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1496, in get_variable
    aggregation=aggregation)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1239, in get_variable
    aggregation=aggregation)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 562, in get_variable
    aggregation=aggregation)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 514, in _true_getter
    aggregation=aggregation)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 929, in _get_single_variable
    aggregation=aggregation)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 259, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 220, in _variable_v1_call
    shape=shape)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 198, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 2511, in default_variable_creator
    shape=shape)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 263, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 1568, in __init__
    shape=shape)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 1698, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 901, in <lambda>
    partition_info=partition_info)
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/initializers.py", line 143, in _initializer
    limit = math.sqrt(3.0 * factor / n)
ZeroDivisionError: float division by zero
Traceback (most recent call last):
  File "/root/anaconda3/bin/bert-base-serving-start", line 10, in <module>
    sys.exit(start_server())
  File "/root/anaconda3/lib/python3.7/site-packages/bert_base/runs/__init__.py", line 17, in start_server
    server = BertServer(args)
  File "/root/anaconda3/lib/python3.7/site-packages/bert_base/server/__init__.py", line 102, in __init__
    raise FileNotFoundError('graph optimization fails and returns empty result')
FileNotFoundError: graph optimization fails and returns empty result
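The last frame shows where the division by zero happens. The rough sketch below re-implements that arithmetic in plain Python. The helper xavier_uniform_limit is hypothetical and handles only 2-D shapes; it assumes xavier_initializer runs in its FAN_AVG, uniform configuration (so n is the average of fan-in and fan-out), and it assumes the CRF variable being created is a square [num_labels, num_labels] transition matrix. Under those assumptions, the error is what you would get if the rebuilt graph saw num_labels == 0.

```python
import math


def xavier_uniform_limit(shape, factor=1.0):
    """Hypothetical, simplified re-implementation (2-D shapes only) of the
    uniform limit computed in the last traceback frame."""
    fan_in = float(shape[-2]) if len(shape) > 1 else float(shape[-1])
    fan_out = float(shape[-1])
    n = (fan_in + fan_out) / 2.0  # FAN_AVG: average of fan-in and fan-out
    return math.sqrt(3.0 * factor / n)  # same expression as in the traceback


num_labels = 7  # hypothetical label count for a working model
print(xavier_uniform_limit([num_labels, num_labels]))

try:
    # A [num_labels, num_labels] shape with num_labels == 0 makes n == 0.
    xavier_uniform_limit([0, 0])
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```

This only reproduces the arithmetic; it does not by itself show why num_labels would be 0 in this setup.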
xmxoxo commented 4 years ago

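Please check whether the environment variables $TRAINED_CLASSIFIER and $EXP_NAME are set to the correct values.

Follow the instructions in readme.md to set the corresponding environment variables, or write out the model directory path directly.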
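One way to act on this advice is a small pre-flight check before launching the server. The sketch below is hypothetical (check_model_dir is not part of bert_base). It assumes the variables are exported, since plain shell variables are not visible to child processes, and it assumes the model directory should contain label2id.pkl (the file the later error in this thread asks for) and ner_model.pb (the default NER graph name in bert_base/server/graph.py).

```python
import os


def check_model_dir(required_files=("label2id.pkl", "ner_model.pb")):
    """Hypothetical pre-flight check: resolve $TRAINED_CLASSIFIER/$EXP_NAME
    and report unset variables, a missing directory, or missing files."""
    unset = [v for v in ("TRAINED_CLASSIFIER", "EXP_NAME") if not os.environ.get(v)]
    if unset:
        return None, ["unset: " + v for v in unset]
    model_dir = os.path.join(os.environ["TRAINED_CLASSIFIER"], os.environ["EXP_NAME"])
    if not os.path.isdir(model_dir):
        return model_dir, ["not a directory: " + model_dir]
    problems = []
    for name in required_files:
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path):
            problems.append("missing: " + path)
    return model_dir, problems
```

Run it (for example from a one-line script) in the same shell session as bert-base-serving-start; an empty problem list means the paths at least resolve to the expected files.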

neveroma commented 4 years ago

I tried changing the environment variables, deliberately setting them to wrong values:

E:NER_MODEL, Lodding...:[gra:opt:306]:fail to optimize the graph! /work/dl/pretrained-model/chinese_L-12_H-768_A-12/1/bert_config.json; No such file or directory
FileNotFoundError: [Errno 2] No such file or directory: '/work/dl/BERT-train2deploy/output/certificate1/label2id.pkl'

The error is different from the earlier one.

neveroma commented 4 years ago

Found the cause. Since I need to run in NER mode, I looked at the source of bert_base/server/graph.py:

        # If the PB file already exists, return its path; otherwise convert the model to a PB file and return the path where it is stored
        if args.model_pb_dir is None:
            # Get the current working directory
            tmp_file = os.path.join(os.getcwd(), 'predict_optimizer')
            if not os.path.exists(tmp_file):
                os.mkdir(tmp_file)
        else:
            tmp_file = args.model_pb_dir
        pb_file = os.path.join(tmp_file, 'ner_model.pb')
        if os.path.exists(pb_file):
            print('pb_file exits', pb_file)
            return pb_file
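To make the mismatch concrete, the sketch below isolates only the file-name check from the excerpt. The helper find_existing_ner_pb is hypothetical and leaves out the predict_optimizer fallback. With only classification_model.pb present, the lookup finds nothing, so graph.py presumably goes on to rebuild the graph from the checkpoint, which matches the optimize_ner_model frame in the first traceback.

```python
import os
import tempfile


def find_existing_ner_pb(model_pb_dir):
    """Hypothetical sketch of the check above: only a file named exactly
    'ner_model.pb' counts as an already-frozen NER graph."""
    pb_file = os.path.join(model_pb_dir, "ner_model.pb")
    return pb_file if os.path.exists(pb_file) else None


with tempfile.TemporaryDirectory() as demo_dir:
    # Simulate the reporter's directory: only classification_model.pb exists.
    open(os.path.join(demo_dir, "classification_model.pb"), "wb").close()
    print(find_existing_ner_pb(demo_dir))  # None, so the graph would be rebuilt
    open(os.path.join(demo_dir, "ner_model.pb"), "wb").close()
    print(find_existing_ner_pb(demo_dir) is not None)  # True
```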

The default .pb file name for NER is ner_model.pb. In freeze_graph.py, change

pb_file = os.path.join(tmp_dir, 'classification_model.pb')

to

pb_file = os.path.join(tmp_dir, 'ner_model.pb')

Alternatively, add an optimize_ner_model routine modeled on the optimize_class_model method, and the service will start.
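If you would rather not edit freeze_graph.py, a third option is to give the already-frozen file the name graph.py expects. This is a hypothetical sketch, not something suggested in the thread or tested against bert_base: expose_as_ner_pb is an illustrative name, and the approach assumes the graph saved as classification_model.pb really is the NER model and differs only in its file name.

```python
import os
import shutil
import tempfile


def expose_as_ner_pb(model_pb_dir, src_name="classification_model.pb"):
    """Hypothetical workaround: copy an already-frozen graph to the name
    graph.py looks for in NER mode, keeping the original file in place."""
    src = os.path.join(model_pb_dir, src_name)
    dst = os.path.join(model_pb_dir, "ner_model.pb")
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    if not os.path.exists(dst):  # never overwrite an existing ner_model.pb
        shutil.copyfile(src, dst)
    return dst


# Quick self-check in a throwaway directory (placeholder bytes, not a real model).
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "classification_model.pb"), "wb") as f:
        f.write(b"placeholder")
    print(os.path.basename(expose_as_ner_pb(demo_dir)))  # ner_model.pb
```

For the setup in this issue you would pass the same directory given to -model_pb_dir, i.e. $TRAINED_CLASSIFIER/$EXP_NAME.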