zjunlp / OpenUE

[EMNLP 2020] OpenUE: An Open Toolkit of Universal Extraction from Text
http://openue.zjukg.org
MIT License

Cannot find setup_config.json #16

Closed roar090 closed 2 years ago

roar090 commented 2 years ago
    # read configs for the mode, model_name, etc. from setup_config.json
    setup_config_path = os.path.join(model_dir, "setup_config.json")
    if os.path.isfile(setup_config_path):
        with open(setup_config_path) as setup_config_file:
            self.setup_config = json.load(setup_config_file)
    else:
        logger.warning("Missing the setup_config.json file.")

The model folder produced by running run_ner.sh does not contain a setup_config.json file. After packaging it into a .mar and deploying, it fails with: Missing the setup_config.json file.

CheaSim commented 2 years ago

Our code is based on the TorchServe examples repo: https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers. You can find setup_config.json there, and you can also modify some hyper-parameters in the JSON file.

{
 "model_name":"bert-base-uncased",
 "mode":"sequence_classification",
 "do_lower_case":true,
 "num_labels":"2",
 "save_mode":"pretrained",
 "max_length":"150",
 "captum_explanation":true,
 "embedding_name": "bert",
 "FasterTransformer":false
}
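If you would rather not copy the file by hand, you can generate it in the model directory before packaging. A minimal sketch; the directory path and the field values (taken from the sample config above, with the model switched to `bert-base-chinese` for Chinese text) are assumptions you should adapt to your own checkpoint:

```python
import json
import os

# Hypothetical output directory; point this at wherever run_ner.sh saved the model.
model_dir = "output/ner_model"
os.makedirs(model_dir, exist_ok=True)

# Field values follow the sample config above; adjust num_labels / max_length
# to match your own NER task.
setup_config = {
    "model_name": "bert-base-chinese",
    "mode": "sequence_classification",
    "do_lower_case": True,
    "num_labels": "2",
    "save_mode": "pretrained",
    "max_length": "150",
    "captum_explanation": True,
    "embedding_name": "bert",
    "FasterTransformer": False,
}

# Write setup_config.json next to the model files so the .mar packaging picks it up.
with open(os.path.join(model_dir, "setup_config.json"), "w") as f:
    json.dump(setup_config, f, indent=2)
```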
roar090 commented 2 years ago

Thanks! After adding setup_config.json and repackaging, the model runs on TorchServe.

Also, could you provide an example of the request parameters for calling the service after deploying with TorchServe?

I tried:

POST http://localhost:3000/predictions/BERTForNER

{
  "data": {
    "body": "姚明,男,汉族,1980年9月12日出生于上海市徐汇区。"
  }
}

but the server logs an error (--- Logging error ---).

CheaSim commented 2 years ago

The API is determined by the handler code in handler_ner.py. By default, the input JSON should be:

{
  "input_ids": List,       # shape (128)
  "attention_mask": List,  # shape (128)
  "token_type_ids": List   # shape (128)
}
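A sketch of building such a payload in plain Python, assuming you already have token ids from a tokenizer; the pad id 0 (BERT's [PAD]) and the fixed length 128 match the shapes above, while the function name and the sample ids are illustrative:

```python
import json

def build_payload(input_ids, max_length=128, pad_id=0):
    """Pad/truncate tokenizer output to the fixed length the handler expects."""
    ids = input_ids[:max_length]
    # 1 for real tokens, 0 for padding positions.
    attention_mask = [1] * len(ids) + [0] * (max_length - len(ids))
    # Single-sentence input: all segment ids are 0.
    token_type_ids = [0] * max_length
    ids = ids + [pad_id] * (max_length - len(ids))
    return {
        "input_ids": ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }

# Illustrative ids: 101 = [CLS], 102 = [SEP] in BERT vocabularies.
payload = build_payload([101, 2000, 3000, 102])
body = json.dumps(payload)  # POST this body to /predictions/BERTForNER
```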
roar090 commented 2 years ago

Thank you very much! It worked! I have two more questions:

1. How do I convert the text I want to predict into these request parameters?

"姚明,男,汉族,1980年9月12日出生于上海市徐汇区。" -----> {"input_ids": List, "attention_mask": List, "token_type_ids": List}  # each shape (128)

2. What do the results returned by TorchServe correspond to?

{
  "outputs": {
    "predicate_probabilities": [[
      0.608515739440918,
      0.6202450394630432,
      0.7023037672042847,
      -0.030435102060437202,
      -0.10990876704454422,
      ...
    ]],
    "token_label_predictions": [[[
      -2.197793960571289,
      -0.5554990768432617,
      -0.7127630710601807,
      -1.0187653303146362,
      -1.2688184976577759,
      1.81879723072052,
      9.981047630310059,
      -1.8538830280303955,
      ...
    ]]]
  }
}

predicate_probabilities has a list[list[]] structure and token_label_predictions has a list[list[list[]]] structure. How do these map to entities and relations?

Thanks! ✍

CheaSim commented 2 years ago

1. tokenize

Use the code below

from transformers import AutoTokenizer

# Tokenize the raw text into input_ids / attention_mask / token_type_ids
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
tokenizer("姚明,男,汉族,1980年9月12日出生于上海市徐汇区。")

2. read the OpenUE paper https://aclanthology.org/2020.emnlp-demos.1/

The NER model outputs BIEOS ids, and the SEQ model outputs relation ids. predicate_probabilities holds the relation-id logits, and token_label_predictions holds the BIEOS logits for each token, including [CLS] and [SEP].

So if you want end-to-end relation extraction: first get the relation types present in the sentence, then pad the relation ids onto the sentence and use the sentence together with the relation ids to get the NER result. Finally, combine the entities from the NER result with the relation types from the SEQ model's output.
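The last step, turning per-token BIEOS tags into entity spans, can be sketched as below. This assumes you have already taken the argmax over token_label_predictions and mapped the ids to string labels; the label names ("B", "I", "E", "O", "S"), the function name, and the sample inputs are illustrative:

```python
def decode_bieos(tags, tokens):
    """Collect entity spans from a BIEOS tag sequence.

    tags:   one string label per token (after argmax + id-to-label mapping)
    tokens: the corresponding tokens (with [CLS]/[SEP] already stripped)
    """
    entities, span = [], []
    for tag, tok in zip(tags, tokens):
        if tag == "S":              # single-token entity
            entities.append(tok)
            span = []
        elif tag == "B":            # entity begins
            span = [tok]
        elif tag == "I" and span:   # entity continues
            span.append(tok)
        elif tag == "E" and span:   # entity ends
            span.append(tok)
            entities.append("".join(span))
            span = []
        else:                       # "O" or a malformed sequence
            span = []
    return entities

# e.g. a two-token entity followed by a single-token entity
entities = decode_bieos(["B", "E", "O", "S"], ["姚", "明", ",", "沪"])
```

The relation type for each entity pair then comes from the argmax over the corresponding row of predicate_probabilities.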