zjunlp / OpenUE

[EMNLP 2020] OpenUE: An Open Toolkit of Universal Extraction from Text
http://openue.zjukg.org
MIT License

Cannot find setup_config.json #16

Closed roar090 closed 2 years ago

roar090 commented 2 years ago
    # read configs for the mode, model_name, etc. from setup_config.json
    setup_config_path = os.path.join(model_dir, "setup_config.json")
    if os.path.isfile(setup_config_path):
        with open(setup_config_path) as setup_config_file:
            self.setup_config = json.load(setup_config_file)
    else:
        logger.warning("Missing the setup_config.json file.")

The model folder produced by running run_ner.sh does not contain a setup_config.json file. After packaging it into a .mar and deploying, it fails with: Missing the setup_config.json file.

CheaSim commented 2 years ago

Our code is based on the TorchServe examples repo: https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers. You can find setup_config.json there, and you can also modify some hyper-parameters in the JSON file.

{
 "model_name":"bert-base-uncased",
 "mode":"sequence_classification",
 "do_lower_case":true,
 "num_labels":"2",
 "save_mode":"pretrained",
 "max_length":"150",
 "captum_explanation":true,
 "embedding_name": "bert",
 "FasterTransformer":false
}
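If you would rather not copy the file by hand, you can generate it in the model directory before packaging. A minimal sketch; the directory path and the field values (taken from the sample config above, with the model switched to `bert-base-chinese` for Chinese text) are assumptions you should adapt to your own checkpoint:

```python
import json
import os

# Hypothetical output directory; point this at wherever run_ner.sh saved the model.
model_dir = "output/ner_model"
os.makedirs(model_dir, exist_ok=True)

# Field values follow the sample config above; adjust num_labels / max_length
# to match your own NER task.
setup_config = {
    "model_name": "bert-base-chinese",
    "mode": "sequence_classification",
    "do_lower_case": True,
    "num_labels": "2",
    "save_mode": "pretrained",
    "max_length": "150",
    "captum_explanation": True,
    "embedding_name": "bert",
    "FasterTransformer": False,
}

# Write setup_config.json next to the model files so the .mar packaging picks it up.
with open(os.path.join(model_dir, "setup_config.json"), "w") as f:
    json.dump(setup_config, f, indent=2)
```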
roar090 commented 2 years ago

Thanks! After adding setup_config.json and repackaging, the model runs on TorchServe.

Also, could you provide an example of the request parameters for calling the service after deploying with TorchServe?

I tried:

POST http://localhost:3000/predictions/BERTForNER

{
  "data": {
    "body": "姚明,男,汉族,1980年9月12日出生于上海市徐汇区。"
  }
}

but the server logs an error (--- Logging error ---).

CheaSim commented 2 years ago

The API is determined by the handler code in handler_ner.py. By default, the input JSON should be:

{
  "input_ids": List,       # shape (128)
  "attention_mask": List,  # shape (128)
  "token_type_ids": List   # shape (128)
}
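A sketch of building such a payload in plain Python, assuming you already have token ids from a tokenizer; the pad id 0 (BERT's [PAD]) and the fixed length 128 match the shapes above, while the function name and the sample ids are illustrative:

```python
import json

def build_payload(input_ids, max_length=128, pad_id=0):
    """Pad/truncate tokenizer output to the fixed length the handler expects."""
    ids = input_ids[:max_length]
    # 1 for real tokens, 0 for padding positions.
    attention_mask = [1] * len(ids) + [0] * (max_length - len(ids))
    # Single-sentence input: all segment ids are 0.
    token_type_ids = [0] * max_length
    ids = ids + [pad_id] * (max_length - len(ids))
    return {
        "input_ids": ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }

# Illustrative ids: 101 = [CLS], 102 = [SEP] in BERT vocabularies.
payload = build_payload([101, 2000, 3000, 102])
body = json.dumps(payload)  # POST this body to /predictions/BERTForNER
```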
roar090 commented 2 years ago

Thank you very much! It worked! I have two more questions:

1. How do I convert the text I want to predict into these request parameters?

"姚明,男,汉族,1980年9月12日出生于上海市徐汇区。" -----> {"input_ids": List, "attention_mask": List, "token_type_ids": List}  # each shape (128)

2. What do the results returned by TorchServe correspond to?

{
  "outputs": {
    "predicate_probabilities": [[
      0.608515739440918,
      0.6202450394630432,
      0.7023037672042847,
      -0.030435102060437202,
      -0.10990876704454422,
      ...
    ]],
    "token_label_predictions": [[[
      -2.197793960571289,
      -0.5554990768432617,
      -0.7127630710601807,
      -1.0187653303146362,
      -1.2688184976577759,
      1.81879723072052,
      9.981047630310059,
      -1.8538830280303955,
      ...
    ]]]
  }
}

predicate_probabilities has a list[list[]] structure and token_label_predictions has a list[list[list[]]] structure. How do these map to entities and relations?

Thanks! ✍

CheaSim commented 2 years ago

1. tokenize

Use the code below

from transformers import AutoTokenizer

# Tokenize the raw text into input_ids / attention_mask / token_type_ids
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
tokenizer("姚明,男,汉族,1980年9月12日出生于上海市徐汇区。")

2. read the OpenUE paper https://aclanthology.org/2020.emnlp-demos.1/

The NER model outputs BIEOS ids, and the SEQ model outputs relation ids. predicate_probabilities holds the relation-id logits, and token_label_predictions holds the BIEOS logits for each token, including [CLS] and [SEP].

So if you want end-to-end relation extraction: first get the relation types present in the sentence, then pad the relation ids onto the sentence and use the sentence together with the relation ids to get the NER result. Finally, combine the entities from the NER result with the relation types from the SEQ model's output.
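The last step, turning per-token BIEOS tags into entity spans, can be sketched as below. This assumes you have already taken the argmax over token_label_predictions and mapped the ids to string labels; the label names ("B", "I", "E", "O", "S"), the function name, and the sample inputs are illustrative:

```python
def decode_bieos(tags, tokens):
    """Collect entity spans from a BIEOS tag sequence.

    tags:   one string label per token (after argmax + id-to-label mapping)
    tokens: the corresponding tokens (with [CLS]/[SEP] already stripped)
    """
    entities, span = [], []
    for tag, tok in zip(tags, tokens):
        if tag == "S":              # single-token entity
            entities.append(tok)
            span = []
        elif tag == "B":            # entity begins
            span = [tok]
        elif tag == "I" and span:   # entity continues
            span.append(tok)
        elif tag == "E" and span:   # entity ends
            span.append(tok)
            entities.append("".join(span))
            span = []
        else:                       # "O" or a malformed sequence
            span = []
    return entities

# e.g. a two-token entity followed by a single-token entity
entities = decode_bieos(["B", "E", "O", "S"], ["姚", "明", ",", "沪"])
```

The relation type for each entity pair then comes from the argmax over the corresponding row of predicate_probabilities.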