Open tednaseri opened 2 years ago
Taking a look. Will get back to you
@tednaseri Can you please share how you are pre-processing and running inference on the input data? In another use case, I have tried sending a batch of 10 images as JSON data and processing them in a single batch, and this works. So, I would need more details on your implementation to repro this. For example, it would be great if you could modify the custom handler based on the HuggingFace transformer example given in the README and see if you are able to repro the problem. That's the example I am going to try.
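For reference, here is a minimal sketch of what I mean by sending a whole batch in one JSON request (the endpoint name and payload keys are only placeholders, not from this issue):

```python
import json
import requests

# Placeholder endpoint and payload layout; adjust to your model name and handler.
api = "http://127.0.0.1:8080/predictions/my_model"
headers = {"Content-Type": "application/json"}

# Ten items sent in a single request so the handler can process them as one batch.
payload = json.dumps({"data": ["example input" for _ in range(10)]})

response = requests.post(api, data=payload, headers=headers)
print(response.status_code, response.text)
```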
@agunapal Thank you so much for the response. To make communication easier, I have simplified the custom handler while still reproducing the problem. For this purpose, imagine that the input data is just a number; the handler then builds a dummy input as follows:
```python
def handler(input_number):
    data = ["sample text" for i in range(input_number)]
    model.predict(data)
```
Even with this simplified handler, the issue still occurs. Here is the prepared handler:
```python
from abc import ABC
import logging

import torch
import transformers
from simpletransformers.classification import ClassificationModel
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)
logger.info("Transformers version %s", transformers.__version__)


class TransformersCustomHandler(BaseHandler, ABC):
    def __init__(self):
        super(TransformersCustomHandler, self).__init__()
        self.initialized = False

    def initialize(self, context):
        self.context = context
        self.manifest = context.manifest
        properties = context.system_properties
        self.model_folder = properties.get("model_dir")

        if torch.cuda.is_available() and properties.get("gpu_id") is not None:
            self.device = torch.device("cuda:" + str(properties.get("gpu_id")))
            self.use_cuda = True
        else:
            self.device = torch.device("cpu")
            self.use_cuda = False

        self.predictions = []
        self.labels = ['no', 'yes']
        self.model = self.load_model()
        # The following lines do not work for a simpletransformers model
        # self.model.to(self.device)
        # self.model.eval()
        self.initialized = True

    def load_model(self):
        model = ClassificationModel('roberta', self.model_folder, use_cuda=self.use_cuda)
        return model

    def predict(self, param):
        # TorchServe passes a list of requests; "count" arrives as bytes in the first one.
        self.predictions = []
        count = param[0]["count"].decode("utf-8")
        count = int(count)
        input_text = "sample text"
        data = [input_text for i in range(count)]
        preds, out_results = self.model.predict(data)
        label_lst = [self.labels[i] for i in preds]
        for i in range(len(label_lst)):
            prediction = {"label": label_lst[i]}
            self.predictions.append(prediction)

    def get_predictions(self):
        return self.predictions

    # _service = TransformersCustomHandler()

    def handle(self, data, context):
        try:
            # if not _service.initialized:
            #     _service.initialize(context)
            #
            # if data is None:
            #     return None
            self.predict(data)
            # TorchServe expects one response entry per request in the batch.
            result = [self.get_predictions()]
            return result
        except Exception as e:
            raise e
```
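A sketch of how I call this handler (the model name is a placeholder; only the `count` form field matters):

```python
import requests

# Only the "count" form field is used; the handler builds `count` dummy inputs itself.
# "model" is a placeholder for whatever name the .mar file was registered under.
response = requests.post("http://127.0.0.1:8080/predictions/model", data={"count": "8"})
print(response.text)
```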
@tednaseri I used the handler below and tried it with JSON payloads of length 1000 on a T4 GPU. It works.
```python
from abc import ABC
import logging

import torch
import transformers
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)
logger.info("Transformers version %s", transformers.__version__)


class TransformersHandler(BaseHandler, ABC):
    """
    Transformers handler class for sequence classification.
    """

    def __init__(self):
        super(TransformersHandler, self).__init__()
        self.initialized = False

    def initialize(self, ctx):
        """In this initialize function, the BERT model is loaded.
        Args:
            ctx (context): It is a JSON Object containing information
            pertaining to the model artefacts parameters.
        """
        self.manifest = ctx.manifest
        properties = ctx.system_properties
        model_dir = properties.get("model_dir")

        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id"))
            if torch.cuda.is_available() and properties.get("gpu_id") is not None
            else "cpu"
        )

        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.to(self.device)
        self.tokenizer = AutoTokenizer.from_pretrained(
            "bert-base-uncased", do_lower_case=True
        )
        self.model.eval()
        logger.info("Transformer model from path %s loaded successfully", model_dir)

    def preprocess(self, requests):
        """Basic text preprocessing, based on the user's choice of application mode.
        Args:
            requests (str): The input data in the form of text is passed on to the
            preprocess function.
        Returns:
            list: The preprocess function returns a list of tensors for the size of the word tokens.
        """
        inputs = None
        for idx, data in enumerate(requests):
            input_text = data.get("data") or data.get("body")
            input_text = input_text["text"]
            inputs = self.tokenizer(input_text, return_tensors="pt")
        return inputs

    def inference(self, data, *args, **kwargs):
        """
        The inference function is used to make a prediction call on the given input request.
        The user needs to override the inference function to customize it.
        Args:
            data (Torch Tensor): A Torch Tensor is passed to make the inference request.
            The shape should match the model input shape.
        Returns:
            Torch Tensor: The predicted Torch Tensor is returned in this function.
        """
        mask = data["attention_mask"].to(self.device)
        input_id = data["input_ids"].squeeze(1).to(self.device)
        with torch.no_grad():
            results = self.model(input_id, mask)
        return results

    def postprocess(self, data):
        result = data.logits.argmax(dim=1)
        result = result.tolist()
        return [result]
```
Here is the client part:

```python
import requests
import json

api = "http://127.0.0.1:8080/predictions/my_tc"
headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

payload = {"text": ["Bloomberg has decided to publish a new report on the global economy." for i in range(1000)]}
payload = json.dumps(payload)

response = requests.post(api, data=payload, headers=headers)
print(response.content.decode("UTF-8"))
```
@agunapal Thank you so much for the response. Your test shows that an input of 1000 samples works.
That said, there are some differences between your handler and mine that I cannot fully work out from your response.
Maybe I need to switch to FastAPI and serve the model manually.
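If I go that route, a minimal sketch of what I have in mind (untested here; the checkpoint path, label order, and endpoint name are placeholders):

```python
# Minimal FastAPI sketch for serving the SimpleTransformers model directly.
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel
from simpletransformers.classification import ClassificationModel

app = FastAPI()
model = ClassificationModel("roberta", "path/to/checkpoint", use_cuda=True)
labels = ["no", "yes"]


class Batch(BaseModel):
    texts: List[str]


@app.post("/predict")
def predict(batch: Batch):
    preds, _ = model.predict(batch.texts)
    return [{"label": labels[p]} for p in preds]

# Run with e.g.: uvicorn serve:app --host 0.0.0.0 --port 8000
```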
Hi @agunapal, I have tested another transformer model with the same custom handler, and there is no issue there. I think the issue is an incompatibility between SimpleTransformers and TorchServe. I am wondering, have you ever tested a SimpleTransformers model with TorchServe for text classification?
@tednaseri I am not sure if this has been tested. If you think SimpleTransformers add value and want to create an example showing the integration, please feel free to create a PR and get feedback.
@tednaseri I am dealing with the same issue right now. Can you share what the solution was?
🐛 Describe the bug
I am passing JSON data via python-requests. For simplification, you can assume the following input:

Issue: when `count <= 8`, it works well. As soon as `count > 8`, it gets stuck and never returns. As you can see, the input is just a simple Python dictionary, and even if I set `input_data = [dic1 for i in range(10)]`, the final size of the input is very small.

I am using:
When the issue shows itself:
- TorchServe on the GPU: it is critically dependent on the input data size.

When it works well:
- TorchServe on the CPU: it works well regardless of the input size.
- PyTorch without TorchServe: I have tested it in plain PyTorch and it works well even when I pass `input_data = [dic1 for i in range(1000)]`.
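A sketch of that direct test, assuming the same roberta checkpoint as in the handler (the path is a placeholder):

```python
# Direct SimpleTransformers test without TorchServe; the checkpoint path is a placeholder.
from simpletransformers.classification import ClassificationModel

model = ClassificationModel("roberta", "path/to/checkpoint", use_cuda=True)

# 1000 inputs predict fine when the model is called directly on the GPU.
data = ["sample text" for i in range(1000)]
preds, raw_outputs = model.predict(data)
print(len(preds))
```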
Error logs
Installation instructions
I don't use Docker for installation.
Model Packaging
config.properties
Versions
Repro instructions
```
torchserve --start --model-store model-store --models model=hardnews --ncs --ts-config config.properties
```
running prediction
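A sketch of the prediction call (the model is registered as `model` in the torchserve command above; the `count` field is the one the handler reads):

```python
import requests

api = "http://127.0.0.1:8080/predictions/model"

# count <= 8 returns normally on the GPU; count > 8 gets stuck and never returns.
for count in (8, 9):
    response = requests.post(api, data={"count": str(count)})
    print(count, response.status_code, response.text)
```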
Possible Solution
No response