pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Return a list of dicts from prediction #2169

Open kaiogu opened 1 year ago

kaiogu commented 1 year ago

🐛 Describe the bug

I am trying to return a python list of dicts from pytorch serve. It is a list containing the top 5 predictions of the model:

top_k_predictions = [{'label': 'label 1', 'probability': 0.3068893551826477}, {'label': 'label 2', 'probability': 0.061380647122859955}, {'label': 'label 3', 'probability': 0.056913409382104874}, {'label': 'label 4', 'probability': 0.04706308990716934}, {'label': 'label 5', 'probability': 0.037477653473615646}]

class TransformersClassifierHandler(BaseHandler):
    ...

    def postprocess(self, logits):
        ...
        return top_k_predictions

The returned value gets distorted, see below.

Error logs

No error logs, but unexpected behaviour.

Installation instructions

I am running the containers locally and followed these instructions: https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai

Model Packaging

https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai

config.properties

No response

Versions

In the running container:

$ torchserve --version
TorchServe Version is 0.7.0

Repro instructions

When I query the model, only the first element of the list is returned:

$ curl -X POST -H "Content-Type: application/json; charset=utf-8" -d @./models/predictor/instances.json http://localhost:7080/predictions/distilbert-base-multilingual-cased
> {"predictions": [{"label": "Computer-Arbeitsplatz", "probability": 0.3068893551826477}]}%

If I try to convert the list of dicts to a string before returning (return json.dumps(top_k_predictions)), the result gets distorted even worse:

$ curl -X POST -H "Content-Type: application/json; charset=utf-8" -d @./models/predictor/instances.json http://localhost:7080/predictions/distilbert-base-multilingual-cased
> {"predictions": "["}%

Possible Solution

No response

mreso commented 1 year ago

Hi @kaiogu,

that's actually a bit strange. Can you post your full handler code here? Your first example should actually result in a "number of batch response mismatched" error when triggering this guard (https://github.com/pytorch/serve/blob/86d440041b663961c71a6262fe648111d85b27d8/ts/service.py#L141), since len(ret) == 5 (assuming you only send one request). Your last example with the json.dumps should trigger it as well, because the return value is not a list. Having the full handler code would help with debugging this.
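
For context, the check behind that error, as a rough paraphrase of the linked line in ts/service.py (not the verbatim source; the function name here is made up for illustration): the frontend expects the handler to return a list with exactly one entry per request in the batch.

# Rough paraphrase of the batch-response guard linked above; names are
# approximate and this is not the verbatim TorchServe source.
def check_batch_response(ret, batch_size):
    if ret is None or not isinstance(ret, list):
        # e.g. returning json.dumps(...), a plain string, fails this check
        raise ValueError("Invalid handler output: expected a list")
    if len(ret) != batch_size:
        # A flat list of 5 prediction dicts for a single request fails here,
        # because TorchServe maps one list entry to one request in the batch.
        raise ValueError("number of batch response mismatched")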

kaiogu commented 1 year ago

Hi @mreso,

thanks for the quick reply. Sure thing:

import json
import logging
import os

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)

class TransformersClassifierHandler(BaseHandler):
    """
    The handler takes an input string and returns the classification text
    based on the serialized transformers checkpoint.
    """

    def __init__(self):
        super(TransformersClassifierHandler, self).__init__()
        self.initialized = False
        self.model = None
        self.mapping = None
        self.device = None
        self.manifest = None
        self.tokenizer = None

    def initialize(self, ctx):
        """Loads the model.pt file and initializes the model object.
        Instantiates Tokenizer for preprocessor to use
        Loads labels to name mapping file for post-processing inference response
        """
        self.manifest = ctx.manifest

        properties = ctx.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu"
        )

        # Read model serialize/pt file
        serialized_file = self.manifest["model"]["serializedFile"]
        model_pt_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_pt_path):
            raise RuntimeError("Missing the model.pt or pytorch_model.bin file")

        # Load model
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.to(self.device)
        self.model.eval()
        logger.debug(f"Transformer model from path {0} loaded successfully".format(model_dir))

        # Ensure to use the same tokenizer used during training
        self.tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

        # Read the mapping file, index to object name
        mapping_file_path = os.path.join(model_dir, "index_to_name.json")

        if os.path.isfile(mapping_file_path):
            with open(mapping_file_path, mode="rt", encoding="utf8") as f:
                self.mapping = json.load(f)
        else:
            logger.warning(
                "Missing the index_to_name.json file. Inference output will not include class name."
            )

        self.initialized = True

    def preprocess(self, data):
        """Preprocessing input request by tokenizing
        Extend with your own preprocessing steps as needed
        """
        sentences = data[0].get("data")
        logger.info("Received text: '%s'", sentences)

        # Tokenize the texts
        tokenizer_args = (sentences,)
        inputs = self.tokenizer(
            *tokenizer_args,
            padding="max_length",
            max_length=128,
            truncation=True,
            return_tensors="pt",
        )
        return inputs

    def inference(self, inputs):
        """Predict the class of a text using a trained transformer model."""
        inference_output = self.model(inputs["input_ids"].to(self.device))
        logger.info(f"TYPE OF PREDICTIONS: {inference_output}")
        return inference_output.logits

    def postprocess(self, logits):
        """Placeholder for post-processing the inference output."""
        probabilities = torch.softmax(logits, dim=1)

        top_k = torch.topk(probabilities, k=5)
        top_k_labels = [self.mapping[str(i)] for i in top_k.indices[0].tolist()]
        top_k_probabilities = top_k.values[0].tolist()
        top_k_predictions = [
            {"label": label, "probability": probability}
            for label, probability in zip(top_k_labels, top_k_probabilities)
        ]
        print(f"{top_k_predictions=}")
        return top_k_predictions

PS: Solved it by wrapping the returned list in another list:

return [top_k_predictions]

The error logs you mentioned would have helped though :)
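
For reference, a minimal sketch of a postprocess that also covers batch sizes greater than one (assuming the handler above; for a single request it reduces to return [top_k_predictions]):

    def postprocess(self, logits):
        """Return one entry per request in the batch, since TorchServe
        requires len(output) == batch size."""
        probabilities = torch.softmax(logits, dim=1)
        top_k = torch.topk(probabilities, k=5)

        batch_predictions = []
        # top_k.indices and top_k.values have shape (batch_size, 5)
        for indices, values in zip(top_k.indices.tolist(), top_k.values.tolist()):
            batch_predictions.append(
                [
                    {"label": self.mapping[str(i)], "probability": p}
                    for i, p in zip(indices, values)
                ]
            )
        return batch_predictions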