pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/

Logging using JsonTemplateLayout fails with `Console contains an invalid element or attribute "JsonTemplateLayout"` error #2359

Open feeeper opened 1 year ago

feeeper commented 1 year ago

🐛 Describe the bug

I've configured log4j2 to log using JsonTemplateLayout. Below is the relevant part of my log4j2.xml:

<Console name="STDOUT" target="SYSTEM_OUT">
    <!--<PatternLayout pattern="%d{ISO8601} [%-5p] %t %c - %m%n"/>-->
    <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"/>
</Console>

Then I start torchserve as usual:

torchserve \
--start \
--model-store model_store \
--models doc_model=doc_model.mar \
--ncs \
--ts-config ./src/ts.config \
--log-config ./src/log4j2.xml

And I get the following error message:

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-05-23 18:52:00,570 main ERROR Console contains an invalid element or attribute "JsonTemplateLayout"
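
For context: JsonTemplateLayout is provided by the separate log4j-layout-template-json artifact rather than by log4j-core, and log4j2 reports exactly this "invalid element or attribute" error when it cannot resolve a layout plugin on the classpath. A quick way to check whether the plugin classes ship with the pip-installed frontend; the jar path below is an assumption about the pip layout, not something I have verified:

TS_DIR="$(python -c 'import ts, os; print(os.path.dirname(ts.__file__))')"
unzip -l "$TS_DIR/frontend/model-server.jar" | grep -i JsonTemplateLayout

If the grep finds nothing, log4j2 cannot resolve the element regardless of how log4j2.xml is written.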

Error logs

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-05-23 18:52:00,570 main ERROR Console contains an invalid element or attribute "JsonTemplateLayout"

Installation instructions

Install torchserve from source: No
Are you using Docker: No

TorchServe was installed using pip

Model Packaging

The model is sentence-transformers/paraphrase-multilingual-mpnet-base-v2 from the HuggingFace Hub, archived as follows:

torch-model-archiver \
--model-name doc_model \
--version 1.382 \
--serialized-file model/pytorch_model.bin \
--handler ./src/transformers_vectorizer_handler.py \
--extra-files "./model/config.json,./tokenizer,./src/log4j2.xml,./src/config.properties"

config.properties

vmargs=-Dlog4j.configurationFile=./log4j2.xml
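
One generic way to confirm that the vmargs line actually reaches the frontend JVM is to look for the system property on the running Java process (plain POSIX tooling, nothing TorchServe-specific):

ps -ef | grep 'log4j.configurationFile' | grep -v grep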

Versions

$ torchserve -v
> TorchServe Version is 0.6.0

Repro instructions

  1. Download the default log4j2 config
  2. Replace the PatternLayout node inside the Console[name=STDOUT] node in log4j2.xml with <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"/>:
    <Console name="STDOUT" target="SYSTEM_OUT">
    <!--<PatternLayout pattern="%d{ISO8601} [%-5p] %t %c - %m%n"/>-->
    <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"/>
    </Console>
  3. Download the model and tokenizer from HuggingFace:
    from transformers import AutoTokenizer, AutoModel
    checkpoint = 'sentence-transformers/paraphrase-multilingual-mpnet-base-v2'
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    model.save_pretrained('./model')
    tokenizer.save_pretrained('./tokenizer')
  4. Create ts.config file:
    models={\
    "doc_model": {\
    "1.0": {\
        "defaultVersion": true,\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 1\
    }\
    }\
    }
    inference_address=http://0.0.0.0:8080
    management_address=http://0.0.0.0:8081
    metrics_address=http://0.0.0.0:8082
    number_of_netty_threads=32
    job_queue_size=1000
  5. Archive the model (transformers_vectorizer_handler.py can be any handler; an example is given below):
    torch-model-archiver \
    --model-name doc_model \
    --version 1.382 \
    --serialized-file model/pytorch_model.bin \
    --handler ./transformers_vectorizer_handler.py \
    --extra-files "./model/config.json,./tokenizer,./log4j2.xml,./config.properties"
  6. Start TorchServe:
    torchserve \
    --start \
    --model-store model_store \
    --models doc_model=doc_model.mar \
    --ncs \
    --ts-config ./ts.config \
    --log-config ./log4j2.xml
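
Once TorchServe is started, a quick sanity check that the frontend is up before inspecting the logs (/ping is a standard TorchServe inference-API endpoint, independent of the logging problem):

curl http://localhost:8080/ping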

The actual result is the error below:

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-05-23 18:52:00,570 main ERROR Console contains an invalid element or attribute "JsonTemplateLayout"

The expected result is that the logs are emitted in a JSON format supported by the ELK stack.

transformers_vectorizer_handler.py

from abc import ABC
import logging
from transformers import AutoModel, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)

class TransformersVectorizerHandler(BaseHandler, ABC):
    def __init__(self):
        super(TransformersVectorizerHandler, self).__init__()
        self.initialized = False

    def initialize(self, ctx):
        self.manifest = ctx.manifest

        # Read model serialize/pt file
        model_dir = './'
        tokenizer_dir = './'

        self.model = AutoModel.from_pretrained(model_dir)
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)

        logger.debug('Transformer model from path {0} loaded successfully'.format(model_dir))

        self.initialized = True

    def preprocess(self, data):
        """ Very basic preprocessing code - only tokenizes.
            Extend with your own preprocessing steps as needed.
        """
        text = data[0].get('inputs')
        if text is None:
            text = data[0].get('body').get('inputs')[0]
        logger.info(f'text == {text}')
        sentences = text
        logger.info('Received text: "%s"', sentences)

        inputs = self.tokenizer(
            sentences,
            padding=True,
            truncation=True,
            return_tensors='pt'
        )
        return inputs

    def inference(self, inputs):
        """
        Predict the class of a text using a trained transformer model.
        """
        # NOTE: This makes the assumption that your model expects text to be tokenized
        prediction = self.model(**inputs)
        logger.info('Model predicted: "%s"', prediction)
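        # Note: pooler_output is used as the sentence embedding below; many
        # sentence-transformers checkpoints (this one included) are normally
        # mean-pooled over last_hidden_state, so these vectors can differ from
        # what the sentence-transformers library itself would return.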

        return prediction.pooler_output.tolist()

    def postprocess(self, inference_output):
        # TODO: Add any needed post-processing of the model predictions here
        return inference_output

_service = TransformersVectorizerHandler()

def handle(data, context):
    if not _service.initialized:
        _service.initialize(context)

    if data is None:
        return None

    data = _service.preprocess(data)
    data = _service.inference(data)
    data = _service.postprocess(data)

    return data
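
For reference, a request shape this handler accepts (the endpoint and model name follow the repro above; the JSON body matches the data[0].get('body').get('inputs')[0] fallback in preprocess):

curl -X POST http://localhost:8080/predictions/doc_model \
    -H 'Content-Type: application/json' \
    -d '{"inputs": ["Hello, world"]}'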

Possible Solution

No response

feeeper commented 1 year ago

I've found the documentation page about logging metrics in JSON format, and it works fine for metrics (model_metrics and ts_metrics), but it doesn't work for access_log: every line in the access_log.log file looks like:

org.apache.logging.log4j.core.impl.MutableLogEvent@1ac5a7ff

That MutableLogEvent@... string looks like the layout falling back to the event's default toString() instead of formatting it, so it seems JSONPatternLayout does not handle access_log events. How can I log access_log entries in JSON format?

PS: log4j2.xml section for access_log:

<RollingFile
        name="access_log"
        fileName="${env:LOG_LOCATION:-logs}/access_log.log"
        filePattern="${env:LOG_LOCATION:-logs}/access_log.%d{dd-MMM}.log.gz">
    <!-- <PatternLayout pattern="%d{ISO8601} - %m%n"/> -->
    <JSONPatternLayout/>
    <Policies>
        <SizeBasedTriggeringPolicy size="100 MB"/>
        <TimeBasedTriggeringPolicy/>
    </Policies>
    <DefaultRolloverStrategy max="5"/>
</RollingFile>
flibustier7seas commented 2 months ago

Any updates?