pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

507 Server Error: Insufficient Storage for url #2401

Open adeshkin opened 1 year ago

adeshkin commented 1 year ago

πŸ› Describe the bug

507 Server Error: Insufficient Storage for url: http://model-name.models:80/predictions/model-name/ (torchserve==0.2.0)
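The error text has the shape that the Python `requests` library's `Response.raise_for_status()` produces for a 5xx reply (assuming that is the client used here). So the 507 is a status returned by the server, not a problem on the client machine. A minimal stdlib sketch that rebuilds the same message from a status code with `http.HTTPStatus`; the helper name `format_like_requests` is hypothetical:

```python
from http import HTTPStatus


def format_like_requests(status_code: int, url: str) -> str:
    """Build an error string in the style of requests' raise_for_status().

    Hypothetical helper: the reason phrase comes from the stdlib HTTPStatus
    table, and the "Client"/"Server" label follows the 4xx/5xx class.
    """
    reason = HTTPStatus(status_code).phrase
    if 400 <= status_code < 500:
        kind = "Client"
    elif 500 <= status_code < 600:
        kind = "Server"
    else:
        raise ValueError(f"{status_code} is not an HTTP error status")
    return f"{status_code} {kind} Error: {reason} for url: {url}"


msg = format_like_requests(507, "http://model-name.models:80/predictions/model-name/")
print(msg)
```

The 5xx class is what matters when reading the report: the server, not the client, decided to answer with "Insufficient Storage".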

Error logs

023-06-07 03:20:23,110 [ERROR] epollEventLoopGroup-3-27 org.pytorch.serve.http.HttpRequestHandler - java.lang.OutOfMemoryError: Direct buffer memory
    at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:755)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:731)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:227)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:147)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:356)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
    at io.netty.buffer.CompositeByteBuf.allocBuffer(CompositeByteBuf.java:1858)
    at io.netty.buffer.CompositeByteBuf.consolidate0(CompositeByteBuf.java:1737)
    at io.netty.buffer.CompositeByteBuf.consolidateIfNeeded(CompositeByteBuf.java:564)
    at io.netty.buffer.CompositeByteBuf.addComponent(CompositeByteBuf.java:266)
    at io.netty.buffer.CompositeByteBuf.addComponent(CompositeByteBuf.java:222)
    at io.netty.handler.codec.MessageAggregator.appendPartialContent(MessageAggregator.java:333)
    at io.netty.handler.codec.MessageAggregator.decode(MessageAggregator.java:298)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
    at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
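Read from the innermost frame outwards, the trace shows the allocation failing in `java.nio.Bits.reserveMemory` while Netty's `MessageAggregator` (the codec that assembles an HTTP request body into one message) was appending incoming content. In other words, the JVM's direct-memory budget ran out while Netty was still receiving that request, before it reached a model worker. A minimal Python sketch for pulling the exception and the relevant frame out of such a flattened log line; the regexes and helper names are hypothetical and only cover the frame format seen here:

```python
import re
from typing import List, Optional, Tuple

# One stack frame: "at <qualified.method>(<File>.java:<line>)". The class part may
# carry a module prefix ("java.base/"), "$" for inner classes, or "<init>".
FRAME_RE = re.compile(r"\bat ([\w.$/<>]+)\(([\w$]+\.java):(\d+)\)")
# The thrown java.lang exception and its message, which run up to the first " at ".
CAUSE_RE = re.compile(r"(java\.lang\.\w+): (.*?) at ")

Frame = Tuple[str, str, int]


def summarize_trace(log: str) -> Tuple[Optional[str], List[Frame]]:
    """Return ("<exception>: <message>", frames) from a flattened Java stack trace."""
    cause = CAUSE_RE.search(log)
    frames = [(m.group(1), m.group(2), int(m.group(3))) for m in FRAME_RE.finditer(log)]
    return (f"{cause.group(1)}: {cause.group(2)}" if cause else None), frames


def first_frame_in(frames: List[Frame], package: str) -> Optional[Frame]:
    """Return the innermost frame whose method lives under `package`, if any."""
    return next((f for f in frames if f[0].startswith(package)), None)


# A shortened excerpt of the log in this report.
sample = (
    "[ERROR] epollEventLoopGroup-3-27 org.pytorch.serve.http.HttpRequestHandler - "
    "java.lang.OutOfMemoryError: Direct buffer memory "
    "at java.base/java.nio.Bits.reserveMemory(Bits.java:175) "
    "at io.netty.buffer.CompositeByteBuf.consolidate0(CompositeByteBuf.java:1737) "
    "at io.netty.handler.codec.MessageAggregator.appendPartialContent(MessageAggregator.java:333) "
    "at io.netty.handler.codec.MessageAggregator.decode(MessageAggregator.java:298)"
)
cause, frames = summarize_trace(sample)
print(cause)
print(first_frame_in(frames, "io.netty.handler.codec."))
```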

Installation instructions

git clone https://github.com/pytorch/serve
cd serve/ && git checkout da9349bf
cd ..
cp -f Dockerfile serve/docker/
cp -f config.properties serve/docker/
cd serve/docker/ && DOCKER_BUILDKIT=1 docker build --no-cache --file Dockerfile -t 'torchserve-local' .

Model Packaging

https://github.com/pytorch/serve/tree/master/examples/image_segmenter/deeplabv3

config.properties

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
number_of_gpu=1
max_request_size=65535000
max_response_size=65535000
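To put these sizes in perspective: `max_request_size` is given in bytes, so each request body may be up to about 62.5 MiB, and, as the `MessageAggregator` frames in the trace suggest, Netty assembles the body in direct buffers before handing it on. The Python sketch below parses this file and multiplies that limit by `number_of_netty_threads`, assuming, purely as an illustration, one max-size upload in flight per Netty thread. It is not a bound in either direction: one event-loop thread can serve many connections, and buffer consolidation can copy data. The figure can be compared with the JVM's direct-memory limit, which in HotSpot defaults to the maximum heap size unless `-XX:MaxDirectMemorySize` is set.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# The config.properties from this report, one key=value per line.
CONFIG_TEXT = """\
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
number_of_gpu=1
max_request_size=65535000
max_response_size=65535000
"""


def parse_properties(text: str) -> dict:
    """Parse plain key=value lines (a simplified subset of Java .properties syntax)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")  # split on the first "=" only
        props[key.strip()] = value.strip()
    return props


props = parse_properties(CONFIG_TEXT)
max_request = int(props["max_request_size"])
netty_threads = int(props["number_of_netty_threads"])
# Illustrative assumption: one max-size request being aggregated per Netty thread.
worst_case = max_request * netty_threads

print(f"max_request_size: {max_request / MIB:.1f} MiB")
print(f"{netty_threads} concurrent max-size requests: {worst_case / GIB:.2f} GiB")
```

With these values the illustration comes to roughly 1.95 GiB of request bodies held at once.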

Versions

ARG BASE_IMAGE=nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04

FROM ${BASE_IMAGE} AS compile-image

ENV PYTHONUNBUFFERED TRUE

RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    ca-certificates \
    g++ \
    python3-dev \
    python3-distutils \
    python3-venv \
    openjdk-11-jre-headless \
    curl \
    && rm -rf /var/lib/apt/lists/* \
    && cd /tmp \
    && curl -O https://bootstrap.pypa.io/pip/3.6/get-pip.py \
    && python3 get-pip.py
RUN apt install zlib1g
RUN python3 -m venv /home/venv

ENV PATH="/home/venv/bin:$PATH"

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

# This is only useful for cuda env
RUN export USE_CUDA=1

RUN pip install --upgrade pip

RUN pip install --no-cache-dir torch==1.8.2+cu102 torchvision==0.9.2+cu102 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html --no-cache
RUN pip install --no-cache-dir torchserve==0.6.0 torch-model-archiver==0.6.0

RUN pip install --no-cache-dir opencv-python-headless==4.4.0.46
RUN pip install --no-cache-dir pyyaml==5.3.1 configargparse==1.4

# Final image for production
FROM ${BASE_IMAGE} AS runtime-image

ENV PYTHONUNBUFFERED TRUE

RUN --mount=type=cache,target=/var/cache/apt \
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    python3 \
    python3-distutils \
    openjdk-11-jre-headless \
    && rm -rf /var/lib/apt/lists/* \
    && cd /tmp

COPY --from=compile-image /home/venv /home/venv

ENV PATH="/home/venv/bin:$PATH"

RUN useradd -m model-server \
    && mkdir -p /home/model-server/tmp

COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh

RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
    && chown -R model-server /home/model-server

COPY config.properties /home/model-server/config.properties
RUN mkdir /home/model-server/model-store && chown -R model-server /home/model-server/model-store

EXPOSE 8080 8081

USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]
CMD ["serve"]

Repro instructions

torchserve --start --ts-config /home/model-server/config.properties
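The repro step starts the server; the error itself is raised while a request is being received. A minimal stdlib load sketch that sends the same payload concurrently and tallies the status codes, so 507 replies can be counted against the number of parallel uploads. The URL, payload file and worker count are assumptions, not taken from the report, and `send` is injectable so the tally logic can be checked without a running server:

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def post_prediction(url: str, payload: bytes, timeout: float = 60.0) -> int:
    """POST raw bytes to a predictions URL and return the HTTP status code."""
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies (including 507) surface as HTTPError; keep the code.
        return err.code


def count_statuses(send: Callable[[bytes], int], payloads: Iterable[bytes],
                   workers: int = 32) -> Counter:
    """Send payloads concurrently through `send` and tally the returned status codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(send, payloads))


# Hypothetical usage against the server started above (model name and file assumed):
# from pathlib import Path
# url = "http://localhost:8080/predictions/model-name"
# image = Path("large_image.jpg").read_bytes()
# print(count_statuses(lambda p: post_prediction(url, p), [image] * 64))
```

Raising `workers` or the payload size between runs is one way to see whether the 507s track the amount of request data in flight.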

Possible Solution

No response

msaroufim commented 1 year ago

torchserve 0.2 is quite old at this point. Do you still see this error with newer versions like 0.8?
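The report names torchserve==0.2.0, while the Dockerfile pins torchserve==0.6.0, so it may help to confirm which release is actually installed in `/home/venv` before comparing with 0.8. A minimal stdlib sketch using `importlib.metadata`; the helper names are hypothetical, and the comparison is deliberately simplified (release numbers only, suffixes such as `rc1` dropped), unlike a full PEP 440 comparison:

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist: str = "torchserve") -> Optional[str]:
    """Return the installed version string of `dist`, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Turn '0.6.0' into (0, 6, 0); stops at the first non-numeric suffix."""
    nums = []
    for piece in ver.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        nums.append(int(m.group()))
        if m.end() != len(piece):  # e.g. "0rc1": keep 0, ignore the rest
            break
    return tuple(nums)


def is_older_than(installed: str, minimum: str) -> bool:
    """True if `installed` is an earlier release than `minimum`."""
    return release_tuple(installed) < release_tuple(minimum)


current = installed_version()
if current is None:
    print("torchserve is not installed in this interpreter")
else:
    print(current, "older than 0.8:", is_older_than(current, "0.8"))
```

Run it with the container's `/home/venv` interpreter so it inspects the same environment TorchServe starts from.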