pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

using https causes "Invalid configuration" error at load #2466

Open sreeprasannar opened 1 year ago

sreeprasannar commented 1 year ago

🐛 Describe the bug

This config works:

vmargs=-XX:+UseContainerSupport -XX:InitialRAMPercentage=25.0 -XX:MaxRAMPercentage=100.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
inference_address=http://0.0.0.0:8443
management_address=http://0.0.0.0:8445
metrics_address=http://0.0.0.0:8446
load_models=ALL
metrics_mode=prometheus
job_queue_size=3
metrics_config=/app/config/metrics.yaml
default_workers_per_model=1
private_key_file=/app/ssl_certificates/private_key.txt
certificate_file=/app/ssl_certificates/certificate.pem

Changing the addresses to https causes:

2023-07-14 14:28:01.246Z Invalid configuration: Input byte array has incorrect ending byte at 1844

I tracked down the error to this line

Error logs

2023-07-14 14:27:59.145Z 2023-07-14T14:27:59,144 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-07-14 14:28:01.245Z 2023-07-14T14:28:01,245 [INFO ] main org.pytorch.serve.ModelServer - Torchserve stopped.
2023-07-14 14:28:01.246Z Invalid configuration: Input byte array has incorrect ending byte at 1844

Installation instructions

I'm using the Docker image torchserve:0.8.0-gpu

Model Packaging

I don't think this is relevant, since config.properties is clearly the only thing that differs.

config.properties

No response

Versions

torchserve:0.8.0-gpu

Repro instructions

Just use https and it should fail. I don't think there are any working examples for https anyway in the repo, so maybe this was never tested.

Possible Solution

No response

sreeprasannar commented 1 year ago

fwiw, I'm pretty sure this line is causing the error. It seems unrelated to https which is the confusing part.

sreeprasannar commented 1 year ago

OK, more info: the previous errors occurred because my private key was encrypted and the key file parser was looking for the header BEGIN PRIVATE KEY. I decrypted the key and loading now works as expected, but this happens immediately after load:

2023-07-14T17:06:46,804 [ERROR] epollEventLoopGroup-3-2 org.pytorch.serve.http.HttpRequestHandler -
2023-07-14 17:06:46.806Z io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: no cipher suites in common
2023-07-14 17:06:46.806Z        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:471) ~[model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:253) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [model-server.jar:?]
2023-07-14 17:06:46.806Z        at java.lang.Thread.run(Thread.java:833) [?:?]
2023-07-14 17:06:46.806Z Caused by: javax.net.ssl.SSLHandshakeException: no cipher suites in common
2023-07-14 17:06:46.806Z        at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.Alert.createSSLException(Alert.java:117) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.TransportContext.fatal(TransportContext.java:358) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.TransportContext.fatal(TransportContext.java:314) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.TransportContext.fatal(TransportContext.java:305) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.ServerHello$T12ServerHelloProducer.chooseCipherSuite(ServerHello.java:471) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.ServerHello$T12ServerHelloProducer.produce(ServerHello.java:297) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.SSLHandshake.produce(SSLHandshake.java:440) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.ClientHello$T12ClientHelloConsumer.consume(ClientHello.java:1109) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.ClientHello$ClientHelloConsumer.onClientHello(ClientHello.java:842) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.ClientHello$ClientHelloConsumer.consume(ClientHello.java:801) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264) ~[?:?]
2023-07-14 17:06:46.806Z        at java.security.AccessController.doPrivileged(AccessController.java:712) ~[?:?]
2023-07-14 17:06:46.806Z        at sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209) ~[?:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.ssl.SslHandler.runAllDelegatedTasks(SslHandler.java:1557) ~[model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1571) ~[model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1455) ~[model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1282) ~[model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1329) ~[model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) ~[model-server.jar:?]
2023-07-14 17:06:46.806Z        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) ~[model-server.jar:?]
2023-07-14 17:06:46.806Z        ... 22 more
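
For anyone hitting the earlier "Invalid configuration" error, a quick way to check whether a key file is encrypted is to look at its PEM header before pointing `private_key_file` at it. The following is a minimal sketch, assuming the key is stored as PEM text; `classify_private_key` is a hypothetical helper written for illustration and is not part of TorchServe.

```python
import re

# Matches the first PEM "BEGIN" line and captures its label,
# e.g. "PRIVATE KEY" or "ENCRYPTED PRIVATE KEY".
_PEM_HEADER = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def classify_private_key(pem_text: str) -> str:
    """Return a rough classification of a PEM private key from its header."""
    match = _PEM_HEADER.search(pem_text)
    if match is None:
        return "not-pem"
    label = match.group(1)
    if label == "ENCRYPTED PRIVATE KEY":
        return "encrypted-pkcs8"
    # Legacy OpenSSL encryption keeps the plain label and adds a Proc-Type line.
    if "Proc-Type: 4,ENCRYPTED" in pem_text:
        return "encrypted-legacy"
    if label == "PRIVATE KEY":
        return "plain-pkcs8"
    if label.endswith("PRIVATE KEY"):
        # e.g. "RSA PRIVATE KEY" -> "plain-rsa", "EC PRIVATE KEY" -> "plain-ec"
        return "plain-" + label[: -len("PRIVATE KEY")].strip().lower()
    return "other"


if __name__ == "__main__":
    sample = "-----BEGIN ENCRYPTED PRIVATE KEY-----\nMIIF...\n-----END ENCRYPTED PRIVATE KEY-----\n"
    print(classify_private_key(sample))
```

To check a real file, read it as text and pass the contents to the function. Any result starting with `encrypted-` suggests the key needs to be decrypted first, as described above.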

stf976 commented 3 months ago

We encountered the same error message as the OP with one of our certificates. With a key and certificate generated, for example, as described at https://pytorch.org/serve/configuration.html#enable-ssl, HTTPS works without problems, both with TorchServe running locally and in a Docker container. The OS is Ubuntu 23.04 (local install and Docker image) and the TorchServe version is 0.11.1.

Edit: please note that the linked instructions create a self-signed certificate, which should not be used in production. However, it can serve as a starting point for investigating which difference between the certificates causes TorchServe to throw an exception.
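
When comparing a working and a failing key/certificate pair, it may help to see what a client actually negotiates with the server. The following is a minimal sketch using Python's standard `ssl` module. The host and port are assumptions (8443 is the inference port from the original config), and `build_probe_context` and `probe` are hypothetical helper names. Certificate verification is disabled only so the probe also works against a self-signed test certificate.

```python
import socket
import ssl


def build_probe_context() -> ssl.SSLContext:
    """Create a client context that skips certificate verification (testing only)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe(host: str, port: int, timeout: float = 5.0):
    """Attempt a TLS handshake and return (protocol version, cipher tuple).

    Raises ssl.SSLError if the handshake fails, for example when the server
    reports that it has no cipher suites in common with the client.
    """
    ctx = build_probe_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()


if __name__ == "__main__":
    try:
        # Assumed address; adjust to wherever the server is listening.
        print(probe("127.0.0.1", 8443))
    except (OSError, ssl.SSLError) as exc:
        print(f"handshake failed: {exc}")
```

Running the probe against both setups shows whether the handshake succeeds at all and, if it does, which protocol version and cipher suite were selected.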