twitter / finagle

A fault tolerant, protocol-agnostic RPC system
https://twitter.github.io/finagle
Apache License 2.0
8.79k stars 1.46k forks source link

Netty warning when consuming TLS transport Finagle server with HTTP #700

Open jdreyesp opened 6 years ago

jdreyesp commented 6 years ago

Finagle version: 18.6.0

Netty creates a log with level 'warn' when trying to consume a TLS server with HTTP. This can't be disabled and generates a lot of info that can potentially flood application logs in some cases.

Expected behavior

This should be handled by Finagle since the service is configured using TLS, or at least configurable when creating a Finagle server with TLS transport.

Actual behavior

When a TLS server is called using HTTP, netty logs:

Jun 19, 2018 3:01:43 PM com.twitter.finagle.netty4.channel.ChannelStatsHandler exceptionCaught
WARNING: ChannelStatsHandler caught an exception
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: ...
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at com.twitter.finagle.util.BlockingTimeTrackingThreadFactory$$anon$1.run(BlockingTimeTrackingThreadFactory.scala:23)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: ...
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1106)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1162)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
    ... 18 more

Steps to reproduce the behavior

  1. Create a TLS server:
    
    val httpService: Service[Request, Response] = ...
    val sslContext = ...

val server = Http.Server().withTransport.tls(sslContext)

server.serve("localhost:8080", httpService)


2. Curl it with http: `curl http://localhost:8080`
mosesn commented 6 years ago

@jdreyesp from the error NotSslRecordException it smells like curl is sending a non-TLS HTTP message. Can you try changing your scheme in your curl to be https://localhost:8080?

jdreyesp commented 6 years ago

Hi @mosesn, that's exactly the cause of the issue, if you consume the server through HTTP instead of HTTPS, a big warning log is produced and it cannot be disabled / configured.

I've been investigating and this is managed by ChannelStatsHandler. It seems like by default, it logs everything that is not a IOException/TimeoutException/Failure to WARNING level:

override def exceptionCaught(ctx: ChannelHandlerContext, cause: Throwable): Unit = {
...
case _: IOException => Level.FINE
        case _: TimeoutException => Level.FINE
        case f: Failure => f.logLevel
        case _ => Level.WARNING
...

Is there a way to override the ChannelPipeline (managed by netty4) in the server creation to set a custom ChannelStatHandler to avoid this warning?

Thanks in advance!

mosesn commented 6 years ago

@jdreyesp I'm not sure I understand why you want to suppress this warning, which seems useful to me. You can't consume the server without TLS, so I don't know why your clients are trying so hard to do it. Can you elaborate on your use case?

jdreyesp commented 6 years ago

Hi @mosesn , thanks again for your response. We work as a framework team and we're making a non backward compatible change, updating this Finagle server from non secure to secure. Since this server is being consumed by polling tools (Prometheus in our case) we're expecting a big amount of calls during this re-configuration period. That's why we want to be able to disable this warning, in order not to flood out our logs.

I was just wondering if there's an easy way to toggle this warnings off for our server.

Thanks

mosesn commented 6 years ago

I see. You could reduce the log level for this specific component until everything migrates over in your logging configuration (like logback, or jul), but I think the better solution might be to serve traffic for both HTTP and HTTPS, and then cut the HTTP port later. This may simplify your migration story too, since clients won't need to figure out how to speak TLS immediately.

jdreyesp commented 6 years ago

Hi @mosesn !

This is not only a migration thing, but in general attackers can try to flood server logs with plain HTTP requests (even if the server is using TLS configuration).

This could create performance issues, for example.

Solving this as disabling finagle logging is not a possibility, since it's logged by com.twitter.finagle and disabling this means disabling the whole finagle logger.

Thanks

mosesn commented 6 years ago

@jdreyesp shouldn't it be logged under com.twitter.finagle.netty4.channel.ChannelStatsHandler? that's the name we create the logger under in ChannelStatsHandler. https://github.com/twitter/finagle/blob/develop/finagle-netty4/src/main/scala/com/twitter/finagle/netty4/channel/ChannelStatsHandler.scala#L16

jdreyesp commented 6 years ago

Hi @mosesn , that was my first try, but it didn't work. Apparently some class in a higher package level is still throwing this warning.

jdreyesp commented 6 years ago

Hi @mosesn, any updates on this? Thanks

kevinoliver commented 6 years ago

@jdreyesp did you find out if another location is also logging this message? I only see it happening from this logger.

One possible gotcha is that the classname for objects has a trailing dollar sign — "com.twitter.finagle.netty4.channel.ChannelStatsHandler$".

jdreyesp commented 6 years ago

Hi!

No, I found it only in the location you mention, but this is also caught in other locations

Indeed, that avoid the ChannelStatsHandler one (thanks :)), but there's two more, netty related:

08:53:35.932 [finagle/netty4-1] WARN  com.twitter.finagle - Unhandled exception in connection with /127.0.0.1:58886, shutting down connection | io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: ....

08:53:35.932 [finagle/netty4-1] WARN  i.n.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. | io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: ...

I don't want to get rid of com.twitter.finagle nor DefaultChannelPipeline since I see them important to log in other cases. But I don't want to get my log overflooded by HTTP calls.

kevinoliver commented 6 years ago

Yeah, I see. And I think your more general point stands — we probably should not be logging at warning level for bad user input.

jdreyesp commented 6 years ago

Fair enough.

Please let me know what is your final decision on this in next Finagle version!

Thanks for the support!

kevinoliver commented 6 years ago

Fwiw, I think we're like some help to fix this as it isn't an issue right now for us. Any chance you'd be interested in starting to work on it? I'm sure we can find you help if you get the ball rolling with a PR.

spockz commented 6 years ago

In addition to making this an info or lower message also consider suppressing the stack trace. It has no added value in this message.

ryanoneill commented 6 years ago

I had a branch which addressed this specific exception a few months back, but decided against shipping it at the time, as I wasn't sure that where I was dealing with the exception was appropriate in all cases. I'll probably take another stab at this as time allows. At the very least we should be able to suppress the stack trace, as @spockz suggests.

alexanderkjeldaas commented 4 years ago

I have a pretty bad performance issue that seems to be related to this.

Subject alternative name missing in SSL:

020-06-11 12:41:08,508 WARN [finagle/netty4-2-7] i.n.c.DefaultChannelPipeline [DefaultChannelPipeline.java:1164] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: General OpenSslEngine problem
Caused by: javax.net.ssl.SSLHandshakeException: General OpenSslEngine problem
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching foo.bar.no found.
clayreimann commented 4 years ago

@kevinoliver or @spockz any pointers on where I could take a pass at fixing this? I'd either like to suppress the error, or better yet add a handler for this problem. We see this sort of error a lot as our internal security folks scan for hosts that will accept obsolete cipher suites which causes us problems.

spockz commented 4 years ago

For the original issue I would suggest looking at the log lines in com.twitter.finagle.netty4.channel.ChannelStatsHandler to suppress stack traces and/or reduce the log level for these known exceptions caused by "client errors". I don't have a concrete list of known exceptions so this might turn into a game of whack-a-mole.

When doing this it might be worthy to verify these exceptions are still counted in metrics though to troubleshoot connection issues.

clayreimann commented 4 years ago

I don't have a concrete list of known exceptions so this might turn into a game of whack-a-mole.

Thanks, that's very fair.

Is there a way for me to extend the netty pipeline to catch this (or similar) errors? I took a brief look through the docs but didn't see an obvious answer.