openzipkin / zipkin

Zipkin is a distributed tracing system
https://zipkin.io/
Apache License 2.0

zipkin server takes a long time to shutdown #2454

Closed jorgheymans closed 5 years ago

jorgheymans commented 5 years ago

java -jar zipkin.jar takes a very long time to shut down (the longest I have seen is half a minute). I'm seeing this with the latest version and in-memory storage. A cold, unused instance takes about 5 seconds to stop upon CTRL-C. This is on Linux using JDK 8.

codefromthecrypt commented 5 years ago

cc @openzipkin/armeria in case y'all have insight into this. This started after the switcheroo and maybe there's an easy explanation/setting somewhere

jorgheymans commented 5 years ago

Confirming with java -jar zipkin.jar --logging.level.root=TRACE that the armeriaServer bean indeed is taking a long time to destroy:

2019-03-19 09:14:25.531 TRACE 28697 --- [ Thread-5] o.s.b.f.s.DisposableBeanAdapter : Invoking destroy method 'close' on bean with name 'cassandraCluster'
2019-03-19 09:14:25.555 TRACE 28697 --- [ Thread-5] o.s.b.f.s.DisposableBeanAdapter : Invoking destroy method 'close' on bean with name 'armeriaServer'
2019-03-19 09:14:32.784 TRACE 28697 --- [ Thread-5] o.s.b.f.s.DisposableBeanAdapter : Invoking destroy() on bean with name 'defaultValidator'
2019-03-19 09:14:32.784 TRACE 28697 --- [ Thread-5] o.s.b.f.s.DefaultListableBeanFactory : Retrieved dependent beans for bean 'prometheusMeterRegistry': [metrics, zipkin2.server.internal.MetricsHealthController, metricsEndpoint, zipkin2.autoconfigure.prometheus.ZipkinPrometheusMetricsAutoConfiguration, metricsRestTemplateCustomizer]
2019-03-19 09:14:32.784 TRACE 28697 --- [ Thread-5] o.s.b.f.s.DefaultListableBeanFactory : Retrieved dependent beans for bean 'metrics': [zipkin2.server.internal.ZipkinHttpCollector]

trustin commented 5 years ago

Is graceful shutdown enabled? ~(It's disabled if not configured.)~

anuraaga commented 5 years ago

It looks like graceful shutdown might be enabled by default

https://github.com/line/armeria/blob/master/spring/boot-autoconfigure/src/main/java/com/linecorp/armeria/spring/ArmeriaSettings.java#L190

I think this is because graceful shutdown is a good default for normal servers, though since tracing is often best effort, maybe it should be changed in zipkin server.

jorgheymans commented 5 years ago

Zipkin does not appear to configure graceful shutdown for the ArmeriaServer, so it would rely on the defaults.

https://github.com/openzipkin/zipkin/blob/65a0a331c9c97f0dfb596433a5f71782430fcf4d/zipkin-server/src/main/java/zipkin2/server/internal/ZipkinServerConfiguration.java#L60

@anuraaga did you mean gracefulShutdownQuietPeriodMillis instead? https://github.com/line/armeria/blob/4412f9984047509423cdad7f6ae78a499a3f7d5e/spring/boot-autoconfigure/src/main/java/com/linecorp/armeria/spring/ArmeriaSettings.java#L181

anuraaga commented 5 years ago

The quiet period is how long the server will wait for requests to go away. The timeout is the maximum amount of time it will wait. With a load balancer configured against the health endpoint, the quiet period will generally be enough for shutdown, but without one the server will always wait until the timeout of 40s.

To disable graceful shutdown on zipkin server, we should probably disable both settings - though @adriancole should make the call as enabling graceful by default can also reduce gotchas in default configurations.
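To make the interaction between the two settings concrete, here is a rough sketch of the behaviour described above. It is a simplified model, not Armeria's implementation: the class and method names are made up for illustration, the 5000 ms quiet period is an assumed value chosen to match the roughly 5 seconds reported for an idle instance, and requests are treated as instantaneous events that restart the quiet window.

```java
import java.util.stream.LongStream;

// Hypothetical, simplified model of quiet period vs. timeout.
// Not Armeria code; all names here are illustrative.
final class GracefulShutdownSketch {

    /**
     * Returns how many milliseconds after the stop signal (t = 0) the server
     * would finish shutting down. requestTimesMillis must be sorted ascending
     * and holds the arrival times of requests seen after the signal.
     */
    static long shutdownDelayMillis(long quietPeriodMillis, long timeoutMillis,
                                    long... requestTimesMillis) {
        long quietStart = 0; // the quiet window starts at the stop signal
        for (long t : requestTimesMillis) {
            if (t >= timeoutMillis) break;                  // hard limit reached first
            if (t >= quietStart + quietPeriodMillis) break; // window elapsed before this request
            quietStart = t;                                 // activity restarts the quiet window
        }
        // Stop when the window has been quiet long enough, or at the timeout.
        return Math.min(quietStart + quietPeriodMillis, timeoutMillis);
    }

    public static void main(String[] args) {
        // One request per second for a minute after the stop signal.
        long[] everySecond = LongStream.rangeClosed(1, 60).map(i -> i * 1000).toArray();

        System.out.println(shutdownDelayMillis(5000, 40000));              // idle server: 5000
        System.out.println(shutdownDelayMillis(5000, 40000, everySecond)); // steady traffic: 40000
        System.out.println(shutdownDelayMillis(0, 0, everySecond));        // both settings zeroed: 0
    }
}
```

In this model an idle server stops after the quiet period, a server that keeps receiving traffic waits for the full timeout, and zeroing both values makes shutdown immediate, which is why both settings would need to change.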

codefromthecrypt commented 5 years ago

Before we switched to armeria, we didn't have such a long quiet period. So in this case, I think we should disable the settings by default: folks wouldn't have relied on them formerly, and the first-time experience isn't great with them on either.