micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.46k stars 986 forks source link

Lots of 'ConcurrentModificationException: null' since upgrading to 1.5.3 #2218

Closed sguillope closed 4 years ago

sguillope commented 4 years ago

Since upgrading micrometer to 1.5.3, we're seeing load of these exceptions in our logs with warning message Error while discarding collection, stopping. This has already been mentioned in https://github.com/micrometer-metrics/micrometer/issues/2140#issuecomment-663004911 but figured I'd create a separate ticket for it.

Environment:

java.util.ConcurrentModificationException: null
    at java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1042)
    at java.base/java.util.ArrayList$Itr.next(ArrayList.java:996)
    at io.micrometer.shaded.reactor.core.publisher.Operators.onDiscardMultiple(Operators.java:565)
    at io.micrometer.shaded.reactor.core.publisher.FluxBufferPredicate$BufferPredicateSubscriber.cancel(FluxBufferPredicate.java:174)
    at io.micrometer.shaded.reactor.core.publisher.FluxMap$MapSubscriber.cancel(FluxMap.java:160)
    at io.micrometer.shaded.reactor.core.publisher.FluxOnBackpressureLatest$LatestSubscriber.cancel(FluxOnBackpressureLatest.java:99)
    at io.micrometer.shaded.reactor.core.publisher.FluxMap$MapSubscriber.cancel(FluxMap.java:160)
    at io.micrometer.shaded.reactor.core.publisher.Operators.terminate(Operators.java:1222)
    at io.micrometer.shaded.reactor.netty.channel.MonoSendMany$SendManyInner.operationComplete(MonoSendMany.java:264)
    at io.micrometer.shaded.reactor.netty.channel.MonoSendMany$SendManyInner.operationComplete(MonoSendMany.java:114)
    at io.micrometer.shaded.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
    at io.micrometer.shaded.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
    at io.micrometer.shaded.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
    at io.micrometer.shaded.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
    at io.micrometer.shaded.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
    at io.micrometer.shaded.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1158)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:760)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:736)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:607)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1352)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:622)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:606)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:472)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:957)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel.close(AbstractChannel.java:232)
    at io.micrometer.shaded.reactor.netty.DisposableChannel.dispose(DisposableChannel.java:76)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.dispose(ChannelOperations.java:168)
    at io.micrometer.shaded.reactor.core.Disposables.set(Disposables.java:421)
    at io.micrometer.shaded.reactor.core.Disposables$SwapDisposable.update(Disposables.java:317)
    at io.micrometer.statsd.StatsdMeterRegistry.lambda$retryReplaceClient$12(StatsdMeterRegistry.java:273)
    at io.micrometer.shaded.reactor.core.publisher.LambdaMonoSubscriber.onNext(LambdaMonoSubscriber.java:168)
    at io.micrometer.shaded.reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
    at io.micrometer.shaded.reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onNext(FluxRetryWhen.java:162)
    at io.micrometer.shaded.reactor.core.publisher.MonoCreate$DefaultMonoSink.success(MonoCreate.java:156)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider$NewConnectionObserver.onStateChange(NewConnectionProvider.java:199)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperationsHandler.channelActive(ChannelOperationsHandler.java:65)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895)
    at io.micrometer.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:305)
    at io.micrometer.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:249)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1342)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:548)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:533)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:517)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:978)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel.connect(AbstractChannel.java:253)
    at io.micrometer.shaded.io.netty.bootstrap.Bootstrap$3.run(Bootstrap.java:250)
    at io.micrometer.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.micrometer.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.micrometer.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.micrometer.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
garry-mcfly commented 4 years ago

I've seen the same behaviour in our scenario. Even worse it seems that it runs into an endless loop afterwards consuming a lot of cpu time and stop reporting any metric values.

Except from stack trace:

    at java.lang.Object.hashCode(Native Method)
    at java.util.HashMap.hash(HashMap.java:339)
    at java.util.HashMap.put(HashMap.java:612)
    at java.util.HashSet.add(HashSet.java:220)
    at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:179)
    at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:132)
    - locked <0x00000000b6316cf0> (a java.util.Collections$UnmodifiableSet)
    at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:219)
    - locked <0x0000000093399aa0> (a java.lang.Object)
    - locked <0x0000000093399ab0> (a java.lang.Object)
    at io.micrometer.shaded.io.netty.channel.nio.AbstractNioChannel.doRegister(AbstractNioChannel.java:380)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:496)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:468)
    at io.micrometer.shaded.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87)
    at io.micrometer.shaded.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81)
    at io.micrometer.shaded.reactor.netty.resources.ColocatedEventLoopGroup.register(ColocatedEventLoopGroup.java:71)
    at io.micrometer.shaded.io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:323)
    at io.micrometer.shaded.io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:155)
    at io.micrometer.shaded.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider.lambda$acquire$0(NewConnectionProvider.java:73)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider$$Lambda$430/545707113.accept(Unknown Source)
    at io.micrometer.shaded.reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57)
    at io.micrometer.shaded.reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
    at io.micrometer.shaded.reactor.core.publisher.FluxRetryWhen.subscribe(FluxRetryWhen.java:79)
    at io.micrometer.shaded.reactor.core.publisher.MonoRetryWhen.subscribeOrReturn(MonoRetryWhen.java:46)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4198)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribeWith(Mono.java:4324)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4184)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4120)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4067)
    at io.micrometer.statsd.StatsdMeterRegistry.retryReplaceClient(StatsdMeterRegistry.java:272)
    at io.micrometer.statsd.StatsdMeterRegistry.connectAndSubscribe(StatsdMeterRegistry.java:261)
    at io.micrometer.statsd.StatsdMeterRegistry.lambda$prepareUdpClient$7(StatsdMeterRegistry.java:233)
    at io.micrometer.statsd.StatsdMeterRegistry$$Lambda$402/1828787392.accept(Unknown Source)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn.lambda$onStateChange$1(UdpClientDoOn.java:71)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn$$Lambda$1607/1391024869.dispose(Unknown Source)
    at io.micrometer.shaded.reactor.netty.DisposableChannel$$Lambda$1609/245151462.run(Unknown Source)
    at io.micrometer.shaded.reactor.core.publisher.LambdaMonoSubscriber.onComplete(LambdaMonoSubscriber.java:132)
    at io.micrometer.shaded.reactor.core.publisher.Operators.complete(Operators.java:135)
    at io.micrometer.shaded.reactor.core.publisher.MonoEmpty.subscribe(MonoEmpty.java:45)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4213)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribeWith(Mono.java:4324)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4184)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4120)
    at io.micrometer.shaded.reactor.netty.DisposableChannel.onDispose(DisposableChannel.java:134)
    at io.micrometer.shaded.reactor.netty.Connection.onDispose(Connection.java:268)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.onDispose(ChannelOperations.java:197)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn.onStateChange(UdpClientDoOn.java:71)
    at io.micrometer.shaded.reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:514)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider$NewConnectionObserver.onStateChange(NewConnectionProvider.java:207)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:432)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.onInboundClose(ChannelOperations.java:390)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:76)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
    at io.micrometer.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.micrometer.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.micrometer.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.micrometer.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
kdamevin commented 4 years ago

We have been noticing the same issue and it seems highly related to some high memory usage/memory leaks on our side.

I am not entirely sure in which condition it happens but I compared two similar instances at the same time. One was using above 6gb of memory before It died and I could see this Error while discarding collection / ConcurrentModificationException error more than 20k times in the logs, another instance that seemed to behave ok was using about half the memory and this error didn't happen.

nikitwaghelasonos commented 4 years ago

I see this when running a service in docker as well. it was functioning fine until https://github.com/micrometer-metrics/micrometer/issues/2140 showed up on 1.5.1, Upgrading to 1.5.3 started showing lot of cme's. Is there a timeline when this will get fixed and in which version? Java8 micrometer 1.5.3 using statsd Spring 2.3.0.RELEASE

renannprado commented 4 years ago

The reproducer seems to be got lost inside of the other issue's conversation, so I'm pasting the link here again.

This was supposed to reproduce the memory leak, but I bet can reproduce this one too. Maybe just by updating the dependencies.

https://github.com/cicilthomas/micrometer

shakuzen commented 4 years ago

Could anyone experiencing this issue try with the latest 1.5.4-SNAPSHOT version to see if the fix for #2177 causes this issue to also not happen anymore? I haven't been able to reproduce this issue on my machine so far.

garry-mcfly commented 4 years ago

For us this ConcurrentModificationException occurred together with #2177 (lot's of "Error while discarding collection" and then often no metrics where reported). After using the 1.5.4-SNAPSHOT this message also no longer appears. See the other https://github.com/micrometer-metrics/micrometer/issues/2177#issuecomment-671793961 But it would be nice if we are not the only one telling that the bug seems to be fixed now.

shakuzen commented 4 years ago

I will close this with @garry-mcfly's report that it's no longer occurring. We are about to release 1.5.4. If users are still seeing this issue after upgrading, we can reopen this and investigate further.

kdamevin commented 4 years ago

I can confirm we don’t see that error anymore after bumping to 1.5.4. Thank you!