vert-x3 / vertx-service-discovery

Some tools one can use for doing microservices with Vert.x
Apache License 2.0
115 stars 67 forks source link

ConsulServiceImporter throwing ConcurrentModificationException #78

Closed marksmithson closed 6 years ago

marksmithson commented 6 years ago

We are seeing this in our logs:

ERROR io.vertx.core.impl.ContextImpl - Unhandled exception - 
java.util.ConcurrentModificationException: null
    at java.util.ArrayList.forEach(ArrayList.java:1252)
    at io.vertx.servicediscovery.consul.ConsulServiceImporter.lambda$retrieveIndividualServices$12(ConsulServiceImporter.java:179)
    at io.vertx.core.impl.CompositeFutureImpl.lambda$all$1(CompositeFutureImpl.java:54)
    at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:126)
    at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:88)
    at io.vertx.servicediscovery.consul.ConsulServiceImporter.lambda$importService$14(ConsulServiceImporter.java:234)
    at io.vertx.core.impl.CompositeFutureImpl.lambda$all$1(CompositeFutureImpl.java:54)
    at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:126)
    at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:133)
    at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:95)
    at io.vertx.servicediscovery.consul.ConsulServiceImporter.lambda$importService$13(ConsulServiceImporter.java:223)
    at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:126)
    at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:88)
    at io.vertx.servicediscovery.consul.ImportedConsulService.lambda$register$0(ImportedConsulService.java:70)
    at io.vertx.servicediscovery.impl.DiscoveryImpl.lambda$publish$6(DiscoveryImpl.java:322)
    at io.vertx.servicediscovery.impl.DefaultServiceDiscoveryBackend.lambda$store$0(DefaultServiceDiscoveryBackend.java:53)
    at io.vertx.core.impl.FutureImpl.setHandler(FutureImpl.java:81)
    at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:287)
    at io.vertx.core.impl.ContextImpl.lambda$wrapTask$2(ContextImpl.java:337)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:445)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at java.lang.Thread.run(Thread.java:745)

It appears that the offending code is removing items from a list it is iterating over:

https://github.com/vert-x3/vertx-service-discovery/blob/master/vertx-service-discovery-bridge-consul/src/main/java/io/vertx/servicediscovery/consul/ConsulServiceImporter.java

imports.forEach(svc -> {
  if (!retrievedIds.contains(svc.id())) {
    LOGGER.info("Unregistering " + svc.id());
    imports.remove(svc);
    svc.unregister(publisher, null);
  }
});

I would have expected this to be caught by the tests.

marksmithson commented 6 years ago

This would appear to fix it.

 List<ImportedConsulService> toRemove = new ArrayList<>();
      imports.forEach(svc -> {
        if (!retrievedIds.contains(svc.id())) {
          LOGGER.info("Unregistering " + svc.id());
          toRemove.add(svc);
          svc.unregister(publisher, null);
        }
      });
      imports.removeAll(toRemove);

However, I can't see how to easily add a test that fails for this. The ConsulServerImporterTest currently logs a failure message, but as it polls for services and the exception is hidden the test will not fail. (one removed service is removed each time the importer is called)

cescoffier commented 6 years ago

@ruslansennov can you have a look?

ruslansennov commented 6 years ago

ok

ruslansennov commented 6 years ago

@cescoffier I share the difficulties @marksmithson is talking about. See the problem code. Here we must separate valid exceptions (like shutdown/restart/maintenance of consul process) from exceptions that caused by code bugs. I don't see any possibility to check this within current API. Maybe we can use own wrapper of slf4j logger, but this looks ugly. The same thoughts are related to docker bridge. I believe we should accept hotfix proposed by @marksmithson without any changes in tests for a while. And maybe postpone testing to another issue

cescoffier commented 6 years ago

The code looks good. I also think the access to the imports list at https://github.com/vert-x3/vertx-service-discovery/blob/master/vertx-service-discovery-bridge-consul/src/main/java/io/vertx/servicediscovery/consul/ConsulServiceImporter.java#L320 should be protected by a lock.

ruslansennov commented 6 years ago

@marksmithson could you provide a PR?

marksmithson commented 6 years ago

here you go https://github.com/vert-x3/vertx-service-discovery/pull/80

marksmithson commented 6 years ago

I have added the synchronization on line 320.

I am unclear about the purpose of this and if the code actually achieves it. I assume this is to make sure the imports List contains a valid list of the services which have been registered, to prevent imports that are in progress concurrently from interfering with each other?

In this case, is there an issue with importService registering the services here https://github.com/vert-x3/vertx-service-discovery/blob/3385dd68d3772c24a59efcf95207c57a281c0d7e/vertx-service-discovery-bridge-consul/src/main/java/io/vertx/servicediscovery/consul/ConsulServiceImporter.java#L220. The registered service will not be reflected in the imports list until all the registrations have completed.

I can see a scenario where one batch of services from consul have been processed and missing services are being unregistered here https://github.com/vert-x3/vertx-service-discovery/blob/master/vertx-service-discovery-bridge-consul/src/main/java/io/vertx/servicediscovery/consul/ConsulServiceImporter.java#L183. At the same time another import which includes a new record for a deleted service is in progress - This could result in the new service being registered by the second import and then removed by the first import?

ruslansennov commented 6 years ago

I'm not author of this module, but here are my thoughts:

When user calls ConsulServiceImporter.close(Handler), the CompositeFuture with list of deregistering tasks is composed. We don't know in which thread the handler will be called (see L316), because in common case we knows nothing about implementation of ServicePublisher. Thus, imports.clear() can be called in non-eventloop thread unlike L164

Maybe I'm wrong :)

@cescoffier ?

cescoffier commented 6 years ago

The reported issue is fixed by #80 . And yes, the close method can be called from any thread... so synchronization is required.