hseeberger / constructr

Coordinated (etcd, ...) cluster construction for dynamic (cloud, containers) environments
Apache License 2.0

Multiple ActorSystems and ClusterActorRefProvider cause duplicate init of akka-remoting (closes #151) #152

Closed rubendg closed 7 years ago

rubendg commented 7 years ago

This change will most probably break any existing usages where the application configuration is piggybacking on the akka.actor.provider configuration provided here.

hseeberger commented 7 years ago

I'm a bit reluctant ... could you please explain why one would want to use more than one actor system per JVM and how one configures them to use remoting or clustering?

rubendg commented 7 years ago

From this answer (http://stackoverflow.com/questions/10396552/do-i-need-to-re-use-the-same-akka-actorsystem-or-can-i-just-create-one-every-tim) I gather that one would create multiple actor systems for isolation purposes, e.g. conflict-free naming of actors, although this is not a recommended practice.

In particular this problem cropped up in combination with Kamon which always creates its own actor system with no possibility of supplying one (https://github.com/kamon-io/Kamon/blob/master/kamon-core/src/main/scala/kamon/Kamon.scala#L59).

More specifically:

Kamon.start()
val system = ActorSystem("test", config)

Fails:

[INFO] [02/20/2017 08:51:15.089] [main] [akka.remote.Remoting] Starting remoting
[INFO] [02/20/2017 08:51:15.255] [main] [akka.remote.Remoting] Remoting started; listening on addresses :[akka.tcp://kamon@172.17.0.2:2552]
[INFO] [02/20/2017 08:51:15.256] [main] [akka.remote.Remoting] Remoting now listens on addresses: [akka.tcp://kamon@172.17.0.2:2552]
[INFO] [02/20/2017 08:51:15.270] [main] [akka.cluster.Cluster(akka://kamon)] Cluster Node [akka.tcp://kamon@172.17.0.2:2552] - Starting up...
[INFO] [02/20/2017 08:51:15.306] [main] [akka.cluster.Cluster(akka://kamon)] Cluster Node [akka.tcp://kamon@172.17.0.2:2552] - Registered cluster JMX MBean [akka:type=Cluster]
[INFO] [02/20/2017 08:51:15.306] [main] [akka.cluster.Cluster(akka://kamon)] Cluster Node [akka.tcp://kamon@172.17.0.2:2552] - Started up successfully
[INFO] [02/20/2017 08:51:15.318] [kamon-akka.actor.default-dispatcher-14] [akka.cluster.Cluster(akka://kamon)] Cluster Node [akka.tcp://kamon@172.17.0.2:2552] - Metrics will be retreived from MBeans, and may be incorrect on some platforms. To increase metric accuracy add the 'sigar.jar' to the classpath and the appropriate platform-specific native libary to 'java.library.path'. Reason: java.lang.ClassNotFoundException: org.hyperic.sigar.Sigar
[INFO] [02/20/2017 08:51:15.324] [kamon-akka.actor.default-dispatcher-14] [akka.cluster.Cluster(akka://kamon)] Cluster Node [akka.tcp://kamon@172.17.0.2:2552] - Metrics collection has started successfully
[INFO] [02/20/2017 08:51:15.326] [kamon-akka.actor.default-dispatcher-4] [akka.cluster.Cluster(akka://kamon)] Cluster Node [akka.tcp://kamon@172.17.0.2:2552] - No seed-nodes configured, manual cluster join required
[INFO] [02/20/2017 08:51:15.358] [main] [akka.remote.Remoting] Starting remoting
[ERROR] [02/20/2017 08:51:15.373] [test-akka.remote.default-remote-dispatcher-7] [NettyTransport(akka://test)] failed to bind to /172.17.0.2:2552, shutting down Netty transport
Exception in thread "main" org.jboss.netty.channel.ChannelException: Failed to bind to: /172.17.0.2:2552
    at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
    at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:417)
    at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:413)
    at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)
    at scala.util.Try$.apply(Try.scala:192)
    at scala.util.Success.map(Try.scala:237)
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.BindException: Address already in use

because, with constructr on the classpath, Kamon.start inherits constructr's reference.conf and hence its akka.actor.provider setting, which in turn triggers akka-remote initialization for the Kamon actor system even though that is not the desired behaviour. Instead, the Kamon actor system should start without akka-remoting enabled, and remoting should be initialized only for the "test" actor system.

You could question whether Kamon is right to boot up its own actor system, but I think that would be beside the point, since anyone can do this. It is not forbidden, and hence it will happen.

A workaround I can think of is to override the constructr default by supplying Kamon with a custom config that resets akka.actor.provider back to the Akka default, "local". This doesn't really feel right to me. Also, in terms of user experience, I think everybody who uses Kamon in combination with this library will bump into this issue, and the cause is not entirely obvious.
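A minimal sketch of that workaround, assuming the Kamon.start overload that accepts a com.typesafe.config.Config (as in the Kamon.scala linked above); the object and system names are placeholders:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory
import kamon.Kamon

object Main extends App {
  // Reset the provider that constructr's reference.conf sets, but only
  // for Kamon's internal actor system, so it stays local-only.
  val kamonConfig = ConfigFactory
    .parseString("""akka.actor.provider = "local"""")
    .withFallback(ConfigFactory.load())

  Kamon.start(kamonConfig)

  // The application's own system still picks up the cluster provider
  // from the merged reference.confs and starts remoting exactly once.
  val system = ActorSystem("test")
}
```

This avoids the double bind on the remoting port, but every Kamon + constructr user would have to know to do it.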

Maybe you have other suggestions for workarounds that are more reasonable?

hseeberger commented 7 years ago

Looking at the Kamon code, it is, as you suggested, possible to provide an alternative configuration when starting Kamon. As this will be necessary for any application that wants to use the ClusterActorRefProvider and therefore remoting, I guess this is not an issue with constructr, but an implementation/configuration pattern for using Kamon, right? To put it another way, even with your patch, users would configure the cluster provider in their application.conf and then run into the same issue.

rubendg commented 7 years ago

Indeed, users would have to configure the cluster provider in their application.conf. However, since all reference.confs on the classpath (which serve a different purpose than application.conf, see http://doc.akka.io/docs/akka/current/general/configuration.html#Where_configuration_is_read_from) are shared between all actor system instances, the fact that one actor system instance (the one constructr is part of) uses the cluster provider leaks to all other actor system instances, which don't necessarily need it. In other words, if a library ships a reference.conf on the classpath, the settings it provides should be as conflict-free as possible with all actor system instances in the final application. That's my two cents, but please let me know if I'm missing something here.
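To illustrate the leak: constructr's reference.conf sets the provider roughly like this (a sketch of the relevant entry), and because the reference.confs of all libraries on the classpath are merged into the default config, every ActorSystem created without an explicit override picks it up:

```hocon
# sketch of the relevant entry in constructr's reference.conf
akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
```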

hseeberger commented 7 years ago

Thanks for the clarifications. So to make sure I fully understand, you are saying that when you start more than one actor system, you typically provide different configs? E.g. one system is created just via ActorSystem() (using application.conf) and another via ActorSystem("name", config)? If that's the case, you are certainly right.

rubendg commented 7 years ago

This table lists some configurations that library users can encounter:

| source (`akka.actor.provider =`) | a | b | c | d |
| --- | --- | --- | --- | --- |
| reference.conf | ClusterActorRefProvider | - | ClusterActorRefProvider | - |
| application.conf | ClusterActorRefProvider | - | - | ClusterActorRefProvider |

The code below then evaluates to:

| Code | a | b | c | d |
| --- | --- | --- | --- | --- |
| 1. `ActorSystem("").settings.getString("akka.actor.provider")` | ClusterActorRefProvider | local | ClusterActorRefProvider | ClusterActorRefProvider |
| 2. `ActorSystem("", ConfigFactory.load().withoutPath("akka")).settings.getString("akka.actor.provider")` | ClusterActorRefProvider | local | ClusterActorRefProvider | local |

Currently scenarios "a" and "c" apply. If you have both 1 and 2 in your code base (as is the case with the combination of Kamon and constructr), you're out of luck: only one actor system will boot up; the other will fail because the remoting port is already bound.

Scenario "b" does not make sense for library users by definition. In the current setup, "b" is ruled out by having the setting in the reference.conf. In scenario "d" this is no longer transitively enforced, and using constructr without this setting will result in a runtime error:

[ERROR] [03/01/2017 02:30:01.026] [foobar-akka.actor.default-dispatcher-4] [akka://foobar/user/constructr] ActorSystem [akka://foobar] needs to have a 'ClusterActorRefProvider' enabled in the configuration, currently uses [akka.actor.LocalActorRefProvider]

What this PR enables is scenario "d", so that a code base containing both 1 and 2 will not run into trouble.

Maybe mentioning in the readme that you have to set akka.actor.provider explicitly in your application.conf would keep people from running into this runtime error. What do you think?
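Such a readme note could show the explicit setting users would add to their application.conf (a sketch, using the fully qualified provider class name that appears in the error above):

```hocon
# application.conf of an application using constructr:
# enable clustering explicitly for the default actor system
akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
```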

aleksandr-vin commented 7 years ago

+1 I've run into this issue too. Will be glad to see it in master

azakordonets commented 7 years ago

+1 Would love to see it in master as well

rubendg commented 7 years ago

Due to insufficient permissions on the original remote, I've created a new PR with the changes applied: #156