outworkers / phantom

Schema safe, type-safe, reactive Scala driver for Cassandra/Datastax Enterprise
http://outworkers.github.io/phantom/
Apache License 2.0
1.05k stars 186 forks source link

Documentation / Examples? Multiple Keyspaces (dynamic) using same session/cluster connection #462

Open thunderstumpges opened 8 years ago

thunderstumpges commented 8 years ago

I'm not certain this is a bug in the code, but I feel like I've been mislead with the documentation and the behavior doesn't match what I thought I'd expect. I am using v1.22.0

First off, my use-case I think is a bit non-standard so that could be part of the problem. I have a consistent schema (Tables/Databases) that exist in multiple keyspaces (of which I don't know ahead of time, and aren't static, but are finite and in the tens max).

I have tried to manipulate examples I've seen which tend to create a static (object) connector for a single keyspace, and instead provide a connector function that will dynamically return a keyspace/session/connector to the same cluster based on a keyspace name.

So, some examples of wordings that make me think this should be easy/possible:

From A series on phantom: Part 1 Getting Started With Phantom:

After 1.10.0, Phantom unifies database access through a single global session that is now capable of simultaneously using different keyspaces.

From KeyspaceBuilder.scala:

When using multiple keySpaces in the same Cassandra cluster, it is recommended to create all KeySpace instances from the same builder instance.

and from DefaultSessionProvider.scala

This implementation caches Session instances per keySpace.

Now, what I have tried, based on examples, and the comments and documentation above, is I have a single instance of KeyspaceBuilder, and I call the .keySpace(keyspace_name) function on it whenever I need a connector for that keyspace (rather than a static val connector):

/**
  * Connector to get to Cassandra cluster. 
  */
object CassandraConnector {
  val config = ConfigFactory.load()
  val hosts = config.getStringList("cassandra.host")
  val inets = hosts.map(InetAddress.getByName)

  val keyspaceBuilder = ContactPoints(hosts)
  val dynamicKsConnector = (ks:String) => keyspaceBuilder.keySpace(ks)
}

Now, I can see from tests that this doesn't work, and from what I can tell, it's creating a new cluster, session, etc for every instance of KeySpaceBuilder.keySpace() => KeySpaceDef => DefaultSessionProvider => Session

Am I really expected to create my own Dictionary/Map of keyspace names to instances of KeySpaceBuilder/KeySpaceDef and return the correct one based on the keyspace name? If so, it would be nice if the documentation was clearer on this point. I thought for sure that was handled by phantom.

Thanks in advance!

alexflav23 commented 8 years ago

Hi @thunderstumpges,

The keyspace string you provide as an argument to the builder is used to create the original keyspace if it doesn't exist. Assuming this is not something you require, all you need to do is to provide a different implicit keyspace: KeySpace in multiple connector variants.

The builder no longer caches multiple sessions as it doesn't really need to anymore since phantom prefixes every single query with the necessary keyspace.

I will think about a clean way to achieve this and possibly providing an internal solution, as you point out it's not a standard use case so we don't normally consider it.

Regards.

thunderstumpges commented 8 years ago

So is my first referenced documentation above not accurate anymore? As I see it, currently there is an entirely new session + cluster + connection made for every KeySpaceDef instance. This is exemplified by my code which ends up creating new sessions and cluster connections over and over and over....

If the KeySpaceBuilder object really only uses the name to create the keyspace (which from what I can see really happens in DefaultSessionProvider, not in one of the KeySpace classes) and the KeySpace can use a shared session/cluster globally, then why can't the implementation of DefaultSessionProvider connect to the cluster once and encapsulate the "singleton" nature of the session?

If not DefaultSessionProvider then maybe a different SessionProvider? It seems like something documented earlier was lost, and it's making my use of the library unnecessarily complicated...

thunderstumpges commented 8 years ago

Well the more I look into the phantom code, the more I am realizing that I need my own SessionProvider because I need different KeySpace initialization code (can't use SimpleStrategy, and can't use RepFactor 1) so I can't use DefaultSessionProvider. Because KeySpaceDef is hardcoded to use DefaultSessionProvider, it looks like I can't use it either, and that bubbles up to KeySpaceBuilder as well...

For now I guess I'm looking at my own implementations of each of these classes, plus some logic that supplies a single Session inside my SessionProvider. Does that seem correct to you?

thanks again, Thunder

thunderstumpges commented 8 years ago

Ugh. Well I've now spent too many hours trying to get this working... ran into another wall where DatabaseImpl needs the com.websudos version of KeySpaceDef, not just a Connector. SO I seem to be stuck with basically the entire stack implementation.

BTW, while going through all of this, I noticed the following quote in the java-driver documentation:

You might be tempted to open a separate session for each keyspace used in your application; however, note that connection pools are created at the session level, so each new session will consume additional system resources: // Warning: creating two sessions doubles the number of TCP connections opened by the driver

However it appears that even in the best / primary use case scenario (static object/val members for each connector / KeySpaceDef as is in all of the examples), the phantom library creates a new session, cluster per keyspace, correct? It seems this is not ideal. And if I will be needing 10s of keyspaces, that means even at best, 10s of sessions and connections.

OK, I'm done for tonight I promise!

P.S> if you are interested in my late-night stab at a SmartSessionProvider implementation, see below (it really isn't much more code than DefaultSessionProvider) What do you think? It caches a single global session by clusterName (from builder) and performs keyspace init only once by keyspace name.

/**
  * This implementation caches `Cluster` instances that have the same name. A clusterName in ClusterBuilder is required, or a
  * new one will be created every time (probably not what you want)
  */
class SmartSessionProvider(val space: KeySpace, builder: ClusterBuilder) extends SessionProvider {
  val session = SmartSessionProvider.session(space.name, builder)
  val cluster: Cluster = session.getCluster
}

object SmartSessionProvider {
  private val clusterSessions = new mutable.HashMap[String, Session] with mutable.SynchronizedMap[String, Session]
  private val keyspaceSessions = new mutable.HashMap[String, Session] with mutable.SynchronizedMap[String, Session]
  private val CLUSTER_ID = new AtomicInteger(0)
  private def session(keySpace: String, builder: ClusterBuilder) : Session = {
    val jBuilder = builder(Cluster.builder)
      .withoutJMXReporting().withoutMetrics() // TODO: do we want this?

    // make sure we have a name
    if (jBuilder.getClusterName == null) jBuilder.withClusterName("cluster" + CLUSTER_ID.incrementAndGet)

    // get the existing global session for this cluster or else construct a new one
    val session = clusterSessions.getOrElseUpdate(jBuilder.getClusterName, {
      jBuilder.build().connect
    })

    // only auto-create the keyspace once, return the single session for the cluster
    keyspaceSessions.getOrElseUpdate(keySpace, {
      initKeySpace(session, keySpace)
    })

  }
  /**
    * Initializes the keySpace with the given name on
    * the specified Session.
    */
  private[this] def initKeySpace(session: Session, keySpace: String): Session = blocking {
    blocking {
      session.execute(s"CREATE KEYSPACE IF NOT EXISTS $keySpace WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor' : 3};")
    }
    session
  }
}
alexflav23 commented 8 years ago

Hi @thunderstumpges,

We are working on an internal update to this, there may be a nicer way to add this at framework level. The problem you will be dealing with is mostly the implicit scope that will be provided by the keyspace, and that's not super easy to handle given it's used literally everywhere.

It should still be possible to fix this and give better support, even if this is as you say a fairly niche situation.

Regards.

thunderstumpges commented 8 years ago

Thanks @alexflav23. While you mull this over internally, please also consider formalizing the SessionProvider trait so that someone could provide their own implementation.

Also, it seems like at some point along the way Connector and KeySpaceDef got muddled together. I see several places in the code that have a variable or parameter called 'connector' and they take a concrete type of KeySpaceDef. Might clean up the abstraction layer if these concepts were formalized a little more.

And finally, if none of my quick fixes seem to strike a chord with you guys, how long do you think this internal update might take? We are in a project that needs to continue to progress. I need to weigh my options and this is going to be blocking me in short time.

Thanks!

alexflav23 commented 8 years ago

HI @thunderstumpges,

In all honesty, the multiple keyspace thing looks very very dangerous from a distance. I don't know the specifics of what you are doing internally, but this is the first I hear of such an approach undertaken.

I can understand you have different keyspaces where you store the same data, but I don't really know why you couldn't simply have a composite key doing the exact same thing and achieving a better distribution of your data over token ranges.

Assuming uniform distribution, you'd probably fair a lot better. I doubt Cassandra optimizes heavily for a lot of keyspaces and at the same time if the settings with which they are created are identical, it also makes very little sense to have the keyspaces co-existing altogether.

So my gut feeling is that there is a better and simpler way to do this.

Regards.

eugenemiretsky commented 6 years ago

@alexflav23 We have the same use case too. Has there been any updates on this. Some examples of why we want to create several key keyspaces 1) Multi-tenancy 2) Events with different ttl go to keyspaces with different settings