hsgubert / cassandra_migrations

Cassandra Migrations is a Cassandra database schema migration library for Rails applications.
MIT License

Extract into multiple, more generic, libraries? #72

Open eprothro opened 8 years ago

eprothro commented 8 years ago

@bsbodden and @hsgubert: I'm going to switch gears to some other things for a bit, but I wanted to get your thoughts on the following for when I get some time to come back to the topic of migrations in a few weeks.

It seems to me that this gem has three component parts:

It also seems to me that every application (not just rails applications) needs the first problem solved.

What would y'all think about creating a separate cassandra-configuration gem that could be responsible for management of cassandra.yml and client/cluster/session management best practices?

Short term, cassandra_migrations could depend on and utilize the cassandra-configuration library. I think this simplifies maintenance of this library and benefits the community who are mostly baking their own solutions to the cassandra-configuration problems (that library would allow them to solve the problem in a lightweight and generic way).

Longer term, I think it would make sense to have something like

Thoughts?

cc: #52

hsgubert commented 8 years ago

This is a really tricky question. There is a trade-off between coherence and simplicity here. When this gem was created, the first users and I just wanted a good way to integrate Cassandra into a Rails app. But later many people said they wanted to use a lot of the functionality in non-Rails apps, so this is a real need people have.

On the other hand, we also want something easy and appealing for people to try out. When you're looking for a gem, the last thing you want is a complex mosaic that you don't fully understand. I mean, it's one thing for rspec (which everyone knows) to be split into pieces, and another for a more obscure/niche gem to do that.

I really don't know the answer and would love to hear other opinions. Currently my opinion is that we should split the gem only if this brings benefits to the end user. For example, having a version for Rails and one for non-Rails apps is a benefit, as it allows people to use the gem in non-Rails apps.

On the other hand, separating configuration/session management from migrations would be good for our internal organization, but can't we simply enforce this organization in our code? Thinking as the end user, I'd rather have a single gem and simply ignore migrations if I don't want them than have to understand the boundaries of two sub-gems.

That being said, it would be great to hear other arguments.

eprothro commented 8 years ago

I agree with your hesitation. I'm trying to make sure my issue isn't just naming (e.g. if this were named less specifically than cassandra_migrations, would I take issue with the multiple responsibilities?).

If the "client" class was designed in a way that the session could be used well outside the context of a migration, I could probably get behind combining configure/session and migrations.

Long term, I feel most strongly about the queries being in another library.

Short term I feel most strongly about allowing ruby users to have a configure/session solution that is designed to be clear and easy to use in their application code.

Thanks for the convo, will keep thinking over the weekend.

(Sent by email in reply to Henrique Gubert's comment of Nov 20, 2015, 4:11 PM.)

bsbodden commented 8 years ago

@eprothro @hsgubert I agree with the separation. I was thinking about that. I even have some partial work on a querying + ORMish type of library. I think that we should explore the capabilities of the datastax driver deeper and then see where the holes are and whether the different libraries would have enough to do to be worth writing :-)

hsgubert commented 8 years ago

Right. I also feel very strongly about separating the querying and ORM. I guess the only reason this hasn't been done yet is that the current querying/ORM capability is too small to justify a gem.

@eprothro I also agree with your short term goal, perhaps we could start with that? Extracting a session/configuration manager that works without rails.

eprothro commented 8 years ago

Sounds like the right first step to me.

Next are decisions around naming and interface for a user (unique namespace vs. sharing Cassandra namespace vs. adding functionality to Cassandra module via monkey patch). Will think about these over the Thanksgiving holiday.

Here are a few usage scenarios for which I assume we're all thinking about what the interface should be.

Adding a session attribute, transparently

class SomeQueryClass
  # include Cassandra config/session management methods    < ------ here

  def fetch
    session.execute(some_cql)
    ...
  end
end
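
One way the transparent case could look, as a minimal sketch rather than a proposal for the final interface: a mixin that exposes a memoized `session`. The names `CassandraSessionSketch`, `session_factory`, and `FakeSession` are hypothetical. A real factory would presumably build the session with the driver; it is stubbed here so the example runs without a cluster.

```ruby
# Hypothetical mixin; not part of cassandra_migrations or cassandra-driver.
module CassandraSessionSketch
  class << self
    # Callable that builds a session; an application would set this once.
    attr_accessor :session_factory

    # Lazily build and memoize one shared session.
    def session
      @session ||= session_factory.call
    end

    # Drop the memoized session so the next call builds a new one.
    def reset!
      @session = nil
    end
  end

  # Instance-level accessor added to any including class.
  def session
    CassandraSessionSketch.session
  end
end

# Stand-in for a driver session so the sketch runs without a cluster.
class FakeSession
  attr_reader :executed

  def initialize
    @executed = []
  end

  # Records the statement and returns an empty result set.
  def execute(cql, options = {})
    @executed << cql
    []
  end
end

CassandraSessionSketch.session_factory = -> { FakeSession.new }

class SomeQueryClass
  include CassandraSessionSketch

  def fetch
    session.execute("SELECT * FROM users")
  end
end
```

The including class never sees how the session is built, which is the point of the "transparent" scenario; swapping the factory is the only change needed to move from a stub to a real connection.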

Adding a session attribute, with configuration

class SomeDataMapperResourceClass
  # include Cassandra config/session management methods    < ------ here

  # specify which keyspace the session for this class should be connected to < ------ here

  def self.find_by_username(username, opts={})
    session.execute(some_cql)
    ...
  end
end
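
For the configured case, a class-level macro is one possible shape. This is only a sketch under assumptions: `KeyspacedSessionSketch`, the `keyspace` macro, `connector`, and `FakeKeyspaceSession` are all hypothetical names, and the connector is a stand-in for whatever would actually open a session against a keyspace.

```ruby
# Hypothetical mixin; names are illustrative only.
module KeyspacedSessionSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class-level macro: `keyspace :accounts` sets it, `keyspace` reads it.
    def keyspace(name = nil)
      @keyspace = name.to_s if name
      @keyspace
    end

    # Session for this class's keyspace.
    def session
      KeyspacedSessionSketch.session_for(keyspace)
    end
  end

  class << self
    # Callable taking a keyspace name and returning a session.
    attr_accessor :connector

    # One memoized session per keyspace name.
    def session_for(keyspace)
      @sessions ||= {}
      @sessions[keyspace] ||= connector.call(keyspace)
    end
  end
end

# Stand-in session: echoes the keyspace it was opened for.
FakeKeyspaceSession = Struct.new(:keyspace) do
  def execute(cql, options = {})
    "#{keyspace}: #{cql}"
  end
end
KeyspacedSessionSketch.connector = ->(ks) { FakeKeyspaceSession.new(ks) }

class SomeDataMapperResourceClass
  include KeyspacedSessionSketch
  keyspace :accounts

  def self.find_by_username(username, opts = {})
    session.execute("SELECT * FROM users WHERE username = ?")
  end
end
```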

Getting the cluster config

# some_task.rb
class SomeTask

  def cluster_config
    # get the current cluster connection options     < ------ here
  end
end
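
Reading the connection options could be as small as selecting one environment from a cassandra.yml-style document. In the sketch below, `ClusterConfigSketch`, the `APP_ENV` variable, and the keys in `EXAMPLE_YAML` (`hosts`, `port`, `keyspace`) are assumptions made for illustration, not the gem's actual file layout.

```ruby
require "yaml"

# Hypothetical helper: picks one environment's settings out of a
# cassandra.yml-style document and returns them with symbol keys.
module ClusterConfigSketch
  def self.load(yaml_text, environment)
    all = YAML.safe_load(yaml_text) || {}
    settings = all.fetch(environment.to_s) do
      raise KeyError, "no '#{environment}' section in configuration"
    end
    settings.each_with_object({}) { |(key, value), out| out[key.to_sym] = value }
  end
end

# Inline document so the sketch is self-contained; a real application
# would read this from a file.
EXAMPLE_YAML = <<~YAML
  development:
    hosts:
      - 127.0.0.1
    port: 9042
    keyspace: my_app_development
  production:
    hosts:
      - cass1.example.com
      - cass2.example.com
    port: 9042
    keyspace: my_app_production
YAML

class SomeTask
  def cluster_config
    ClusterConfigSketch.load(EXAMPLE_YAML, ENV.fetch("APP_ENV", "development"))
  end
end
```

Keeping this as a plain function of text and environment name means it has no dependency on Rails, which matches the goal of making the configuration piece usable in any Ruby application.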

Managing the cluster/sessions

# some_initializer_file.rb
SomeForkedProcess.after_fork do
  # Establish unique connection for this process fork      < ------ here
end
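
For the forking scenario, one option is a holder that remembers which process opened the session and reconnects when it is used from a different one. This is a sketch under assumptions: `PerProcessSessionSketch` and its connector block are hypothetical, and the block stands in for real connection code.

```ruby
# Hypothetical holder that gives each process its own session.
class PerProcessSessionSketch
  # The block is called whenever a new session is needed.
  def initialize(&connector)
    @connector = connector
    @owner_pid = nil
    @session = nil
  end

  # Reconnect if the current process is not the one that connected.
  def session
    if @owner_pid != Process.pid
      @session = @connector.call
      @owner_pid = Process.pid
    end
    @session
  end

  # Explicit hook that an after_fork callback could call.
  def reset!
    @owner_pid = nil
    @session = nil
  end
end

# Usage with a stand-in connector; Object.new takes the place of a session.
SESSIONS = PerProcessSessionSketch.new { Object.new }
SESSIONS.session # connects on first use in this process
SESSIONS.reset!  # what an after_fork block might do
```

The pid check makes the explicit `reset!` optional, but calling it in `after_fork` keeps the reconnect at a predictable moment instead of on the first query.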

sstgithub commented 8 years ago

@eprothro I think the separations make a lot of sense, but can you clear up what the scope of each gem would be? Would creating the keyspace, updating keyspace settings like RF, updating table settings all be part of cassandra-configuration (or would you have to do another migration using cassandra-migrations to update table settings)? Also, I assume since consistency level can be set on a cluster or on individual reads/writes that would be part of both cassandra-queries and cassandra-configuration?

eprothro commented 8 years ago

@sstgithub Great questions.

I think the configuration gem's responsibility would be managing cluster and session configuration for multiple environments. These cluster and session objects would be instantiated cassandra-driver classes.

In that regard, I would expect to be able to tune default consistency of requests to the cluster/dc connection with the configuration gem by itself.

I don't think a configuration gem knows anything about a query, directly. So, tuning consistency for a query would be the client's responsibility, or a queries gem's responsibility if the client chose to use that.
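
The split described here (defaults owned by configuration, overrides owned by whoever issues the query) can be sketched as a thin wrapper that merges options. `DefaultedSessionSketch` and `EchoSession` are hypothetical names, and the wrapped object is only assumed to respond to `execute(cql, options)`.

```ruby
# Hypothetical wrapper: holds connection-level defaults and lets each
# call override them. `inner` is anything responding to execute(cql, options).
class DefaultedSessionSketch
  def initialize(inner, defaults = {})
    @inner = inner
    @defaults = defaults
  end

  # Per-call options win over the configured defaults.
  def execute(cql, options = {})
    @inner.execute(cql, @defaults.merge(options))
  end
end

# Stand-in that returns the options it was given.
class EchoSession
  def execute(cql, options = {})
    options
  end
end

session = DefaultedSessionSketch.new(EchoSession.new, consistency: :quorum)
session.execute("SELECT * FROM users")                    # uses the default
session.execute("SELECT * FROM users", consistency: :one) # overrides it
```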

The table settings one is an interesting question. I would expect cassandra.yml to describe connection settings. In that regard, something like replication_factor doesn't exactly belong. However, changing a replication factor probably doesn't make sense as a traditional database (read: schema) migration, since it is not something shared across environments (the way a table name or column type is).

Currently, I only think replication_factor and class are being used for the cassandra:create task. For now, I would, personally, be ok with removing keyspace options from the cassandra.yml and having this create task simply use the defaults (e.g. simple, 1) and expect that users in production are managing non-schema properties competently.
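
As a rough illustration of a create task that falls back to those defaults, the statement could be built like this. `CreateKeyspaceSketch` is a hypothetical helper, not the gem's current task code; the generated text is ordinary `CREATE KEYSPACE` CQL.

```ruby
# Hypothetical helper that builds the CQL a create task might run.
module CreateKeyspaceSketch
  # Defaults mentioned in the discussion: simple strategy, factor of 1.
  DEFAULT_REPLICATION = {
    "class" => "SimpleStrategy",
    "replication_factor" => 1
  }.freeze

  def self.cql(keyspace, replication = DEFAULT_REPLICATION)
    # Only allow plain unquoted identifiers, since the name is interpolated.
    unless keyspace.to_s.match?(/\A[a-zA-Z][a-zA-Z0-9_]*\z/)
      raise ArgumentError, "invalid keyspace name: #{keyspace.inspect}"
    end

    pairs = replication.map do |key, value|
      literal = value.is_a?(String) ? "'#{value}'" : value.to_s
      "'#{key}': #{literal}"
    end

    "CREATE KEYSPACE IF NOT EXISTS #{keyspace} WITH replication = {#{pairs.join(', ')}}"
  end
end

puts CreateKeyspaceSketch.cql("my_app_development")
```

Passing a different replication map remains possible for callers that need it, so removing the options from cassandra.yml would not remove the capability.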

My question would be "how are people currently managing non-schema database configuration mutations?" (e.g. keyspace properties like RF and Durable Writes, table options like compaction and caching).

I assume the answer is "manually" (e.g. tweaking settings and, prayerfully, documenting changes somewhere). If that's the case, I'd love to discuss and come up with a better answer eventually, but I don't know if that is the scope of this initial change.

Thoughts?

sstgithub commented 8 years ago

@eprothro Apologies for the late reply.

I would say that replication_factor should remain in cassandra.yml, as I see cassandra.yml as handling both connection settings and all other initial settings for the keyspace. Also, I think it might make more sense to use well-separated modules in one gem instead of multiple gems, because there will be quite a bit of overlap in the DSL (for instance, defining a default consistency level initially in the configuration module and then being able to use a different consistency level with each query in the query module).

As for the non-schema db config mutations, I don't know how others using this gem manage things, but we currently manage most of those manually. It would be great if they were handled by this gem, but I agree that's outside the scope of this change.

eprothro commented 8 years ago

@sstgithub I've been playing with this, and I agree that a single repo with well-organized and loosely coupled modules is best. My current opinion is that this repo could still contain multiple gems, for those who want to use them in isolation (similar to rails with activesupport, activemodel, etc.).

Regarding db config mutations, we handle these in a separate repo for our infrastructure configuration. I think this is appropriate for now, and I think just mentioning that this is a need that exists and is separate from schema mutations supported by the gem is probably responsible enough.

eprothro commented 7 years ago

@sstgithub @hsgubert @bsbodden Wanted to update y'all on this.

Over the last year we've bootstrapped our way into production with Cassandra (went live in December). Along the way, it turns out I've created the libraries that we've discussed in this thread.

https://github.com/eprothro/cassie
https://github.com/eprothro/cassie-rails

In no way am I trying to be "that guy" and say "hey, let's all just use these!".

For now, I just wanted to let y'all know about them, mention that they are in line with a lot of what we've discussed here, and working well for us. I'm 100% open to any comments, questions, nits, or discussions.