production-relevant cassandra schema

codefromthecrypt commented 8 years ago

from @mikewrighton

hi, anyone know if there’s an easy way to modify the cassandra schema? I’d like to change the replication factor which is fixed at 1 in cassandra-schema-cql3.

It's currently tribal knowledge that the built-in schema isn't ideal for all production environments. We take some steps that make it easier for tests to pass, etc.

It would be nice for users and also for benchmarkers to use more realistic schema settings.

@openzipkin/cassandra do you know of a list of things about the schema that would certainly need to change in a multi-node cassandra cluster in production? If you can enumerate them, I can help document and maybe we can brainstorm a "dev mode" flag or some such that makes test-level options not the default.

prat0318 commented 8 years ago

We added off-heap memtable allocation of 20G, which reduced the number of flushes and resulted in less compaction.

mikewrighton commented 8 years ago

I guess I was thinking that since there is some useful code around the schema loading, like in zipkin.storage.cassandra.Schema, it might be good if it were somehow extensible e.g. if you could provide your own schema or 'upgrade schema' file, and/or modify some of the parameters in the default schema like replication factor.

codefromthecrypt commented 8 years ago

The only way to do this is somewhat basic.. put said file in front of the classpath!

codefromthecrypt commented 8 years ago

in the case of docker you'd overwrite the file at /zipkin/cassandra-schema-cql3.txt
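
One way to prepare such a replacement file is to rewrite only the replication factor in the stock schema text and leave everything else alone. The sketch below is a minimal illustration, not zipkin code: the class `SchemaRf` and the method `withReplicationFactor` are hypothetical, and it assumes the keyspace statement spells the setting as `'replication_factor': '1'`, which should be checked against the actual file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of zipkin.
public class SchemaRf {
  // Matches 'replication_factor': <digits>, with or without quotes around the number.
  private static final Pattern RF =
      Pattern.compile("('replication_factor'\\s*:\\s*)'?\\d+'?");

  /** Returns the schema text with every replication_factor value set to rf. */
  static String withReplicationFactor(String schema, int rf) {
    if (rf < 1) throw new IllegalArgumentException("rf must be >= 1: " + rf);
    Matcher m = RF.matcher(schema);
    // Fail loudly rather than silently returning an unchanged schema.
    if (!m.find()) throw new IllegalArgumentException("no replication_factor found");
    return m.replaceAll("$1'" + rf + "'");
  }

  public static void main(String[] args) {
    // Assumed shape of the keyspace statement; verify against the real schema file.
    String line = "CREATE KEYSPACE IF NOT EXISTS zipkin WITH replication = "
        + "{'class': 'SimpleStrategy', 'replication_factor': '1'};";
    System.out.println(withReplicationFactor(line, 3));
  }
}
```

The rewritten text could then be saved under the same file name and placed ahead of the bundled one, so only the keyspace options differ from the stock schema.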

Doing arbitrary upgrades could be dodgy. There's careful logic around the upgrade, and it checks for very specific things because CQL can't do everything. A log message might be misleading if we used this check but did something else.

ex. "/cassandra-schema-cql3-upgrade-1.txt" has this check

    static boolean hasUpgrade1_defaultTtl(KeyspaceMetadata keyspaceMetadata) {
      // TODO: we need some approach to forward-check compatibility as well.
      //  backward: this code knows the current schema is too old.
      //  forward:  this code knows the current schema is too new.
      return keyspaceMetadata.getTable("traces").getOptions().getDefaultTimeToLive() > 0;
    }

We have tests to show the effects of this work etc., but arbitrary changes aren't something we could promise to handle, so we're unlikely to be able to support them.

I'd recommend only replacing the semantic contents of the existing schema files for this reason. Also, there's a lot of folks who use cassandra.. maybe there are other tools available to keep schema up to date which don't require zipkin's ENSURE_SCHEMA feature?

michaelsembwever commented 8 years ago

Increasing RF to 3+ is important in production.

But I don't know the best way to do that without breaking dev environments. Currently a warning is printed, ref https://github.com/openzipkin/zipkin/blob/master/zipkin-storage/cassandra/src/main/java/zipkin/storage/cassandra/Schema.java#L43

michaelsembwever commented 8 years ago

Other important things to do in a production environment are:

- disable assertions (remove "-ea" from cassandra-env.sh)
- run Java 8 and G1GC
- have all Cassandra and Zipkin servers sync regularly against an internal NTP server
- enable cross_node_timeout in cassandra.yaml
- disable swap
- unlimited ulimits

jorgheymans commented 4 years ago

@adriancole is this still relevant? If yes, I can search around the issues and put up some 'hints' in the documentation like the above, as well as a warning that sites should really not rely on the 'demo' Cassandra schema configuration we provide. We should not become Cassandra tweaking experts though, merely hint that sites are responsible for squeezing the most out of their storage, and we'll just tell them what is important in terms of zipkin storage and indexing needs.

If not and all this is hopelessly outdated, feel free to close :-)

codefromthecrypt commented 4 years ago

we could probably handle replication factor as an ENV variable as we do in elasticsearch, and leave it at that for now.
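
A minimal sketch of what reading such a setting could look like follows. The variable name `CASSANDRA_REPLICATION_FACTOR`, the class `ReplicationFactorEnv`, and the default of 1 are assumptions for illustration only, not existing zipkin configuration.

```java
import java.util.Map;

// Hypothetical helper, not part of zipkin.
public class ReplicationFactorEnv {
  // Assumed variable name; zipkin does not necessarily define it.
  static final String VAR = "CASSANDRA_REPLICATION_FACTOR";

  /** Returns the configured replication factor, or defaultValue when unset or blank. */
  static int replicationFactor(Map<String, String> env, int defaultValue) {
    String raw = env.get(VAR);
    if (raw == null || raw.trim().isEmpty()) return defaultValue;
    int rf;
    try {
      rf = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(VAR + " must be an integer: " + raw, e);
    }
    // A keyspace needs at least one replica.
    if (rf < 1) throw new IllegalArgumentException(VAR + " must be >= 1: " + rf);
    return rf;
  }

  public static void main(String[] args) {
    // Falls back to 1, matching the dev-friendly default discussed above.
    System.out.println(replicationFactor(System.getenv(), 1));
  }
}
```

Taking the environment as a `Map` keeps the parsing testable without touching the real process environment, and keeping the default at 1 would leave dev setups unchanged while production sets 3 or more.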

jorgheymans commented 4 years ago

All right, I can give that a go after you've landed the DataStax Driver 4.0 Mothership https://github.com/openzipkin/zipkin/pull/3246

openzipkin / zipkin

production-relevant cassandra schema #1194