jetstack / navigator

Managed Database-as-a-Service (DBaaS) on Kubernetes
Apache License 2.0
271 stars 31 forks source link

Persistent volumes for Cassandra data #225

Closed wallrj closed 6 years ago

wallrj commented 6 years ago

Fixes: #216

Release note:

NONE
wallrj commented 6 years ago

@munnerz This is ready for an initial review.

munnerz commented 6 years ago

This all looks good to me and makes sense (given you've not done the PVC bits yet). Can you tidy up commits though?

You aren't actually persisting any data as part of this pull request, so I'm not sure why we'd expect any data to be persisted (hence the e2e tests failing).

What kind of tests are you looking to make for PVCs? All that is required is adding a PersistentVolumeClaimTemplate field to the StatefulSet that gets created. A simple unit test to ensure this template is set, and the name matches the name of the volume mount should suffice. During e2e tests it isn't possible to use PVCs properly as we are using minikube, which doesn't provide a dynamic volume provisioner.

wallrj commented 6 years ago

The emptydir should survive as long as the pod, right?

So by killing the Cassandra process I expect it to restart with the data that had been inserted.

And the tests pass on my local cluster and on the 1.9 tests.

So I think there's something else going on here.

On Jan 26, 2018 3:00 PM, "James Munnelly" notifications@github.com wrote:

This all looks good to me and makes sense (given you've not done the PVC bits yet). Can you tidy up commits though?

You aren't actually persisting any data as part of this pull request, so I'm not sure why we'd expect any data to be persisted (hence the e2e tests failing).

What kind of tests are you looking to make for PVCs? All that is required is adding a PersistentVolumeClaimTemplate field to the StatefulSet that gets created. A simple unit test to ensure this template is set, and the name matches the name of the volume mount should suffice. During e2e tests it isn't possible to use PVCs properly as we are using minikube, which doesn't provide a dynamic volume provisioner.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jetstack/navigator/pull/225#issuecomment-360807346, or mute the thread https://github.com/notifications/unsubscribe-auth/AA7wFVd1fuatuMEKyZQIJFbZ45wV1q_oks5tOeiRgaJpZM4RtC9Q .

munnerz commented 6 years ago

Looking good - one thing to note, I'm not sure if we need to bother testing whether the PVCs are bound to the PVs you've manually created in e2e tests. Doing so is not a test of Navigator, but a test of the StatefulSet controller (as well as the PVC<->PV controller). Can you see any value beyond this that it brings? I'd be happy to just have a unit test that ensures the StatefulSet has the expected PVC template specified, as it's then up to Kubernetes itself to ensure that is bound and mounted.

wallrj commented 6 years ago

So the failed tests show data being lost.

W0131 14:10:38.602] ++ kubectl run cql-connect-5158 --namespace=test-cassandra-1517407346-5126 --command=true --image=cassandra:latest --restart=Never --rm --stdin=true --attach=true --quiet -- /usr/bin/cqlsh cass-cassandra-1517407346-5126-cassandra-cql 9042 --debug '--execute=SELECT * FROM space1.testtable1'
W0131 14:10:40.808] + actual='Using CQL driver: <module '\''cassandra'\'' from '\''/usr/share/cassandra/lib/cassandra-driver-internal-only-3.10.zip/cassandra-driver-3.10/cassandra/__init__.py'\''>
W0131 14:10:40.809] Using connect timeout: 5 seconds
W0131 14:10:40.809] Using '\''utf-8'\'' encoding
W0131 14:10:40.809] Using ssl: False
W0131 14:10:40.809] 
W0131 14:10:40.809]  key | value
W0131 14:10:40.809] -----+-------
W0131 14:10:40.809] 
W0131 14:10:40.809] (0 rows)'
W0131 14:10:40.809] + grep --quiet testvalue1
W0131 14:10:40.813] + local exit_code=1

The startup log of the Cassandra process in the replacement Pod shows the keyspace and table being restored and the commit log being replayed.

INFO  14:08:44 Initializing space1.testtable1
INFO  14:08:44 Not submitting build tasks for views in keyspace space1 as storage service is not initialized
INFO  14:08:44 Initializing system_auth.resource_role_permissons_index
INFO  14:08:44 Initializing system_auth.role_members
INFO  14:08:44 Initializing system_auth.role_permissions
INFO  14:08:44 Initializing system_auth.roles
INFO  14:08:44 Not submitting build tasks for views in keyspace system_auth as storage service is not initialized
INFO  14:08:44 Initializing system_traces.events
INFO  14:08:44 Initializing system_traces.sessions
INFO  14:08:44 Not submitting build tasks for views in keyspace system_traces as storage service is not initialized
INFO  14:08:44 Completed loading (1 ms; 15 keys) KeyCache cache
INFO  14:08:44 Replaying /cassandra_data/commitlog/CommitLog-6-1517407569595.log, /cassandra_data/commitlog/CommitLog-6-1517407569596.log
INFO  14:08:46 Log replay complete, 26 replayed mutations

But the previous Cassandra process obviously wasn't shutdown gracefully and didn't flush its commit log to disk.

The reason, I think, is that the SIGTERM is being propagated by the Pilot to an intermediate bash script, which doesn't then propagate it to the Cassandra process.

richard@richardw-pet1:~/go/src/github.com/jetstack/navigator$ kubectl --namespace test-cassandra-1517406857-1172 exec cass-cassandra-1517406857-1172-cassandra-ringnodes-0 -- ps faux                                            USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       151  0.0  0.0  34420  2824 ?        Rs   15:00   0:00 ps faux
root         1  7.0  0.1  31648 19896 ?        Ssl  15:00   0:00 /shared/pilot --v=4 --logtostderr --pilot-name=cass-cassandra-1517406857-1172-cassandra-ringnodes-0 --pilot-namespace=test-cassandra-1517406857-1172 --leader-election-config-map=cassandra-cassandra-1517406857-1172-cassandra-leaderelection
root        16  0.0  0.0  18068  2932 ?        S    15:00   0:00 /bin/bash /run.sh
root        32  0.0  0.0  46988  2844 ?        S    15:00   0:00  \_ su cassandra -c /usr/local/apache-cassandra-3.9/bin/cassandra -f
cassand+    33  146  4.3 2195744 669116 ?      Ssl  15:00   0:05      \_ /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xloggc:/usr/local/apache-cassandra-3.9/logs/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
-XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Dcassandra.migration_task_wait_in_seconds=1 -Dcassandra.ring_delay_ms=30000 -Xms512M -Xmx512M -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler -javaagent:/usr/local/apache-cassandra-3.9/lib/jamm-0.3.0.jar -Dcassandra.jmx.local.port=7199 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password -Djava.library.path=/usr/local/apache-cassandra-3.9/lib/sigar-bin -Djava.rmi.server.hostname=10.192.2.98 -Dcassandra.libjemalloc=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/usr/local/apache-cassandra-3.9/logs -Dcassandra.storagedir=/usr/local/apache-cassandra-3.9/data -Dcassandra-foreground=yes -cp /etc/cassandra:/usr/local/apache-cassandra-3.9/build/classes/main:/usr/local/apache-cassandra-3.9/build/classes/thrift:/usr/local/apache-cassandra-3.9/lib/HdrHistogram-2.1.9.jar:/usr/local/apache-cassandra-3.9/lib/ST4-4.0.8.jar:/usr/local/apache-cassandra-3.9/lib/airline-0.6.jar:/usr/local/apache-cassandra-3.9/lib/antlr-runtime-3.5.2.jar:/usr/local/apache-cassandra-3.9/lib/apache-cassandra-3.9.jar:/usr/local/apache-cassandra-3.9/lib/apache-cassandra-clientutil-3.9.jar:/usr/local/apache-cassandra-3.9/lib/apache-cassandra-thrift-3.9.jar:/usr/local/apache-cassandra-3.9/lib/asm-5.0.4.jar:/usr/local/apache-cassandra-3.9/lib/caffeine-2.2.6.jar:/usr/local/apache-cassandra-3.9/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/local/apache-cassandra-3.9/lib/commons-cli-1.1.jar:/usr/local/apache-cassandra-3.9/lib/commons-codec-1.2.jar:/usr/local/apache-cassandra-3.9/lib/commons-lang3-3.1.jar:/usr/local/apache-cassandra-3.9/lib/commons-math3-3.2.jar:/usr/local/apache-cassandra-3.9/lib/compress-lzf-0.8.4.jar:/usr/local/apache-cassandra-3.9/lib/concurrent-trees-2.4.0.jar:/usr/local/apache-cassandra-3.9/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/local/apache-cassandra-3.9/lib/disruptor-3.0.1.jar:/usr/local/apache-cassandra-3.9/lib/ecj-4.4.2.jar:/usr/local/apache-cassandra-3.9/lib/guava-18.0.jar:/usr/local/apache-cassandra-3.9/lib/high-scale-lib-1.0.6.jar:/usr/local/apache-cassandra-3.9/lib/hppc-0.5.4.jar:/usr/local/apache-cassandra-3.9/lib/jackson-core-asl-1.9.2.jar:/usr/local/apache-cassandra-3.9/lib/jackson-mapper-asl-1.9.2.jar:/usr/local/apache-cassandra-3.9/lib/jamm-0.3.0.jar:/usr/local/apache-cassandra-3.9/lib/javax.inject.jar:/usr/local/apache-cassandra-3.9/lib/jbcrypt-0.3m.jar:/usr/local/apache-cassandra-3.9/lib/jcl-over-slf4j-1.7.7.jar:/usr/local/apache-cassandra-3.9/lib/jflex-1.6.0.jar:/usr/local/apache-cassandra-3.9/lib/jna-4.0.0.jar:/usr/local/apache-cassandra-3.9/lib/joda-time-2.4.jar:/usr/local/apache-cassandra-3.9/lib/json-simple-1.1.jar:/usr/local/apache-cassandra-3.9/lib/libthrift-0.9.2.jar:/usr/local/apache-cassandra-3.9/lib/log4j-over-slf4j-1.7.7.jar:/usr/local/apache-cassandra-3.9/lib/logback-classic-1.1.3.jar:/usr/local/apache-cassandra-3.9/lib/logback-core-1.1.3.jar:/usr/local/apache-cassandra-3.9/lib/lz4-1.3.0.jar:/usr/local/apache-cassandra-3.9/lib/metrics-core-3.1.0.jar:/usr/local/apache-cassandra-3.9/lib/metrics-jvm-3.1.0.jar:/usr/local/apache-cassandra-3.9/lib/metrics-logback-3.1.0.jar:/usr/local/apache-cassandra-3.9/lib/netty-all-4.0.39.Final.jar:/usr/local/apache-cassandra-3.9/lib/ohc-core-0.4.3.jar:/usr/local/apache-cassandra-3.9/lib/ohc-core-j8-0.4.3.jar:/usr/local/apache-cassandra-3.9/lib/primitive-1.0.jar:/usr/local/apache-cassandra-3.9/lib/reporter-config-base-3.0.0.jar:/usr/local/apache-cassandra-3.9/lib/reporter-config3-3.0.0.jar:/usr/local/apache-cassandra-3.9/lib/sigar-1.6.4.jar:/usr/local/apache-cassandra-3.9/lib/slf4j-api-1.7.7.jar:/usr/local/apache-cassandra-3.9/lib/snakeyaml-1.11.jar:/usr/local/apache-cassandra-3.9/lib/snappy-java-1.1.1.7.jar:/usr/local/apache-cassandra-3.9/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/local/apache-cassandra-3.9/lib/stream-2.5.2.jar:/usr/local/apache-cassandra-3.9/lib/thrift-server-0.3.7.jar:/usr/local/apache-cassandra-3.9/lib/jsr223/*/*.jar org.apache.cassandra.service.CassandraDaemon

One way to fix this is to use an image whose entrypoint execs the Cassandra process, so that Cassandra becomes the immediate child of the Pilot.

The Docker maintained image does this #222

wallrj commented 6 years ago

@munnerz Ready for another review now that I've rebased on top of the official Docker image and squashed all the commits.

Some answers to your previous comments:

I'm not sure if we need to bother testing whether the PVCs are bound to the PVs you've manually created in e2e tests. Doing so is not a test of Navigator, but a test of the StatefulSet controller (as well as the PVC<->PV controller).

Only reason I'm creating the PVs is because we don't have dynamic volume provisioning on our E2E test cluster yet. See https://github.com/jetstack/test-infra/issues/134

Can you see any value beyond this that it brings?

Yeah, I think it's valuable to demonstrate that Navigator runs Cassandra in such a way that data isn't lost, even when all cluster nodes are stopped.

As you can see in the comment above, the E2E tests revealed that the Cassandra database wasn't being shut down gracefully and therefore not flushing its commit log to disk...leading to data loss...so I think that illustrates the value.

I'd be happy to just have a unit test that ensures the StatefulSet has the expected PVC template specified, as it's then up to Kubernetes itself to ensure that is bound and mounted.

I didn't add unit tests yet, but I can also do that here, if you like, or in a followup branch if that's ok with you. So far I haven't bother writing unit tests verifying the exact resources that Navigator controllers create. Instead just verifying that e.g. a statefulset is created and that only owned statefulsets are modified.

I need to consider the case where a user tries to change the size or the storage class. For now I think we should look at ways to disallow that. I

wallrj commented 6 years ago

Thanks for the reviews @munnerz

I've addressed your comments and pushed again.

Let's see if the E2E tests pass.

munnerz commented 6 years ago

/lgtm /approve

jetstack-ci-bot commented 6 years ago

/lgtm cancel //PR changed after LGTM, removing LGTM. @munnerz @wallrj

munnerz commented 6 years ago

/lgtm /approve

jetstack-bot commented 6 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: munnerz

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files: - ~~[OWNERS](https://github.com/jetstack/navigator/blob/master/OWNERS)~~ [munnerz] You can indicate your approval by writing `/approve` in a comment You can cancel your approval by writing `/approve cancel` in a comment
jetstack-bot commented 6 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: munnerz

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files: - ~~[OWNERS](https://github.com/jetstack/navigator/blob/master/OWNERS)~~ [munnerz] You can indicate your approval by writing `/approve` in a comment You can cancel your approval by writing `/approve cancel` in a comment
munnerz commented 6 years ago

/retest

jetstack-ci-bot commented 6 years ago

/test all [submit-queue is verifying that this PR is safe to merge]

jetstack-ci-bot commented 6 years ago

Automatic merge from submit-queue.