jaegertracing / jaeger-openshift

Support for deploying Jaeger into OpenShift
https://jaegertracing.io/
Apache License 2.0

Deleting or scaling down Cassandra StatefulSet #30

Open pavolloffay opened 7 years ago

pavolloffay commented 7 years ago

I get "Cannot achieve consistency level LOCAL_ONE" after manually deleting a C* pod. Sometimes the cluster recovers, sometimes it returns this error. The C* logs show this:

CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.deserializeLargeSubset (Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/Columns;I)Lorg/apache/cassandra/db/Columns;
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubset (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;ILorg/apache/cassandra/io/util/DataOutputPlus;)V
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubsetSize (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;I)I
CompilerOracle: dontinline org/apache/cassandra/db/commitlog/AbstractCommitLogSegmentManager.advanceAllocatingFrom (Lorg/apache/cassandra/db/commitlog/CommitLogSegment;)V
CompilerOracle: dontinline org/apache/cassandra/db/transform/BaseIterator.tryGetMoreContents ()Z
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stop ()V
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stopInPartition ()V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.doFlush (I)V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.writeExcessSlow ()V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.writeSlow (JI)V
CompilerOracle: dontinline org/apache/cassandra/io/util/RebufferingInputStream.readPrimitiveSlowly (I)J
CompilerOracle: inline org/apache/cassandra/db/rows/UnfilteredSerializer.serializeRowBody (Lorg/apache/cassandra/db/rows/Row;ILorg/apache/cassandra/db/SerializationHeader;Lorg/apache/cassandra/io/util/DataOutputPlus;)V
CompilerOracle: inline org/apache/cassandra/io/util/Memory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/io/util/SafeMemory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.selectBoundary (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;II)I
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.strictnessOfLessThan (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;)I
CompilerOracle: inline org/apache/cassandra/utils/BloomFilter.indexes (Lorg/apache/cassandra/utils/IFilter/FilterKey;)[J
CompilerOracle: inline org/apache/cassandra/utils/BloomFilter.setIndexes (JJIJ[J)V
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.encodeVInt (JI)[B
INFO  [main] 2017-07-27 09:22:55,869 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
INFO  [main] 2017-07-27 09:22:56,212 Config.java:481 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=true; auto_snapshot=true; back_pressure_enabled=false; back_pressure_strategy=org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=172.17.0.8; broadcast_rpc_address=172.17.0.8; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; cdc_enabled=false; cdc_free_space_check_interval_ms=250; cdc_raw_directory=null; cdc_total_space_in_mb=0; client_encryption_options=<REDACTED>; cluster_name=jaeger; column_index_cache_size_in_kb=2; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_compression=null; commitlog_directory=/var/lib/cassandra/commitlog; commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_batch_window_in_ms=NaN; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=null; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_compactors=null; concurrent_counter_writes=32; concurrent_materialized_view_writes=32; concurrent_reads=32; concurrent_replicates=null; concurrent_writes=32; counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; credentials_cache_max_entries=1000; credentials_update_interval_in_ms=-1; credentials_validity_in_ms=2000; cross_node_timeout=false; data_file_directories=[Ljava.lang.String;@37e547da; disk_access_mode=auto; disk_failure_policy=stop; disk_optimization_estimate_percentile=0.95; disk_optimization_page_cross_chance=0.1; disk_optimization_strategy=ssd; dynamic_snitch=true; dynamic_snitch_badness_threshold=0.1; 
dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; enable_scripted_user_defined_functions=false; enable_user_defined_functions=false; enable_user_defined_functions_threads=true; encryption_options=null; endpoint_snitch=GossipingPropertyFileSnitch; file_cache_size_in_mb=null; gc_log_threshold_in_ms=200; gc_warn_threshold_in_ms=1000; hinted_handoff_disabled_datacenters=[]; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; hints_compression=null; hints_directory=null; hints_flush_period_in_ms=10000; incremental_backups=false; index_interval=null; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=null; inter_dc_stream_throughput_outbound_megabits_per_sec=200; inter_dc_tcp_nodelay=false; internode_authenticator=null; internode_compression=dc; internode_recv_buff_size_in_bytes=0; internode_send_buff_size_in_bytes=0; key_cache_keys_to_save=2147483647; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=172.17.0.8; listen_interface=null; listen_interface_prefer_ipv6=false; listen_on_broadcast_address=false; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; max_hints_file_size_in_mb=128; max_mutation_size_in_kb=null; max_streaming_retries=3; max_value_size_in_mb=256; memtable_allocation_type=heap_buffers; memtable_cleanup_threshold=null; memtable_flush_writers=0; memtable_heap_space_in_mb=null; memtable_offheap_space_in_mb=null; min_free_space_per_drive_in_mb=50; native_transport_max_concurrent_connections=-1; native_transport_max_concurrent_connections_per_ip=-1; native_transport_max_frame_size_in_mb=256; native_transport_max_threads=128; native_transport_port=9042; native_transport_port_ssl=null; num_tokens=256; otc_backlog_expiration_interval_ms=200; otc_coalescing_enough_coalesced_messages=8; otc_coalescing_strategy=DISABLED; otc_coalescing_window_us=200; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_cache_max_entries=1000; 
permissions_update_interval_in_ms=-1; permissions_validity_in_ms=2000; phi_convict_threshold=8.0; prepared_statements_cache_size_mb=null; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_scheduler_id=null; request_scheduler_options=null; request_timeout_in_ms=10000; role_manager=CassandraRoleManager; roles_cache_max_entries=1000; roles_update_interval_in_ms=-1; roles_validity_in_ms=2000; row_cache_class_name=org.apache.cassandra.cache.OHCProvider; row_cache_keys_to_save=2147483647; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=0.0.0.0; rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true; rpc_listen_backlog=50; rpc_max_threads=2147483647; rpc_min_threads=16; rpc_port=9160; rpc_recv_buff_size_in_bytes=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync; saved_caches_directory=/var/lib/cassandra/saved_caches; seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=cassandra-0.cassandra}; server_encryption_options=<REDACTED>; slow_query_log_timeout_in_ms=500; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=false; storage_port=7000; stream_throughput_outbound_megabits_per_sec=200; streaming_keep_alive_period_in_secs=300; streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15; thrift_max_message_length_in_mb=16; thrift_prepared_statements_cache_size_mb=null; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; tracetype_query_ttl=86400; tracetype_repair_ttl=604800; transparent_data_encryption_options=org.apache.cassandra.config.TransparentDataEncryptionOptions@2b6856dd; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; user_defined_function_fail_timeout=1500; user_defined_function_warn_timeout=500; 
user_function_timeout_policy=die; windows_timer_interval=1; write_request_timeout_in_ms=2000]
INFO  [main] 2017-07-27 09:22:56,213 DatabaseDescriptor.java:366 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO  [main] 2017-07-27 09:22:56,213 DatabaseDescriptor.java:420 - Global memtable on-heap threshold is enabled at 125MB
INFO  [main] 2017-07-27 09:22:56,214 DatabaseDescriptor.java:424 - Global memtable off-heap threshold is enabled at 125MB
WARN  [main] 2017-07-27 09:22:56,220 DatabaseDescriptor.java:466 - Small commitlog volume detected at /var/lib/cassandra/commitlog; setting commitlog_total_space_in_mb to 4576.  You can override this in cassandra.yaml
WARN  [main] 2017-07-27 09:22:56,221 DatabaseDescriptor.java:493 - Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 2288.  You can override this in cassandra.yaml
WARN  [main] 2017-07-27 09:22:56,324 DatabaseDescriptor.java:540 - Only 15.869GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
INFO  [main] 2017-07-27 09:22:56,350 RateBasedBackPressure.java:123 - Initialized back-pressure with high ratio: 0.9, factor: 5, flow: FAST, window size: 2000.
INFO  [main] 2017-07-27 09:22:56,350 DatabaseDescriptor.java:710 - Back-pressure is disabled with strategy org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}.
INFO  [main] 2017-07-27 09:22:56,500 GossipingPropertyFileSnitch.java:64 - Loaded cassandra-topology.properties for compatibility
INFO  [main] 2017-07-27 09:22:56,593 JMXServerUtils.java:249 - Configured JMX server at: service:jmx:rmi://127.0.0.1/jndi/rmi://127.0.0.1:7199/jmxrmi
INFO  [main] 2017-07-27 09:22:56,600 CassandraDaemon.java:471 - Hostname: cassandra-2.cassandra.jaeger.svc.cluster.local
INFO  [main] 2017-07-27 09:22:56,601 CassandraDaemon.java:478 - JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_131
INFO  [main] 2017-07-27 09:22:56,603 CassandraDaemon.java:479 - Heap size: 502.000MiB/502.000MiB
INFO  [main] 2017-07-27 09:22:56,605 CassandraDaemon.java:484 - Code Cache Non-heap memory: init = 2555904(2496K) used = 4707712(4597K) committed = 4784128(4672K) max = 251658240(245760K)
INFO  [main] 2017-07-27 09:22:56,606 CassandraDaemon.java:484 - Metaspace Non-heap memory: init = 0(0K) used = 18596224(18160K) committed = 19136512(18688K) max = -1(-1K)
INFO  [main] 2017-07-27 09:22:56,607 CassandraDaemon.java:484 - Compressed Class Space Non-heap memory: init = 0(0K) used = 2235656(2183K) committed = 2359296(2304K) max = 1073741824(1048576K)
INFO  [main] 2017-07-27 09:22:56,607 CassandraDaemon.java:484 - Par Eden Space Heap memory: init = 83886080(81920K) used = 83886080(81920K) committed = 83886080(81920K) max = 83886080(81920K)
INFO  [main] 2017-07-27 09:22:56,607 CassandraDaemon.java:484 - Par Survivor Space Heap memory: init = 10485760(10240K) used = 10485760(10240K) committed = 10485760(10240K) max = 10485760(10240K)
INFO  [main] 2017-07-27 09:22:56,608 CassandraDaemon.java:484 - CMS Old Gen Heap memory: init = 432013312(421888K) used = 177912(173K) committed = 432013312(421888K) max = 432013312(421888K)
INFO  [main] 2017-07-27 09:22:56,608 CassandraDaemon.java:486 - Classpath: /etc/cassandra:/usr/share/cassandra/lib/HdrHistogram-2.1.9.jar:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-5.0.4.jar:/usr/share/cassandra/lib/caffeine-2.2.6.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.9.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrent-trees-2.4.0.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/ecj-4.4.2.jar:/usr/share/cassandra/lib/guava-18.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/hppc-0.5.4.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.3.0.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/jctools-core-1.2.1.jar:/usr/share/cassandra/lib/jflex-1.6.0.jar:/usr/share/cassandra/lib/jna-4.4.0.jar:/usr/share/cassandra/lib/joda-time-2.4.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jstackjunit-0.0.1.jar:/usr/share/cassandra/lib/libthrift-0.9.2.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/cassandra/lib/logback-core-1.1.3.jar:/usr/share/cassandra/lib/lz4-1.3.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.0.jar:/usr/share/cassandra/lib/metrics-jvm-3.1.0.jar:/usr/share/cassandra/lib/metrics-logback-3.1.0.jar:/usr/share/cassandra/lib/netty-all-4.0.44.Final.jar:/usr/share/cassandra/lib/ohc-core-0.4.4.jar:/usr/
share/cassandra/lib/ohc-core-j8-0.4.4.jar:/usr/share/cassandra/lib/reporter-config-base-3.0.3.jar:/usr/share/cassandra/lib/reporter-config3-3.0.3.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.1.1.7.jar:/usr/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-3.11.0.jar:/usr/share/cassandra/apache-cassandra-thrift-3.11.0.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar::/usr/share/cassandra/lib/jamm-0.3.0.jar
INFO  [main] 2017-07-27 09:22:56,609 CassandraDaemon.java:488 - JVM Arguments: [-Xloggc:/var/log/cassandra/gc.log, -ea, -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42, -XX:+HeapDumpOnOutOfMemoryError, -Xss256k, -XX:StringTableSize=1000003, -XX:+AlwaysPreTouch, -XX:-UseBiasedLocking, -XX:+UseTLAB, -XX:+ResizeTLAB, -XX:+UseNUMA, -XX:+PerfDisableSharedMem, -Djava.net.preferIPv4Stack=true, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:CMSWaitDuration=10000, -XX:+CMSParallelInitialMarkEnabled, -XX:+CMSEdenChunksRecordAlways, -XX:+CMSClassUnloadingEnabled, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -XX:+PrintPromotionFailure, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=10, -XX:GCLogFileSize=10M, -Xms512M, -Xmx512M, -Xmn100M, -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler, -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar, -Dcassandra.jmx.local.port=7199, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password, -Djava.library.path=/usr/share/cassandra/lib/sigar-bin, -Dcassandra.libjemalloc=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1, -Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/var/log/cassandra, -Dcassandra.storagedir=/var/lib/cassandra, -Dcassandra-foreground=yes]
WARN  [main] 2017-07-27 09:22:56,668 NativeLibrary.java:187 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
INFO  [main] 2017-07-27 09:22:56,673 StartupChecks.java:131 - jemalloc seems to be preloaded from /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
WARN  [main] 2017-07-27 09:22:56,674 StartupChecks.java:160 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
WARN  [main] 2017-07-27 09:22:56,675 StartupChecks.java:197 - OpenJDK is not recommended. Please upgrade to the newest Oracle Java release
INFO  [main] 2017-07-27 09:22:56,677 SigarLibrary.java:44 - Initializing SIGAR library
WARN  [main] 2017-07-27 09:22:56,686 SigarLibrary.java:174 - Cassandra server running in degraded mode. Is swap disabled? : false,  Address space adequate? : true,  nofile limit adequate? : true, nproc limit adequate? : true 
WARN  [main] 2017-07-27 09:22:56,688 StartupChecks.java:265 - Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low, recommended value: 1048575, you can change it with sysctl.
INFO  [main] 2017-07-27 09:22:56,872 QueryProcessor.java:115 - Initialized prepared statement caches with 10 MB (native) and 10 MB (Thrift)
INFO  [main] 2017-07-27 09:22:57,326 ColumnFamilyStore.java:406 - Initializing system.IndexInfo
INFO  [main] 2017-07-27 09:22:58,121 ColumnFamilyStore.java:406 - Initializing system.batches
INFO  [main] 2017-07-27 09:22:58,136 ColumnFamilyStore.java:406 - Initializing system.paxos
INFO  [main] 2017-07-27 09:22:58,146 ColumnFamilyStore.java:406 - Initializing system.local
INFO  [SSTableBatchOpen:1] 2017-07-27 09:22:58,171 BufferPool.java:230 - Global buffer pool is enabled, when pool is exhausted (max is 125.000MiB) it will allocate on heap
INFO  [main] 2017-07-27 09:22:58,411 CacheService.java:112 - Initializing key cache with capacity of 25 MBs.
INFO  [main] 2017-07-27 09:22:58,421 CacheService.java:134 - Initializing row cache with capacity of 0 MBs
INFO  [main] 2017-07-27 09:22:58,424 CacheService.java:163 - Initializing counter cache with capacity of 12 MBs
INFO  [main] 2017-07-27 09:22:58,425 CacheService.java:174 - Scheduling counter cache save to every 7200 seconds (going to save all keys).
INFO  [main] 2017-07-27 09:22:58,449 ColumnFamilyStore.java:406 - Initializing system.peers
INFO  [main] 2017-07-27 09:22:58,459 ColumnFamilyStore.java:406 - Initializing system.peer_events
INFO  [main] 2017-07-27 09:22:58,468 ColumnFamilyStore.java:406 - Initializing system.range_xfers
INFO  [main] 2017-07-27 09:22:58,480 ColumnFamilyStore.java:406 - Initializing system.compaction_history
INFO  [main] 2017-07-27 09:22:58,520 ColumnFamilyStore.java:406 - Initializing system.sstable_activity
INFO  [main] 2017-07-27 09:22:58,547 ColumnFamilyStore.java:406 - Initializing system.size_estimates
INFO  [main] 2017-07-27 09:22:58,554 ColumnFamilyStore.java:406 - Initializing system.available_ranges
INFO  [main] 2017-07-27 09:22:58,564 ColumnFamilyStore.java:406 - Initializing system.transferred_ranges
INFO  [main] 2017-07-27 09:22:58,573 ColumnFamilyStore.java:406 - Initializing system.views_builds_in_progress
INFO  [main] 2017-07-27 09:22:58,580 ColumnFamilyStore.java:406 - Initializing system.built_views
INFO  [main] 2017-07-27 09:22:58,585 ColumnFamilyStore.java:406 - Initializing system.hints
INFO  [main] 2017-07-27 09:22:58,592 ColumnFamilyStore.java:406 - Initializing system.batchlog
INFO  [main] 2017-07-27 09:22:58,599 ColumnFamilyStore.java:406 - Initializing system.prepared_statements
INFO  [main] 2017-07-27 09:22:58,604 ColumnFamilyStore.java:406 - Initializing system.schema_keyspaces
INFO  [main] 2017-07-27 09:22:58,609 ColumnFamilyStore.java:406 - Initializing system.schema_columnfamilies
INFO  [main] 2017-07-27 09:22:58,613 ColumnFamilyStore.java:406 - Initializing system.schema_columns
INFO  [main] 2017-07-27 09:22:58,618 ColumnFamilyStore.java:406 - Initializing system.schema_triggers
INFO  [main] 2017-07-27 09:22:58,624 ColumnFamilyStore.java:406 - Initializing system.schema_usertypes
INFO  [main] 2017-07-27 09:22:58,629 ColumnFamilyStore.java:406 - Initializing system.schema_functions
INFO  [main] 2017-07-27 09:22:58,636 ColumnFamilyStore.java:406 - Initializing system.schema_aggregates
INFO  [main] 2017-07-27 09:22:58,638 ViewManager.java:137 - Not submitting build tasks for views in keyspace system as storage service is not initialized
INFO  [main] 2017-07-27 09:22:58,761 ApproximateTime.java:44 - Scheduling approximate time-check task with a precision of 10 milliseconds
INFO  [main] 2017-07-27 09:23:00,429 StorageService.java:599 - Populating token metadata from system tables
INFO  [main] 2017-07-27 09:23:00,444 StorageService.java:606 - Token metadata: 
INFO  [main] 2017-07-27 09:23:00,451 ColumnFamilyStore.java:406 - Initializing system_schema.keyspaces
INFO  [main] 2017-07-27 09:23:00,474 ColumnFamilyStore.java:406 - Initializing system_schema.tables
INFO  [main] 2017-07-27 09:23:00,500 ColumnFamilyStore.java:406 - Initializing system_schema.columns
INFO  [main] 2017-07-27 09:23:00,511 ColumnFamilyStore.java:406 - Initializing system_schema.triggers
INFO  [main] 2017-07-27 09:23:00,527 ColumnFamilyStore.java:406 - Initializing system_schema.dropped_columns
INFO  [main] 2017-07-27 09:23:00,548 ColumnFamilyStore.java:406 - Initializing system_schema.views
INFO  [main] 2017-07-27 09:23:00,564 ColumnFamilyStore.java:406 - Initializing system_schema.types
INFO  [main] 2017-07-27 09:23:00,582 ColumnFamilyStore.java:406 - Initializing system_schema.functions
INFO  [main] 2017-07-27 09:23:00,598 ColumnFamilyStore.java:406 - Initializing system_schema.aggregates
INFO  [main] 2017-07-27 09:23:00,613 ColumnFamilyStore.java:406 - Initializing system_schema.indexes
INFO  [main] 2017-07-27 09:23:00,622 ViewManager.java:137 - Not submitting build tasks for views in keyspace system_schema as storage service is not initialized
INFO  [pool-3-thread-1] 2017-07-27 09:23:00,681 AutoSavingCache.java:173 - Completed loading (0 ms; 3 keys) KeyCache cache
INFO  [main] 2017-07-27 09:23:00,695 CommitLog.java:157 - Replaying /var/lib/cassandra/commitlog/CommitLog-6-1501147056964.log
INFO  [main] 2017-07-27 09:23:00,713 CommitLog.java:159 - Log replay complete, 0 replayed mutations
INFO  [main] 2017-07-27 09:23:00,713 StorageService.java:599 - Populating token metadata from system tables
INFO  [main] 2017-07-27 09:23:00,722 StorageService.java:606 - Token metadata: 
INFO  [main] 2017-07-27 09:23:00,793 QueryProcessor.java:162 - Preloaded 0 prepared statements
INFO  [main] 2017-07-27 09:23:00,795 StorageService.java:617 - Cassandra version: 3.11.0
INFO  [main] 2017-07-27 09:23:00,796 StorageService.java:618 - Thrift API version: 20.1.0
INFO  [main] 2017-07-27 09:23:00,797 StorageService.java:619 - CQL supported versions: 3.4.4 (default: 3.4.4)
INFO  [main] 2017-07-27 09:23:00,798 StorageService.java:621 - Native protocol supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4)
INFO  [main] 2017-07-27 09:23:00,857 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 25 MB and a resize interval of 60 minutes
INFO  [main] 2017-07-27 09:23:00,878 MessagingService.java:753 - Starting Messaging Service on /172.17.0.8:7000 (eth0)
INFO  [main] 2017-07-27 09:23:00,958 OutboundTcpConnection.java:108 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [HANDSHAKE-cassandra-0.cassandra/172.17.0.2] 2017-07-27 09:23:00,970 OutboundTcpConnection.java:560 - Handshaking version with cassandra-0.cassandra/172.17.0.2
INFO  [ScheduledTasks:1] 2017-07-27 09:23:01,503 TokenMetadata.java:498 - Updating topology for all endpoints that have changed
Exception (java.lang.RuntimeException) encountered during startup: A node with address /172.17.0.8 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
java.lang.RuntimeException: A node with address /172.17.0.8 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:557)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:801)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:666)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:393)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689)
ERROR [main] 2017-07-27 09:23:01,979 CassandraDaemon.java:706 - Exception encountered during startup
java.lang.RuntimeException: A node with address /172.17.0.8 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:557) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:801) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:666) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:393) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) [apache-cassandra-3.11.0.jar:3.11.0]
INFO  [StorageServiceShutdownHook] 2017-07-27 09:23:01,984 HintsService.java:220 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2017-07-27 09:23:01,985 Gossiper.java:1538 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
INFO  [StorageServiceShutdownHook] 2017-07-27 09:23:01,985 MessagingService.java:984 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/172.17.0.8] 2017-07-27 09:23:01,988 MessagingService.java:1338 - MessagingService has terminated the accept() thread
INFO  [StorageServiceShutdownHook] 2017-07-27 09:23:03,940 HintsService.java:220 - Paused hints dispatch
I have no name!@cassandra-0:/$ nodetool status                                                                                                                                               
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.17.0.2  118.5 KiB  256          65.9%             e1ed181c-ca69-47ff-8985-eb5fd37d67bb  rack1
DN  172.17.0.9  176.17 KiB  256          66.0%             3304c695-b0d4-4dd5-af79-99b97bea9942  rack1
DN  172.17.0.8  112.8 KiB  256          68.1%             de4c1b8d-178d-449e-8418-daef3c89ebdb  rack1
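
To pick the down nodes out of a listing like the one above, a small awk filter is enough. This is only a sketch: down_host_ids is a hypothetical helper name, and it assumes the column layout shown here, where the Host ID is the second-to-last field. The Host IDs it prints are the form that nodetool removenode takes, but whether removing the node is the right fix for this deployment is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper: read `nodetool status` output on stdin and print the
# Host ID of every node whose status/state column is DN (Down/Normal).
# Assumes the layout shown above: Host ID is the second-to-last field.
down_host_ids() {
  awk '$1 == "DN" { print $(NF - 1) }'
}

# Example (inside a Cassandra pod): nodetool status | down_host_ids
```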

related issues: https://github.com/kubernetes/kubernetes/issues/24030#issuecomment-210197450 https://github.com/kubernetes/kubernetes/issues/34978#issue-183517949

pavolloffay commented 7 years ago

I am trying the C* deployment from the k8s examples: https://github.com/kubernetes/examples/blob/master/cassandra/README.md

I could not use this C* with Jaeger because the image jaegertracing/jaeger-cassandra-schema uses a different version of CQL:

Connection error: ('Unable to connect to any servers', {'172.17.0.6': ProtocolError("cql_version '3.4.0' is not supported by remote (w/ native protocol). Supported versions: [u'3.4.2']",), '172.17.0.4': ProtocolError("cql_version '3.4.0' is not supported by remote (w/ native protocol). Supported versions: [u'3.4.2']",), '172.17.0.5': ProtocolError("cql_version '3.4.0' is not supported by remote (w/ native protocol). Supported versions: [u'3.4.2']",)})

Deleting any pod worked:

kubectl delete po/cassandra-0
kubectl exec -it cassandra-0 -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.17.0.5  83.18 KiB  32           76.9%             de54c5c1-17db-4a84-bdf4-afb459c83576  Rack1-K8Demo
UN  172.17.0.4  109.09 KiB  32           65.2%             79f692db-5b4b-4111-89fc-3df7e9408ce3  Rack1-K8Demo
UN  172.17.0.6  102.25 KiB  32           58.0%             e30000e3-4332-434c-81c6-408a9d0671a4  Rack1-K8Demo

Scaling UP worked

kubectl scale sts cassandra --replicas=4
kubectl exec -it cassandra-0 -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.17.0.5  83.18 KiB  32           53.9%             de54c5c1-17db-4a84-bdf4-afb459c83576  Rack1-K8Demo
UN  172.17.0.4  166.83 KiB  32           43.4%             79f692db-5b4b-4111-89fc-3df7e9408ce3  Rack1-K8Demo
UN  172.17.0.7  65.66 KiB  32           50.3%             67a30197-ed1a-4496-9884-c9ec0bb5347a  Rack1-K8Demo
UN  172.17.0.6  65.65 KiB  32           52.4%             e30000e3-4332-434c-81c6-408a9d0671a4  Rack1-K8Demo

Scaling DOWN did not work

kubectl patch sts cassandra -p '{"spec":{"replicas":3}}'
kubectl exec -it cassandra-0 -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.17.0.5  101.09 KiB  32           53.9%             de54c5c1-17db-4a84-bdf4-afb459c83576  Rack1-K8Demo
UN  172.17.0.4  166.83 KiB  32           43.4%             79f692db-5b4b-4111-89fc-3df7e9408ce3  Rack1-K8Demo
DN  172.17.0.7  65.66 KiB  32           50.3%             67a30197-ed1a-4496-9884-c9ec0bb5347a  Rack1-K8Demo
UN  172.17.0.6  65.65 KiB  32           52.4%             e30000e3-4332-434c-81c6-408a9d0671a4  Rack1-K8Demo
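
One possible way to avoid the leftover DN entry, sketched here under assumptions rather than verified against this template: have the departing node leave the ring with nodetool decommission before the StatefulSet is shrunk. A StatefulSet removes the pod with the highest ordinal first, so that is the pod to decommission. scale_down_one and the KUBECTL override are hypothetical names introduced for this sketch.

```shell
# Sketch: shrink a Cassandra StatefulSet by one replica, decommissioning the
# departing node first so it leaves the ring instead of lingering as DN.
# KUBECTL is a hypothetical override (e.g. KUBECTL=echo) for a dry run.
scale_down_one() {
  sts="$1"        # StatefulSet name, e.g. cassandra
  replicas="$2"   # current replica count, e.g. 4
  kubectl_cmd="${KUBECTL:-kubectl}"
  # StatefulSets remove the pod with the highest ordinal first.
  last_pod="${sts}-$((replicas - 1))"

  # Ask the node to stream its data to the others and leave the ring.
  $kubectl_cmd exec "$last_pod" -- nodetool decommission || return 1
  # Only then reduce the replica count so the pod is removed.
  $kubectl_cmd scale sts "$sts" --replicas="$((replicas - 1))"
}
```

With KUBECTL=echo, scale_down_one cassandra 4 only prints the two commands it would run, which makes it easy to check the pod name before touching the cluster.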

jpkrohling commented 7 years ago

Is this outcome different than what we have, or is it just a confirmation that we are doing the same as the reference is doing?

pavolloffay commented 7 years ago

By reference, do you mean the Cassandra deployment from the k8s examples? The goal was to get more familiar with C* deployment on K8s and to explore what works and what does not work in our deployment (maybe for future improvements).

I strongly agree with you that, for now, we should just say that our deployment has limited functionality.

jpkrohling commented 7 years ago

By reference do you mean cassandra deployment from k8s examples?

Yes :) I'm just not sure what the relevant part of that comment is, as I wouldn't know how to compare it with the "expected" output, or with the output from our template.

pavolloffay commented 7 years ago

UN  172.17.0.5  83.18 KiB  32           53.9%             de54c5c1-17db-4a84-bdf4-afb459c83576  Rack1-K8Demo

If you delete a pod and it recovers as expected, it should be shown as UN when running nodetool status on every C* node. After scaling down, you can see that one node was not removed properly (it should not be listed at all, but it is shown as DN).
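
That check can be scripted. The following is a minimal sketch, assuming node rows start with a two-letter status/state code as in the listings in this thread; all_nodes_un is a hypothetical helper name.

```shell
# Hypothetical check: read `nodetool status` output on stdin and succeed only
# if at least one node row is present and every node row is UN (Up/Normal).
all_nodes_un() {
  awk '
    /^[UD][NLJM] / { rows++; if ($1 != "UN") bad++ }
    END { if (rows > 0 && bad == 0) exit 0; else exit 1 }
  '
}

# Example (pod names are assumptions): ask every pod for its view of the ring.
# for i in 0 1 2; do
#   kubectl exec "cassandra-$i" -- nodetool status | all_nodes_un \
#     || echo "cassandra-$i sees a node that is not UN"
# done
```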

hobbs commented 6 years ago

curious if you ever figured this out @pavolloffay, I'm experiencing the same and wondering how to scale down.

pavolloffay commented 6 years ago

@hobbs The C* template provided in this repo is not production ready. Use other templates or Helm charts to create a scalable C* deployment.