If the error is fatal, then this is a problem!
If the error is non-fatal (i.e. just cluttering the logs), it is less of a problem. I think race conditions will always exist, and we should handle them gracefully rather than trying to solve them with orchestration. That said, a loud stack trace in the logs probably isn't graceful :)
The error is fatal. Once in this condition, if I go to (docker ip):8080 and hit find traces, nothing is returned, or I'll occasionally get an error "Endpoint zipkin-query is marked down". To get zipkin working again, I have to run "docker-compose run query" in a separate process, then zipkin-web will return traces. This situation continues through multiple restarts of docker-zipkin, unless I remove all related docker images from my system and run docker-compose up again, starting with freshly downloaded images. At that point, the query component will start and I can find new traces that get created after that, but for all subsequent runs of docker-compose after the initial, zipkin-query will fail to start, preventing zipkin-web from finding traces.
I encountered the same error today; does anyone have a solution?
This is related to how Finagle deals with repeated errors. It might be worth opening an issue on OpenZipkin/zipkin regarding the "zipkin-query" error.
Oh that's great.
E.g. there's some advice here which may need to be applied to that process.
Finagle isn't used in the zipkin-java project, so you wouldn't see that error there, although zipkin-java only supports MySQL at the moment.
Thanks adriancole.
I'm also getting this error when restarting. I've read through the comments here, but I'm still not sure I understand the root cause or the solution. Is there some way this can be handled in Docker Compose?
I'm starring this to follow through. Please nag me on gitter OpenZipkin/zipkin if there's no remedy by Monday
I am also experiencing this exact issue. Running "docker-compose run query" (as @gadams00 did) did not fix the issue for me. Doing a "docker restart {containerid}" starts the query server, but I see the following logs from the query server:
19:29:49.269 [cluster1-reconnection-0] ERROR c.d.driver.core.ControlConnection - [Control connection] Cannot connect to any host, scheduling retry in 1000 milliseconds
This looks like an error connecting to Cassandra, vs the other one, which came from Finagle. Same class of errors, I admit. I can't look at this until Monday, but maybe someone else can. Try pinging Gitter OpenZipkin/zipkin? cc @kristofa in case you know any quick fixes offhand.
PS: these destinations are passed by IP address and port. If restarting changes either, they wouldn't connect for that reason.
Here is the top of the stack trace that I see before that error; it does seem to be trying to connect via IP and port:
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /172.17.0.51:9042 (com.datastax.driver.core.TransportException: [/172.17.0.51:9042] Cannot connect))
query_1 | at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:240)
query_1 | at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:86)
query_1 | at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1429)
query_1 | at com.datastax.driver.core.Cluster.init(Cluster.java:162)
query_1 | at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:341)
query_1 | at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:314)
query_1 | at com.datastax.driver.core.Cluster.connect(Cluster.java:252)
query_1 | at org.twitter.zipkin.storage.cassandra.Repository$Schema.ensureExists(Repository.java:919)
query_1 | at org.twitter.zipkin.storage.cassandra.Repository.<init>(Repository.java:111)
So it would be interesting to see if you are able to connect to that port when this occurs, or if Cassandra is listening elsewhere. If the latter, then it is a more general concern, which is to not use a statically configured IP:port.
E.g. the Docker setup in the base image includes dot files for assigning the destinations for services. These could use host resolution or something less brittle instead.
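To illustrate the "less brittle" idea, a startup script could resolve the storage host by name instead of relying on a baked-in IP address. This is just a sketch: the service name "cassandra", the CASSANDRA_HOST variable, and the availability of getent in the image are all assumptions for illustration.
# sketch: resolve the storage host by name at container start instead of a fixed IP
CASSANDRA_HOST="${CASSANDRA_HOST:-cassandra}"   # e.g. the compose service/link name
CASSANDRA_ADDR="$(getent hosts "$CASSANDRA_HOST" | awk '{print $1}')"
echo "Cassandra resolved to ${CASSANDRA_ADDR:-<unresolved>} via name $CASSANDRA_HOST"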
One last thing that I hope can get us to the bottom of this. Can you verify whether, when you get an error like this, you can or cannot hit the socket directly? E.g. telnet 172.17.0.51 9042
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /172.17.0.51:9042 (com.datastax.driver.core.TransportException: [/172.17.0.51:9042] Cannot connect))
I'm going to spend some time on this now, at least to see if I can reproduce it.
OK, playing around: when zipkin-query starts before Cassandra is listening, the process dies.
OK, there are possibly two issues here: one between zipkin-web -> zipkin-query, and another between zipkin-query -> cassandra.
I can't reproduce the zipkin-web -> zipkin-query one ("Endpoint zipkin-query is marked down"). I've tried various failures, and I think it is probably best not to sink too much time into this, since zipkin-web is moving off Finagle in favor of pure JavaScript.
I can reproduce zipkin-query -> cassandra, and opened up an issue here: https://github.com/openzipkin/zipkin/issues/1007
@adriancole I am able to telnet to the Cassandra instance after the zipkin-query container dies. I can get the query instance back up by running docker-compose run query; however, the UI says it cannot connect to zipkin-query, and I am not able to query data.
To update on this: I was able to get everything working again by running the following commands after zipkin starts up and the query container dies.
docker-compose restart query
docker-compose restart web
Restarting query allows it to talk to Cassandra, and restarting web fixes the issue where web cannot talk to the query container.
It seems like this is just a startup order problem.
What if we added a runit config (like we do with Kafka)? That would restart the process if it died for any reason, including the race condition on Cassandra.
https://github.com/openzipkin/docker-zipkin/blob/master/kafka/install.sh#L19
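For anyone unfamiliar with runit: a run script is just an exec'ing shell script that runit re-runs whenever the supervised process exits, so a crash on the Cassandra race would simply trigger another start attempt. A minimal sketch for the query container might look like the following (the service path and launch command are placeholders, not necessarily what the Kafka image does):
#!/bin/sh
# hypothetical /etc/service/query/run -- runit restarts this whenever the process exits
exec 2>&1
cd /zipkin           # assumed install directory
exec /zipkin/run.sh  # placeholder for the existing query launch command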
Another solution I found was to rebuild the query and web images with a 10 second delay before starting the java process (in run.sh). Ugly, but seems to work.
One thing we chatted about on gitter was using runit to watch the process (like we do in the Kafka image)
This seems very repeatable with recent versions of Docker.
As I see it, the problem is that the cassandra container is seen as "up" once this happens:
cassandra | INFO 11:17:56 Node /172.18.0.2 state jump to normal
cassandra | INFO 11:17:56 Waiting for gossip to settle before accepting client requests...
but, cassandra is not ready for connections until this is logged:
cassandra | INFO 11:18:04 No gossip backlog; proceeding
cassandra | INFO 11:18:04 Netty using native Epoll event loop
cassandra | INFO 11:18:04 Using Netty Version: [netty-buffer=netty-buffer-4.0.23.Final.208198c, netty-codec=netty-codec-4.0.23.Final.208198c, netty-codec-http=netty-codec-http-4.0.23.Final.208198c, netty-codec-socks=netty-codec-socks-4.0.23.Final.208198c, netty-common=netty-common-4.0.23.Final.208198c, netty-handler=netty-handler-4.0.23.Final.208198c, netty-transport=netty-transport-4.0.23.Final.208198c, netty-transport-rxtx=netty-transport-rxtx-4.0.23.Final.208198c, netty-transport-sctp=netty-transport-sctp-4.0.23.Final.208198c, netty-transport-udt=netty-transport-udt-4.0.23.Final.208198c]
cassandra | INFO 11:18:04 Starting listening for CQL clients on /0.0.0.0:9042...
cassandra | INFO 11:18:04 Binding thrift service to /0.0.0.0:9160
cassandra | INFO 11:18:04 Listening for thrift clients...
Which happens after the query container fails to connect (obviously)
This can be handled from the Docker side of things. See https://github.com/docker/compose/issues/374 for an infinitely long discussion, started years ago and still going, on how. One way could be adding a wait_for_storage bash function to each storage profile and calling it from query/run.sh. For Cassandra, it would look something like this:
tries=60
host=???   # the Cassandra host the query container already connects to
port=???   # usually 9042 for CQL
echo "Waiting at most $tries seconds for Cassandra to start accepting CQL clients on port $port..."
while [ $tries -ge 0 ] && ! nc -z "$host" "$port"; do
  echo "Waiting $tries more seconds for port $host:$port to open up..."
  tries=$(expr $tries - 1)
  sleep 1
done
if [ $tries -eq -1 ]; then
  echo "$host:$port is still not open, bailing out"
  exit 1
fi
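For completeness, here is roughly how that loop could be wrapped into the wait_for_storage function mentioned above and called from query/run.sh; the CASSANDRA_HOST variable is just a stand-in for wherever the query container already gets its Cassandra address:
# hypothetical wait_for_storage for the cassandra storage profile
wait_for_storage() {
  host="$1"; port="$2"; tries=60
  while [ $tries -ge 0 ] && ! nc -z "$host" "$port"; do
    echo "Waiting $tries more seconds for $host:$port to open up..."
    tries=$(expr $tries - 1)
    sleep 1
  done
  [ $tries -ge 0 ]   # fails if we ran out of retries
}
# in query/run.sh, before starting the java process:
wait_for_storage "$CASSANDRA_HOST" 9042 || { echo "Cassandra never came up, bailing out"; exit 1; }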
True. Sorry I have been criminally negligent on this one. I will fix the root issue now.
The root problem was really silly: this crash was only present when tracing bootstrap! The change below makes self-tracing bootstrap failures log instead of crash.
Fixed in 1.39.2.
Heh. Thanks for the fix. (I wouldn't say you were criminally negligent though)
Running docker-zipkin the first time works for me, but if I stop via CTRL-C and then run docker-compose up again, there seems to be a race condition between the zipkin-query and cassandra containers. I'm running:
greg@greg-elitebook ~/git/docker-zipkin $ docker-compose --version
docker-compose version: 1.5.1
greg@greg-elitebook ~/git/docker-zipkin $ docker --version
Docker version 1.9.1, build a34a1d5