Open alseddnm opened 6 years ago
How many service names do you have? How many spans do you have per day?
Knowing this might help someone answer.
@adriancole Just saw your message. We are not even able to access our Cassandra nodes this morning. I see a bunch of errors in the Cassandra logs:
Can't open index file at /cassandra/data/zipkin2/span-15bb5b006e7111e8a8d2af46ca93ec1b/mc-2673-big-SI_span_l_service_idx.db, skipping. org.apache.cassandra.io.FSReadError: java.io.EOFException at org.apache.cassandra.index.sasi.disk.OnDiskIndex.<init>(OnDiskIndex.java:164) ~[apache-cassandra-3.9.0.jar:3.9.0] at org.apache.cassandra.index.sasi.SSTableIndex.<init>(SSTableIndex.java:68) ~[apache-cassandra-
ERROR [Reference-Reaper:1] 2018-06-14 14:46:11,044 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@74bce35c) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2054886331:/cassandra/data/zipkin2/span-15bb5b006e7111e8a8d2af46ca93ec1b/mc-2673-big was not released before the reference was garbage collected
As of now we don't have many service names; I see only 292 distinct services in the table. The total record count is 404118: select count(*) from span_by_service;
404118
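For anyone triaging similar logs, here is a minimal sketch that collects the SASI index files named in "Can't open index file" errors like the one quoted above. The function name `corrupted_index_files` and the regular expression are illustrative assumptions based only on the message format shown in this thread; other Cassandra versions may word the message differently.

```python
import re

# Assumed message shape, taken from the log line quoted in this thread:
#   "Can't open index file at <path>, skipping."
INDEX_ERROR = re.compile(r"Can't open index file at (?P<path>\S+?), skipping")


def corrupted_index_files(log_lines):
    """Return the unique index file paths named in matching errors, in first-seen order."""
    seen = []
    for line in log_lines:
        match = INDEX_ERROR.search(line)
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen
```

Feeding it the lines of a Cassandra system log (for example via `open(path)`) gives a short list of index files to inspect, which is easier to share in an issue than the full stack traces.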
I'm no expert, but it looks like you are using Cassandra 3.9, which is likely to not work well. We use 3.11, and in fact 3.11.3 will give the best results when released (which should be shortly).
@alseddnm are you able to try with more recent versions to see if things work better?
Same issue even with the latest official Cassandra Docker image.
Protip: adding comments to old issues about a troubleshooting scenario isn't usually something that results in an outcome. Try poking on https://gitter.im/openzipkin/zipkin or including the actual error message, and especially what "does" work, for example whether the /health endpoint works (if it does not, that is a more fundamental problem).
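A minimal sketch of the /health check suggested above, using only the Python standard library. The helper names `interpret_health` and `zipkin_health` are hypothetical, and the sketch assumes the endpoint returns a JSON body; if it does not, the raw text is returned instead.

```python
import json
import urllib.error
import urllib.request


def interpret_health(status, raw):
    """Return (http_status, body), where body is parsed JSON if possible, else raw text."""
    text = raw.decode("utf-8", errors="replace")
    try:
        return status, json.loads(text)
    except ValueError:
        return status, text


def zipkin_health(base_url, timeout=5.0):
    """GET <base_url>/health and return (http_status, body).

    Connection failures (urllib.error.URLError) are left to propagate,
    since "cannot connect at all" is itself useful to report.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_health(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a body that may describe the failure.
        return interpret_health(err.code, err.read())
```

For example, `zipkin_health("http://localhost:9411")` queries a server on Zipkin's default port; the host here is an assumption. Pasting the returned status and body into an issue tells readers whether the server is reachable before they look at storage errors.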
We are using Mesos/Marathon to manage our Docker containers. Zipkin ran fine for 15 minutes or less, then the health check started failing. We found a bunch of errors in our service log: cannot load service names: Request processing failed; nested exception is com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /xxxxx:9042 (com.datastax.driver.core.exceptions.TransportException: [/1xxxx:9042] Connection has been closed),/(com.datastax.driver.core.exceptions.TransportException: [xyz/10.124.8.97:9042] Connection has been closed))
I thought it better to open an issue; we are investigating on our side as well.
I also noticed that Zipkin's Cassandra storage uses SASI indexes, and per the DataStax docs:
SASI indexes in DSE are experimental. DataStax does not support SASI indexes for production https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useSASIIndex.html.