openzipkin / zipkin-support

repository for support questions raised as issues
4 stars 2 forks source link

Azure Kubernetes Service + MySQL 8 Exception #41

Closed welsh closed 3 years ago

welsh commented 3 years ago

Describe the Bug

When starting the application, the following error is presented:

MySQL host: redacted.mysql.database.azure.com

                  oo
                 oooo
                oooooo
               oooooooo
              oooooooooo
             oooooooooooo
           ooooooo  ooooooo
          oooooo     ooooooo
         oooooo       ooooooo
        oooooo   o  o   oooooo
       oooooo   oo  oo   oooooo
     ooooooo  oooo  oooo  ooooooo
    oooooo   ooooo  ooooo  ooooooo
   oooooo   oooooo  oooooo  ooooooo
  oooooooo      oo  oo      oooooooo
  ooooooooooooo oo  oo ooooooooooooo
      oooooooooooo  oooooooooooo
          oooooooo  oooooooo
              oooo  oooo

     ________ ____  _  _____ _   _
    |__  /_ _|  _ \| |/ /_ _| \ | |
      / / | || |_) | ' / | ||  \| |
     / /_ | ||  __/| . \ | || |\  |
    |____|___|_|   |_|\_\___|_| \_|

:: version 2.21.7 :: commit 7c8e569 ::

2020-10-21 21:51:12.347  INFO 1 --- [           main] o.s.s.c.ThreadPoolTaskExecutor           : Initializing ExecutorService
2020-10-21 21:51:12.350  INFO 1 --- [           main] o.s.s.c.ThreadPoolTaskExecutor           : Initializing ExecutorService 'mysqlExecutor'
2020-10-21 21:51:13.188  INFO 1 --- [oss-http-*:9411] c.l.a.s.Server                           : Serving HTTP at /0.0.0.0:9411 - http://127.0.0.1:9411/
2020-10-21 21:52:23.579  INFO 1 --- [orker-epoll-2-1] c.l.a.i.c.JavaVersionSpecific            : Using the APIs optimized for: Java 9+
2020-10-21 21:52:27.980  INFO 1 --- [cking-tasks-1-3] c.z.h.HikariDataSource                   : HikariPool-1 - Starting...
2020-10-21 21:52:28.458  WARN 1 --- [cking-tasks-1-3] z.s.i.BodyIsExceptionMessage             : Unexpected error handling request.

java.util.ServiceConfigurationError: org.mariadb.jdbc.authentication.AuthenticationPlugin: Provider org.mariadb.jdbc.internal.com.send.authentication.SendGssApiAuthPacket could not be instantiated
    at java.util.ServiceLoader.fail(Unknown Source) ~[?:?]
    at java.util.ServiceLoader$ProviderImpl.newInstance(Unknown Source) ~[?:?]
    at java.util.ServiceLoader$ProviderImpl.get(Unknown Source) ~[?:?]
    at java.util.ServiceLoader$3.next(Unknown Source) ~[?:?]
    at org.mariadb.jdbc.authentication.AuthenticationPluginLoader.get(AuthenticationPluginLoader.java:47) ~[mariadb-java-client-2.6.2.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.authenticationHandler(AbstractConnectProtocol.java:727) ~[mariadb-java-client-2.6.2.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.createConnection(AbstractConnectProtocol.java:546) ~[mariadb-java-client-2.6.2.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1367) ~[mariadb-java-client-2.6.2.jar:?]
    at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:635) ~[mariadb-java-client-2.6.2.jar:?]
    at org.mariadb.jdbc.MariaDbConnection.newConnection(MariaDbConnection.java:150) ~[mariadb-java-client-2.6.2.jar:?]
    at org.mariadb.jdbc.Driver.connect(Driver.java:89) ~[mariadb-java-client-2.6.2.jar:?]
    at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:138) ~[HikariCP-3.4.5.jar:?]
    at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:358) ~[HikariCP-3.4.5.jar:?]
    at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206) ~[HikariCP-3.4.5.jar:?]
    at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:477) ~[HikariCP-3.4.5.jar:?]
    at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:560) ~[HikariCP-3.4.5.jar:?]
    at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:115) ~[HikariCP-3.4.5.jar:?]
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:112) ~[HikariCP-3.4.5.jar:?]
    at zipkin2.storage.mysql.v1.HasTraceIdHigh.test(HasTraceIdHigh.java:40) ~[zipkin-storage-mysql-v1-2.21.7.jar:?]
    at zipkin2.storage.mysql.v1.Schema.<init>(Schema.java:50) ~[zipkin-storage-mysql-v1-2.21.7.jar:?]
    at zipkin2.storage.mysql.v1.MySQLStorage.schema(MySQLStorage.java:131) ~[zipkin-storage-mysql-v1-2.21.7.jar:?]
    at zipkin2.storage.mysql.v1.MySQLStorage.spanStore(MySQLStorage.java:139) ~[zipkin-storage-mysql-v1-2.21.7.jar:?]
    at zipkin2.storage.mysql.v1.MySQLStorage.serviceAndSpanNames(MySQLStorage.java:147) ~[zipkin-storage-mysql-v1-2.21.7.jar:?]
    at zipkin2.server.internal.ZipkinQueryApiV2.getServiceNames(ZipkinQueryApiV2.java:97) ~[classes/:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
    at com.linecorp.armeria.internal.server.annotation.AnnotatedService.invoke(AnnotatedService.java:303) ~[armeria-0.99.9.jar:?]
    at com.linecorp.armeria.internal.server.annotation.AnnotatedService.lambda$serve0$6(AnnotatedService.java:286) ~[armeria-0.99.9.jar:?]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) ~[?:?]
    at com.linecorp.armeria.common.RequestContext.lambda$makeContextAware$3(RequestContext.java:521) ~[armeria-0.99.9.jar:?]
    at io.micrometer.core.instrument.internal.TimedRunnable.run(TimedRunnable.java:44) [micrometer-core-1.5.4.jar:1.5.4]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.51.Final.jar:4.1.51.Final]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.lang.NoClassDefFoundError: org/ietf/jgss/GSSException
    at org.mariadb.jdbc.internal.com.send.authentication.SendGssApiAuthPacket.<clinit>(SendGssApiAuthPacket.java:77) ~[mariadb-java-client-2.6.2.jar:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
    at java.lang.reflect.Constructor.newInstanceWithCaller(Unknown Source) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Unknown Source) ~[?:?]
    ... 40 more
Caused by: java.lang.ClassNotFoundException: org.ietf.jgss.GSSException
    at jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source) ~[?:?]
    at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source) ~[?:?]
    at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
    at org.mariadb.jdbc.internal.com.send.authentication.SendGssApiAuthPacket.<clinit>(SendGssApiAuthPacket.java:77) ~[mariadb-java-client-2.6.2.jar:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
    at java.lang.reflect.Constructor.newInstanceWithCaller(Unknown Source) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Unknown Source) ~[?:?]
    ... 40 more

2020-10-21 21:52:28.583  INFO 1 --- [cking-tasks-1-9] c.z.h.HikariDataSource                   : HikariPool-1 - Starting...
2020-10-21 21:52:28.765  INFO 1 --- [cking-tasks-1-9] c.z.h.HikariDataSource                   : HikariPool-1 - Start completed.

However the application still appears to function normally and is able to receive / display traces, when connecting using a docker-compose to the same database on Mac 10.15.7 this error doesn't appear.

Steps to Reproduce

Deploy Zipkin onto Azure Kubernetes Service (1.18.8) with Azure Database for MySQL Server (Version: 8.0)

Expected Behaviour

Application should not error upon startup.

codefromthecrypt commented 3 years ago

Our mysql image is not intended for production use, so adding more modules to our image only to support use like this doesn't mix. It adds image size for non-typical and non-production use cases.

I would recommend using a different storage option (such as elasticsearch or cassandra), or make your own image of zipkin based on ours, but includes a full JRE or one jlinked with:

# GSS API auth is used by Azure Database for MySQL Server
java.security.jgss,\

Here's a dockerfile you can use to make your own image if you really need to setup a test mysql in azure.

ARG zipkin_version=master

# import Zipkin's official layer
FROM openzipkin/zipkin:$zipkin_version as officialZipkin

# use a larger JRE
FROM azul/zulu-openjdk-alpine:15-jre as zipkin

# Add HEALTHCHECK and ENTRYPOINT scripts into the default search path
COPY --from=officialZipkin /usr/local/bin/docker-healthcheck /usr/local/bin/
HEALTHCHECK --interval=5s --start-period=30s --timeout=5s CMD ["docker-healthcheck"]

COPY --from=officialZipkin /usr/local/bin/start-zipkin /usr/local/bin/
ENTRYPOINT ["start-zipkin"]

# All content including binaries and logs write under WORKDIR
ARG USER=zipkin
WORKDIR /${USER}

# Ensure the process doesn't run as root
RUN adduser -g '' -h ${PWD} -D ${USER}

# Switch to the runtime user
USER ${USER}
# Copy official binaries onto this layer
COPY --from=officialZipkin --chown=${USER} /zipkin/ .

# 3rd party modules like zipkin-aws will apply profile settings with this
ENV MODULE_OPTS=

# Zipkin's full distribution includes Scribe support (albeit disabled)
EXPOSE 9410 9411
codefromthecrypt commented 3 years ago

I added a working dockerfile, but this option is unsupported. If we end up with prod configurations or more need for GSS, we can reconsider.

welsh commented 3 years ago

@adriancole thank you for a fast reply, going with Cassandra was my first choice but I'm trying to deploy using Azure managed services.

Unfortunately when deploying Azure Cosmos DB with the Cassandra API selected it deploys version 3.11.0 and Zipkin upon connecting to it indicates version 3.11.3+ is required.

This made me default back to MySQL since there isn't a Azure managed Elasticsearch (by Microsoft) offering at this time so I will give building my own image tomorrow with the dockerfile provided.

Though depending on performance I may look to just deploy my own Elastic into the AKS instance to run my own mini-cluster.

codefromthecrypt commented 3 years ago

@welsh this is a great reply, full of rich information. I'm in debt!

you might look at this as it is also about UDT support. It is possible we can relax the version constraint when search is disabled.. https://github.com/openzipkin/zipkin-aws/issues/180 However, search disabled isn't quite grand either.

Do you know anyone at Cosmos? Maybe someone can update that version?

codefromthecrypt commented 3 years ago

one way to tell if cosmos is really compatible is to connect manually and try to install the schemas. if they work, perhaps we can make a special case. There was an indexing function added in 3.11.3 which is needed for efficient query.

https://github.com/openzipkin/zipkin/blob/master/zipkin-storage/cassandra/src/main/resources/zipkin2-schema.cql https://github.com/openzipkin/zipkin/blob/master/zipkin-storage/cassandra/src/main/resources/zipkin2-schema-indexes.cql

welsh commented 3 years ago

@adriancole I attempted to run those manually against Cosmos DB however ran into two issues.

First any CREATE TABLE that has AND gc_grace_seconds = 3600 had to be set to 0 (https://docs.microsoft.com/en-us/azure/cosmos-db/cassandra-support#keyspace-and-table-options) which I'm not sure on the impact being unfamiliar with Cosmos & Cassandra.

Second the following indexes could not be created:

sandbox-cosmos-zipkin@cqlsh> CREATE CUSTOM INDEX IF NOT EXISTS ON zipkin2.span (l_service) USING 'org.apache.cassandra.index.sasi.SASIIndex'
   ...    WITH OPTIONS = {'mode': 'PREFIX'};
ConfigurationException: When using custom indexes, must have a class name and set to a supported class. Supported class names are ['CosmosDefaultIndex','AzureSearchIndex','CosmosClusteringIndex','CompositeClusterIndexName'] Got 'org.apache.cassandra.index.sasi.SASIIndex'

sandbox-cosmos-zipkin@cqlsh> CREATE CUSTOM INDEX IF NOT EXISTS ON zipkin2.span (annotation_query) USING 'org.apache.cassandra.index.sasi.SASIIndex'
   ...    WITH OPTIONS = {
   ...     'mode': 'PREFIX',
   ...     'analyzed': 'true',
   ...     'analyzer_class':'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer',
   ...     'delimiter': '░'};
ConfigurationException: When using custom indexes, must have a class name and set to a supported class. Supported class names are ['CosmosDefaultIndex','AzureSearchIndex','CosmosClusteringIndex','CompositeClusterIndexName'] Got 'org.apache.cassandra.index.sasi.SASIIndex'

sandbox-cosmos-zipkin@cqlsh> CREATE CUSTOM INDEX IF NOT EXISTS ON zipkin2.trace_by_service_span (duration) USING 'org.apache.cassandra.index.sasi.SASIIndex'
   ...    WITH OPTIONS = {'mode': 'PREFIX'};
ConfigurationException: When using custom indexes, must have a class name and set to a supported class. Supported class names are ['CosmosDefaultIndex','AzureSearchIndex','CosmosClusteringIndex','CompositeClusterIndexName'] Got 'org.apache.cassandra.index.sasi.SASIIndex'

It appears that the index type org.apache.cassandra.index.sasi.SASIIndex is not supported, however I did try all the other Index Types (CosmosDefaultIndex, AzureSearchIndex, CosmosClusteringIndex & CompositeClusterIndexName) and received:

InvalidRequest: Error from server: code=2200 [Invalid query] message="The CUSTOM index CosmosDefaultIndex does not support specifying a target column"

For all of them, that said from reading Indexing in Azure Cosmos DB - Overview it states:

By default, Azure Cosmos DB automatically indexes every property for all items in your container without having to define any schema or configure secondary indexes.

So not being sure if those Indexes are required I just carried on and checked the source code out and modified zipkin2.storage.cassandra.Schema to let it start up using 3.11.0.

After a bit of experimenting I got it running and connecting properly with these environment variables:

STORAGE_TYPE=cassandra3
CASSANDRA_ENSURE_SCHEMA=false
CASSANDRA_USERNAME=sandbox-cosmos-zipkin
CASSANDRA_PASSWORD=REDACTED
CASSANDRA_CONTACT_POINTS=sandbox-cosmos-zipkin.cassandra.cosmos.azure.com:10350
CASSANDRA_USE_SSL=true

And this JVM Parameter:

-Dzipkin.storage.cassandra3.localDc="Canada Central"

But it seems that Cosmos is not supportable in its current state as a search immediately resulted in:

com.datastax.oss.driver.api.core.AllNodesFailedException: All 1 node(s) tried for the query failed (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=sandbox-cosmos-zipkin.cassandra.cosmos.azure.com/52.237.20.252:10350, hostId=ca1dfc69-f6d7-4064-a23b-08548f19c884, hashCode=4e4a52c7): [com.datastax.oss.driver.api.core.servererrors.ServerError: Distinct is not implemented yet]
    at com.datastax.oss.driver.api.core.AllNodesFailedException.copy(AllNodesFailedException.java:141) ~[java-driver-core-4.9.0.jar:?]
    at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149) ~[java-driver-core-4.9.0.jar:?]
    at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:59) ~[java-driver-core-4.9.0.jar:?]
    at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:31) ~[java-driver-core-4.9.0.jar:?]
    at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230) ~[java-driver-core-4.9.0.jar:?]
    at com.datastax.oss.driver.api.core.cql.SyncCqlSession.prepare(SyncCqlSession.java:224) ~[java-driver-core-4.9.0.jar:?]
    at zipkin2.storage.cassandra.SelectServiceNames$Factory.<init>(SelectServiceNames.java:34) ~[classes/:?]
    at zipkin2.storage.cassandra.CassandraSpanStore.<init>(CassandraSpanStore.java:86) ~[classes/:?]
    at zipkin2.storage.cassandra.CassandraStorage.spanStore(CassandraStorage.java:164) ~[classes/:?]
    at zipkin2.server.internal.ZipkinQueryApiV2.getTraces(ZipkinQueryApiV2.java:147) ~[classes/:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:564) ~[?:?]
    at com.linecorp.armeria.internal.server.annotation.AnnotatedService.invoke(AnnotatedService.java:323) ~[armeria-1.2.0.jar:?]
    at com.linecorp.armeria.internal.server.annotation.AnnotatedService.lambda$serve0$6(AnnotatedService.java:299) ~[armeria-1.2.0.jar:?]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) ~[?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) ~[?:?]
    at com.linecorp.armeria.common.RequestContext.lambda$makeContextAware$3(RequestContext.java:502) ~[armeria-1.2.0.jar:?]
    at io.micrometer.core.instrument.internal.TimedRunnable.run(TimedRunnable.java:44) [micrometer-core-1.5.5.jar:1.5.5]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.53.Final.jar:4.1.53.Final]
    at java.lang.Thread.run(Thread.java:832) [?:?]
    Suppressed: com.datastax.oss.driver.api.core.servererrors.ServerError: Distinct is not implemented yet

Which upon checking matches with the CQL commands listing that DISTINCT is not supported so that rules out Cosmos DB for now 😞

Switching back to the dockerfile you provided, I can confirm that it no longer generates a java.lang.ClassNotFoundException for org.ietf.jgss.GSSException so thank you!

I'll most likely use MySQL in the short term and longer-term look at deploying a simple elastic cluster within the AKS as we scale up the number of services reporting traces.

codefromthecrypt commented 3 years ago

Great info. thanks again and good luck!