jaegertracing / helm-charts

Helm Charts for Jaeger backend
Apache License 2.0

[jaeger] Not able to connect cassandra db hosted on Azure #137

Open paulpuvi06 opened 4 years ago

paulpuvi06 commented 4 years ago

Hello Team,

I used the 'jaegertracing/jaeger' chart to install Jaeger and am trying to use Azure Cosmos DB for storage, but it fails to connect: it defaults to port 9042 instead of the given CASSANDRA_PORT 10350.

Schema job log:

```
Connection error: ('Unable to connect to any servers', {'xxx.xx.xx.xx': error(None, "Tried connecting to [('xxx.xx.xx.xxx', 9042)]. Last error: timed out")}) Cassandra is still not up at testxx.cassandra.cosmos.azure.com. Waiting 1 second.
```

jaeger-cassandra-schema-job env spec:

```yaml
containers:
  - env:
    - name: CASSANDRA_SERVERS
      value: testxxx.cassandra.cosmos.azure.com
    - name: CASSANDRA_PORT
      value: "10350"
    - name: CASSANDRA_KEYSPACE
      value: jaeger_v1_test2
    - name: CASSANDRA_USERNAME
      value: testxx
    - name: CASSANDRA_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: jaegar-jaeger-cassandra
    - name: CQLSH_HOST
      value: testxx.cassandra.cosmos.azure.com
    - name: DATACENTER
      value: dc1
    - name: KEYSPACE
      value: jaeger_v1_test2
    image: jaegertracing/jaeger-cassandra-schema:1.18.0
```

Do I need to change any other values to get this connected to the Azure Cassandra DB?

TIA

Regards, Paul
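
A minimal Python sketch of how the job could end up on 9042 even though CASSANDRA_PORT is set. It assumes (not confirmed in this thread) that cqlsh inside the schema image takes its host from CQLSH_HOST and its port from a separate CQLSH_PORT variable, falling back to Cassandra's default port 9042. The helper name is hypothetical.

```python
from typing import Mapping, Tuple

DEFAULT_CQL_PORT = 9042  # Cassandra's default native-protocol port


def effective_cqlsh_endpoint(env: Mapping[str, str]) -> Tuple[str, int]:
    """Hypothetical model of how cqlsh picks its endpoint from the environment.

    Assumption: only CQLSH_HOST and CQLSH_PORT are consulted here;
    CASSANDRA_PORT is ignored by this model.
    """
    host = env.get("CQLSH_HOST", "127.0.0.1")
    port = int(env.get("CQLSH_PORT", DEFAULT_CQL_PORT))
    return host, port


# Subset of the schema job env above: CASSANDRA_PORT is set, CQLSH_PORT is not.
job_env = {
    "CASSANDRA_PORT": "10350",
    "CQLSH_HOST": "testxx.cassandra.cosmos.azure.com",
}
print(effective_cqlsh_endpoint(job_env))
# -> ('testxx.cassandra.cosmos.azure.com', 9042)
```

Under this assumption, the port would have to reach cqlsh through its own variable rather than through CASSANDRA_PORT.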

naseemkullah commented 4 years ago

Hi @paulpuvi06, https://github.com/jaegertracing/jaeger/issues/2366 has been created on your behalf; please check there for a response from @jaegertracing/jaeger-maintainers.

naseemkullah commented 4 years ago

Hi @paulpuvi06, as per @pavolloffay's response in jaegertracing/jaeger#2366, the port should be appended to CQLSH_HOST.

Please let me know if that works for you.

paulpuvi06 commented 4 years ago

@naseemkullah Sure. I appended the port to CQLSH_HOST via extraEnv, but that added a duplicate CQLSH_HOST env variable entry, and the error remains the same.

```
Connection error: ('Unable to connect to any servers', {'xxx.xx.xx.xx': error(None, "Tried connecting to [('xxx.xx.xx.xxx', 9042)]. Last error: timed out")}) Cassandra is still not up at testxx.cassandra.cosmos.azure.com. Waiting 1 second.
```

```yaml
spec:
  containers:
  - env:
    - name: CASSANDRA_SERVERS
      value: testxxx.cassandra.cosmos.azure.com
    - name: CASSANDRA_PORT
      value: "10350"
    - name: CASSANDRA_KEYSPACE
      value: jaeger_v1_test2
    - name: CASSANDRA_USERNAME
      value: testxx
    - name: CASSANDRA_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: jaegar-jaeger-cassandra
    # duplicate CQLSH_HOST entries
    - name: CQLSH_HOST
      value: tracing.cassandra.cosmos.azure.com:10350
    - name: CQLSH_HOST
      value: tracing.cassandra.cosmos.azure.com
    - name: DATACENTER
      value: dc1
    - name: KEYSPACE
      value: jaeger_v1_test2
    image: jaegertracing/jaeger-cassandra-schema:1.18.0
```

TIA.

naseemkullah commented 4 years ago

```yaml
- name: CQLSH_HOST
  value: tracing.cassandra.cosmos.azure.com
```

It appears that the second CQLSH_HOST is overriding the first; please set it only once.
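
To illustrate the override, here is a small Python sketch that collapses a container's env list into the values the process would see, assuming that a later entry with the same name wins (consistent with what is observed in this thread). Only literal value entries are modelled; the function name is hypothetical.

```python
from typing import Dict, List


def effective_env(entries: List[Dict[str, str]]) -> Dict[str, str]:
    """Collapse a Kubernetes-style env list; a later duplicate name overrides an earlier one.

    Only entries with a literal "value" are modelled (valueFrom is ignored).
    """
    resolved: Dict[str, str] = {}
    for entry in entries:
        if "value" in entry:
            resolved[entry["name"]] = entry["value"]  # plain assignment: last one wins
    return resolved


entries = [
    {"name": "CQLSH_HOST", "value": "tracing.cassandra.cosmos.azure.com:10350"},
    {"name": "CQLSH_HOST", "value": "tracing.cassandra.cosmos.azure.com"},
]
print(effective_env(entries)["CQLSH_HOST"])
# -> tracing.cassandra.cosmos.azure.com
```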

naseemkullah commented 4 years ago

Hi @paulpuvi06, I see the limitation that is causing your issue. Could you please open a PR to address it?

https://github.com/jaegertracing/helm-charts/blob/32d7fb792a66b7977be649e52d607bda5e8d25c2/charts/jaeger/templates/cassandra-schema-job.yaml#L47

That line should also include the port, in particular when it is not the default one.

Cheers!
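
A minimal Python sketch of the rendering suggested above: build the schema job's CQLSH_HOST value from the chart's storage.cassandra.host and storage.cassandra.port values, appending the port only when it differs from the default 9042. This is a hypothetical helper mirroring the suggestion, not the chart's actual template logic.

```python
DEFAULT_CQL_PORT = 9042  # Cassandra's default native-protocol port


def cqlsh_host_value(host: str, port: int) -> str:
    """Hypothetical CQLSH_HOST rendering: append ":port" only for a non-default port."""
    if port == DEFAULT_CQL_PORT:
        return host
    return f"{host}:{port}"


print(cqlsh_host_value("testxx.cassandra.cosmos.azure.com", 10350))
# -> testxx.cassandra.cosmos.azure.com:10350
print(cqlsh_host_value("cassandra", 9042))
# -> cassandra
```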

paulpuvi06 commented 4 years ago

Hello @naseemkullah ,

Sure, and thank you. I made a couple of changes in the Jaeger schema manifest to make it connect, but I am now getting the error below while generating the schema.

```
Generating the schema for the keyspace jaeger_v1_test1 and datacenter weu
Using template file /cassandra-schema/v003.cql.tmpl with parameters:
    mode = prod
    datacenter = weu
    keyspace = jaeger_v1_test1
    replication = {'class': 'NetworkTopologyStrategy', 'weu': '2' }
    trace_ttl = 172800
    dependencies_ttl = 0
<stdin>:13:SyntaxException: line 1:42 no viable alternative at input 'keyvalue (...IF NOT EXISTS jaeger_v1_test1.keyvalue...)
<stdin>:18:SyntaxException: line 3:120 no viable alternative at input 'keyvalue (...og (
    ts      bigint, // microseconds since epoch
    fields  list<frozen<keyvalue...)
<stdin>:24:SyntaxException: line 5:133 no viable alternative at input ') (...        text,
    trace_id        blob,
    span_id         bigint,
)...)
<stdin>:29:SyntaxException: line 3:110 no viable alternative at input 'keyvalue (... (
    service_name    text,
    tags            list<frozen<keyvalue...)
<stdin>:54:SyntaxException: line 10:345 no viable alternative at input 'keyvalue (...   operation_name  text,
    flags           int,
    start_time      bigint, // microseconds since epoch
    duration        bigint, // microseconds
    tags            list<frozen<keyvalue...)
<stdin>:68:InvalidRequest: Error from server: code=2200 [Invalid query] message="gc_grace_seconds value must be zero."
<stdin>:84:InvalidRequest: Error from server: code=2200 [Invalid query] message="gc_grace_seconds value must be zero."
<stdin>:101:InvalidRequest: Error from server: code=2200 [Invalid query] message="gc_grace_seconds value must be zero."
<stdin>:118:InvalidRequest: Error from server: code=2200 [Invalid query] message="gc_grace_seconds value must be zero."
<stdin>:137:InvalidRequest: Error from server: code=2200 [Invalid query] message="gc_grace_seconds value must be zero."
<stdin>:157:InvalidRequest: Error from server: code=2200 [Invalid query] message="gc_grace_seconds value must be zero."
<stdin>:164:SyntaxException: line 6:161 no viable alternative at input ') (...child           text,
    call_count      bigint,
    source          text,
)...)
<stdin>:177:InvalidRequest: Error from server: code=2200 [Invalid query] message="Unknown type jaeger_v1_test1.dependency"
```

jeff1985 commented 4 years ago

Hi, it doesn't work for me either. I tried to get it running, but it is not able to connect:

```
Validation is enabled; SSL transport factory requires a valid certfile to be specified. Please provide path to the certfile in [ssl] section as 'certfile' option in /root/.cassandra/cqlshrc (or use [certfiles] section) or set SSL_CERTFILE environment variable.
```

As far as I can see from the template, connecting to Cassandra via TLS with a password only (no client cert) is not supported?
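
For comparison, TLS with only server-side verification and no client certificate is an ordinary client setup: the client validates the server's certificate against trusted CAs and authenticates with username/password afterwards. The cqlsh error above asks for exactly that kind of validation file (via certfile or SSL_CERTFILE). A minimal sketch using Python's standard ssl module; it only builds a client context and says nothing about what the chart's template supports. The helper name is hypothetical.

```python
import ssl
from typing import Optional


def server_auth_only_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that verifies the server certificate but presents no client certificate.

    With ca_file=None the system's default CA certificates are used for verification.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # No ctx.load_cert_chain(...) call: the client does not authenticate with a certificate.
    return ctx


ctx = server_auth_only_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
# -> True True
```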

paulpuvi06 commented 4 years ago

@naseemkullah The Jaeger schema job completed and created the keyspace and tables in the Cassandra DB (Azure Cosmos DB). Now the collector and query pods keep restarting. Do we need to update any values?

```
kubectl logs jaegar-jaeger-collector-7554c56646-wlkvc

2020/08/12 11:58:53 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1597233533.9665623,"caller":"flags/service.go:116","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1597233533.9668648,"caller":"flags/admin.go:120","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1597233533.96695,"caller":"flags/admin.go:126","msg":"Starting admin HTTP server","http-addr":":14269"}
{"level":"info","ts":1597233533.9669793,"caller":"flags/admin.go:112","msg":"Admin server started","http.host-port":"[::]:14269","health-status":"unavailable"}
```

```
kubectl logs jaegar-jaeger-query-694cddc75-nzt52 jaegar-jaeger-query

2020/08/12 11:59:40 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1597233580.0927389,"caller":"flags/service.go:116","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1597233580.0932186,"caller":"flags/admin.go:120","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1597233580.0933068,"caller":"flags/admin.go:126","msg":"Starting admin HTTP server","http-addr":":16687"}
{"level":"info","ts":1597233580.0933387,"caller":"flags/admin.go:112","msg":"Admin server started","http.host-port":"[::]:16687","health-status":"unavailable"}
```

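
Both logs shown end right after the admin server starts with health-status "unavailable", so neither process reports itself ready before restarting. A small Python sketch of a hypothetical helper that extracts that field from a JSON log line, e.g. to scan saved pod logs; lines that are not JSON objects (such as the leading maxprocs line) yield None.

```python
import json
from typing import Optional


def health_status(log_line: str) -> Optional[str]:
    """Return the "health-status" field of a JSON log line, or None if absent or not JSON."""
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # e.g. the plain-text "maxprocs" line
    return record.get("health-status") if isinstance(record, dict) else None


line = ('{"level":"info","ts":1597233533.9669793,"caller":"flags/admin.go:112",'
        '"msg":"Admin server started","http.host-port":"[::]:14269","health-status":"unavailable"}')
print(health_status(line))
# -> unavailable
```
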
kyschouv commented 4 years ago

I've been working on a PR for this, but I'm still unable to get it to connect.

https://github.com/kyschouv/helm-charts

error:

```
socket.gaierror: [Errno -2] Name or service not known
Cassandra is still not up at sbtest-jaeger-cosmos.cassandra.cosmos.azure.com:10350. Waiting 1 second.
Traceback (most recent call last):
  File "/opt/cassandra/bin/cqlsh.py", line 2459, in <module>
    main(*read_options(sys.argv[1:], os.environ))
  File "/opt/cassandra/bin/cqlsh.py", line 2437, in main
    encoding=options.encoding)
  File "/opt/cassandra/bin/cqlsh.py", line 485, in __init__
    load_balancing_policy=WhiteListRoundRobinPolicy([self.hostname]),
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.11.0-bb96859b.zip/cassandra-driver-3.11.0-bb96859b/cassandra/policies.py", line 417, in __init
```

values:

```yaml
provisionDataStore:
  cassandra: false

storage:
  type: cassandra
  cassandra:
    host: sbtest-jaeger-cosmos.cassandra.cosmos.azure.com
    port: 10350
    user: sbtest-jaeger-cosmos
    existingSecret: sbtest-jaeger-cosmos
    existingSecretKey: primaryMasterKey
    tls:
      enabled: false
    cmdlineParams:
      cassandra.disable-compression:
      cassandra.tls.enabled:
      cassandra.tls.skip-host-verify:
      cassandra-archive.disable-compression:
      cassandra-archive.tls.enabled:
      cassandra-archive.tls.skip-host-verify:
      cassandra.connections-per-host: 1
```

Any ideas?
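
The values above pass Jaeger flags through storage.cassandra.cmdlineParams. Assuming the chart renders an empty value as a bare --key flag and any other value as --key=value (check the chart's helper template to confirm), the resulting arguments can be previewed with a small hypothetical Python helper:

```python
from typing import Dict, List, Optional


def render_flags(params: Dict[str, Optional[object]]) -> List[str]:
    """Hypothetical cmdlineParams rendering: empty -> "--key", otherwise "--key=value"."""
    flags = []
    for key, value in params.items():
        if value is None or value == "":
            flags.append(f"--{key}")  # empty YAML value: bare flag
        else:
            flags.append(f"--{key}={value}")
    return flags


params = {  # subset of the values above; an empty YAML value loads as None
    "cassandra.disable-compression": None,
    "cassandra.tls.enabled": None,
    "cassandra.connections-per-host": 1,
}
print(render_flags(params))
# -> ['--cassandra.disable-compression', '--cassandra.tls.enabled', '--cassandra.connections-per-host=1']
```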

yurishkuro commented 4 years ago

The error "socket.gaierror: [Errno -2] Name or service not known" suggests that the Cassandra host name is not resolvable from the service container / network namespace.
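
One way to check this from inside the pod is to resolve the name with Python's socket.getaddrinfo. The failing message above shows the host as "sbtest-jaeger-cosmos.cassandra.cosmos.azure.com:10350"; if that whole string were handed to the resolver as a host name, resolution would likely fail even though the bare name resolves. That is a guess, not something confirmed in this thread. The sketch splits off a trailing ":port" before resolving; both helper names are hypothetical.

```python
import socket
from typing import List, Tuple

DEFAULT_CQL_PORT = 9042  # Cassandra's default native-protocol port


def split_host_port(value: str, default_port: int = DEFAULT_CQL_PORT) -> Tuple[str, int]:
    """Split "host:port" into (host, port); a plain "host" gets default_port.

    IPv6 literals are not handled.
    """
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, default_port
    return host, int(port)


def resolve(value: str) -> List[str]:
    """Resolve only the host part; raises socket.gaierror if the name is unknown."""
    host, port = split_host_port(value)
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


print(split_host_port("sbtest-jaeger-cosmos.cassandra.cosmos.azure.com:10350"))
# -> ('sbtest-jaeger-cosmos.cassandra.cosmos.azure.com', 10350)
```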

kyschouv commented 4 years ago

It's resolving to an IP at least =/.

sshah90 commented 3 years ago

@paulpuvi06, were you able to resolve the issue with the collector and query pods?

I made some changes in the schema file and was able to create the tables and types, but now the collector and query pods are in CrashLoopBackOff with the same logs.